Summary:
I'm just doing the honors and bumping the version to 1.0.0.
1.0 preview and RC releases will have the 1.0.0.dev{date} tag
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11717
Reviewed By: SsnL
Differential Revision: D9840857
Pulled By: soumith
fbshipit-source-id: 4c9c2e01dccb3c521dab26c49e1569d970a87ace
Summary:
Previously, it was necessary to include TensorMethods.h after Tensor.h in order to get the tensor method definitions.
We abstracted this away from users by making sure ATen.h did this correctly; but we don't have any equivalent for ATen/core.
In order to solve this dependency issue, we now forward declare Tensor in the Type declaration, which breaks the dependency cycle.
Type.h now includes Tensor.h (for backwards compatibility) and Tensor.h now includes TensorMethods.h, so there are no longer any include-order restrictions.
We could get rid of TensorMethods.h completely now, but that would involve coordinating a code generation change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11720
Reviewed By: ezyang
Differential Revision: D9841488
Pulled By: gchanan
fbshipit-source-id: 1668199095e096c1790e646b5dc9f61ec1b33c0a
Summary:
A couple fixes I deem necessary to the TorchScript C++ API after writing the tutorial:
1. When I was creating the custom op API, I created `torch/op.h` as the one-stop header for creating custom ops. I now notice that there is no good header for the TorchScript C++ story altogether, i.e. when you just want to load a script module in C++ without any custom ops necessarily. The `torch/op.h` header suits that purpose just as well of course, but I think we should rename it to `torch/script.h`, which seems like a great name for this feature.
2. The current API for the CMake we provided was that we defined a bunch of variables like `TORCH_LIBRARY_DIRS` and `TORCH_INCLUDES` and then expected users to add those variables to their targets. We also had a CMake function that did that for you automatically. I now realized a much smarter way of doing this is to create an `IMPORTED` target for the libtorch library in CMake, and then add all this stuff to the link interface of that target. Then all downstream users have to do is `target_link_libraries(my_target torch)` and they get all the proper includes, libraries and compiler flags added to their target. This means we can get rid of the CMake function and all that stuff. orionr AFAIK this is a much, much better way of doing all of this, no?
3. Since we distribute libtorch with `-D_GLIBCXX_USE_CXX11_ABI=0`, dependent libraries must set this flag too. I now add this to the interface compile options of this imported target.
4. Fixes to JIT docs.
These could likely be 4 different PRs but given the release I wouldn't mind landing them all asap.
zdevito dzhulgakov soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11682
Differential Revision: D9839431
Pulled By: goldsborough
fbshipit-source-id: fdc47b95f83f22d53e1995aa683e09613b4bfe65
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11706
This is necessary to handle use cases when Storage is not set (because the
tensor in question doesn't have a notion of storage).
Reviewed By: orionr
Differential Revision: D9833361
fbshipit-source-id: e90a384019f44f57682b687d129b54e85b6fabb9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11701
There's one extra multiply from TypeMeta::itemsize() which needs
to be characterized. For all existing Caffe2 uses, storage_offset
is zero.
Reviewed By: li-roy
Differential Revision: D9831230
fbshipit-source-id: 353678edf76d2ccc297a73475a34f6ab2a20d1e1
Summary:
This will allow us to break the dependency cycle between Tensor and Type, because currently Type has defaulted Tensor (reference) arguments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11675
Reviewed By: ezyang
Differential Revision: D9819720
Pulled By: gchanan
fbshipit-source-id: a9577ac34a358120075129ab0654e7862d1dace6
Summary:
This way it shows up in all current and future setup.py commands; otherwise we'd have to override every one of them to have them all call copy_protos. This is needed because the nightly packages still do not include caffe2_pb2, since setup.py bdist does not go through setup.py install or setup.py develop.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11726
Reviewed By: orionr
Differential Revision: D9844075
Pulled By: pjh5
fbshipit-source-id: 57b469e48010aacd0c08c214ba8a7e5d757feefa
Summary:
We use these annotations during function declarations, not definitions. See the description of compiler error [C2491](https://msdn.microsoft.com/en-us/library/62688esh.aspx) for more details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11367
Reviewed By: ezyang
Differential Revision: D9697923
Pulled By: orionr
fbshipit-source-id: 1e539c02957851386f887e6d0510ce83117a1695
Summary:
This PR vectorizes the CPU grid sample 2d forward and backward kernels. Specifically,
1. add `.data()` in `TensorAccessor`
2. support non-void return value for declaring CPU kernel stub
3. add `bool at::geometry_is_contiguous(IntList sizes, IntList strides)`
4. The following vectorized CPU primitives are added:
+ `gather<scale>(baseaddr, vindex)`: `result[i] = baseaddr[vindex[i] * scale]`
+ `mask_gather<scale>(src, baseaddr, vindex, mask)`: `result[i] = mask[i] ? baseaddr[vindex[i] * scale] : src[i]`.
+ comparison ops
+ binary logical ops
+ `min(a, b)`
+ `cast<dst_t, src_t>(src_vec)`: changing dtype but keeping the bit representation
+ `blendv(a, b, mask)`: `result[i] = mask[i] ? b[i] : a[i]`.
+ ctor with multiple values (i.e., `setr`)
+ `arange(start = 0, step = 1)`: constructs a vector with values specified by the arange parameters
+ `convert_to_int_of_same_size(vec)`: convert floating point vector to corresponding integral type of same size
+ `interleave2(a, b)` & `deinterleave2(x, y)`: interleaves or deinterleaves two vectors. E.g., for `interleave`:
```
inputs:
{a0, a1, a2, a3, a4, a5, a6, a7}
{b0, b1, b2, b3, b4, b5, b6, b7}
outputs:
{a0, b0, a1, b1, a2, b2, a3, b3}
{a4, b4, a5, b5, a6, b6, a7, b7}
```
5. Grid sample CPU kernel implementations are described in the following note (also in `GridSampleKernel.cpp`):
```
NOTE [ Grid Sample CPU Kernels ]
Implementation of vectorized grid sample CPU kernels is divided into three
parts:
1. `ComputeLocation` struct
Transforms grid values into interpolation locations of the input tensor
for a particular spatial dimension, based on the size of that dimension
in input tensor, and the padding mode.
```
```cpp
template<typename scalar_t, GridSamplerPadding padding>
struct ComputeLocation {
using Vec = Vec256<scalar_t>;
// ctor
ComputeLocation(int64_t size);
// Given grid values `in`, return the interpolation locations after
// un-normalization and padding mechanism (elementwise).
Vec apply(const Vec &in) const;
// Similar to `apply`, but also returns `d apply(in) / d in`
// (elementwise).
// this is often used in gradient computation.
std::pair<Vec, Vec> apply_get_grad(const Vec &in) const;
};
```
```
2. `ApplyGridSample` struct
Owns N `ComputeLocation` structs, where N is the number of spatial
dimensions. Given N input grid vectors (one for each spatial dimension)
and spatial offset, it gets the interpolation locations from
`ComputeLocation`s, applies interpolation procedure, and then writes to
the output (or grad_input & grad_grid in backward).
```
```cpp
template<typename scalar_t, int spatial_dim,
GridSamplerInterpolation interp,
GridSamplerPadding padding>
struct ApplyGridSample {
// ctor
ApplyGridSample(const TensorAccessor<scalar_t, 4>& input);
// Applies grid sampling (forward) procedure:
// 1. computes interpolation locations from grid values `grid_x` and
// `grid_y`,
// 2. interpolates output values using the locations and input data
// in `inp_slice`, and
// 3. writes the first `len` values in the interpolated vector to
// `out_slice` with spatial offset being `offset`.
//
// This assumes that `grid_x` and `grid_y` all contain valid grid
// values \in [-1, 1], even at indices greater than `len`.
//
// The `*_slice` argument names mean samples within a batch (i.e.,
// with the batch dimension sliced out).
void forward(TensorAccessor<scalar_t, 3>& out_slice,
const TensorAccessor<scalar_t, 3>& inp_slice,
int64_t offset, const Vec& grid_x, const Vec& grid_y,
int64_t len) const;
// Applies grid sampling (backward) procedure. Arguments semantics
// and strategy are similar to those of `forward`.
void backward(TensorAccessor<scalar_t, 3>& gInp_slice,
TensorAccessor<scalar_t, 3>& gGrid_slice,
const TensorAccessor<scalar_t, 3>& gOut_slice,
const TensorAccessor<scalar_t, 3>& inp_slice,
int64_t offset, const Vec& grid_x, const Vec& grid_y,
int64_t len) const;
};
```
```
3. `grid_sample_2d_grid_slice_iterator` function
Among the tensors we work with, we know that the output tensors are
contiguous (i.e., `output` in forward, and `grad_input` & `grad_grid` in
backward), we need to randomly read `input` anyways, and `grad_output`
usually comes from autograd and is often contiguous. So we base our
iterating strategy on the geometry of grid.
`grid_sample_2d_grid_slice_iterator` function provides an abstraction to
efficiently iterate through a `grid` slice (without batch dimension).
See comments of that function on the specific cases and strategies used.
```
```cpp
template<typename scalar_t, typename ApplyFn>
void grid_sample_2d_grid_slice_iterator(
const TensorAccessor<scalar_t, 3>& grid_slice,
const ApplyFn &apply_fn);
// `apply_fn` is a function/lambda that can be called as if it has
// declaration:
// void apply_fn(const Vec256<scalar_t>& grid_x,
// const Vec256<scalar_t>& grid_y,
// int64_t spatial_offset, int64_t len);
```
```
`apply_fn` will be called multiple times, and together cover the entire
output spatial space. Therefore, e.g., to implement forward 2d grid
sample, we can do
```
```cpp
ApplyGridSample<scalar_t, 2, interp, padding> grid_sample(input_accessor);
for (int n = 0; n < input_accessor.size(0); n++) {
grid_sample_2d_grid_slice_iterator(
grid_accessor[n],
[&](const Vec256<scalar_t>& grid_x, const Vec256<scalar_t>& grid_y,
int64_t spatial_offset, int64_t len) {
grid_sample.forward(out_accessor[n], input_accessor[n],
spatial_offset, grid_x, grid_y, len);
});
}
```
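For reference, a minimal Python-level exercise of the op these kernels implement (shapes and mode arguments here are just illustrative):
```python
import torch
import torch.nn.functional as F

inp = torch.randn(1, 2, 8, 8, requires_grad=True)          # (N, C, H_in, W_in)
grid = (torch.rand(1, 4, 4, 2) * 2 - 1).requires_grad_()   # (N, H_out, W_out, 2), values in [-1, 1]

out = F.grid_sample(inp, grid, mode='bilinear', padding_mode='zeros')
out.sum().backward()   # exercises both the grad_input and grad_grid paths
print(out.shape)       # torch.Size([1, 2, 4, 4])
```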
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10980
Differential Revision: D9564867
Pulled By: SsnL
fbshipit-source-id: 5b7c3c7ea63af00eec230ae9ee1c3e6c6c9679b4
Summary:
Change `max` to `fmaxf` in the `LabelCrossEntropy` kernel so that it works correctly under HIP.
bddppq petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11733
Differential Revision: D9846783
Pulled By: bddppq
fbshipit-source-id: c1b394d2ba7ee0e819f7bf3b36b53d1962de5522
Summary:
Fixes #11452.
Based on the discussion with SsnL and soumith, we want to bring back Upsample as a module instead of introducing a new nn.interpolate module for now. If anyone wants to downsample, they should use `nn.functional.interpolate` instead.
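A small illustrative snippet of the intended split (module for upsampling, functional for everything else):
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 3, 16, 16)
up = nn.Upsample(scale_factor=2, mode='bilinear')  # module form: upsampling
print(up(x).shape)                                 # torch.Size([1, 3, 32, 32])

down = F.interpolate(x, scale_factor=0.5)          # functional form: also handles downsampling
print(down.shape)                                  # torch.Size([1, 3, 8, 8])
```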
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11568
Differential Revision: D9804359
Pulled By: ailzhang
fbshipit-source-id: 2b232d55fc83c2b581bf336f1ee8d1cf1c1159ca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11355
There is no reason to implement refcounting manually in this case.
Given the correct NullType, toIntrusivePtr() and moveToIntrusivePtr() will do the right thing.
Reviewed By: ezyang
Differential Revision: D9694918
fbshipit-source-id: 8aae4d66aec32ca5f85c438d66339bd80b72b656
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11353
Before, there was one extra member in the union that had to be at least as large as the largest other member, because it was used for copying.
Now, this isn't needed anymore and we copy the union directly.
Reviewed By: ezyang
Differential Revision: D9694326
fbshipit-source-id: 42b2f7d51ac5d4ea5ebafea3a598b018e10fed68
Summary:
Current behavior is that each process (main and workers) will print a traceback from `KeyboardInterrupt`, and the main process will also print
```
RuntimeError: DataLoader worker (pid 46045) exited unexpectedly with exit code 1. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.
```
due to our SIGCLD handler.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11718
Differential Revision: D9840844
Pulled By: SsnL
fbshipit-source-id: 1a05060bb02907fef5aac3f274d2c84f9f42d187
Summary:
Otherwise each build produces 65MB of warnings log, which makes the CI hard to debug.
iotamudelta Jorghi12
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11698
Differential Revision: D9840356
Pulled By: bddppq
fbshipit-source-id: b69bf6a5c38a97b188221f9c084c608ffc9b37c8
Summary:
1. Document the Sequential module in the C++ API at a high level (why does this exist) and a low level (how to use it)
2. Change the Sequential tests to be in a style that makes them easier to convert to gtest. No code changes.
ebetica ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11648
Differential Revision: D9834526
Pulled By: goldsborough
fbshipit-source-id: 39f2f5c6cbbf8ed5a1b69986978c8ef127036de1
Summary:
This PR splits the CPU and CUDA fusion compilers, putting them into a new jit/fusers/ directory with jit/fusers/common for common components. In particular:
- A fusion interface is created that allows "fusion handles" to be requested
- The CPU and CUDA fusers implement this interface, with dispatch determined by device
- The fusion compilers, fusion function specializations and resource strings are split
- CPU-specific classes like TempFile and DynamicLibrary are in the CPU fuser
- Common classes like TensorDesc and the base fusion function class are in jit/fusers/common
- There is still some specialization in jit/fusers/common, but these specializations are small(-ish)
- Updates the build system to remove the dummy interface on Windows and minimize the use of macros
This structure should allow in-flight PRs to easily rebase while providing a clear interface to the fusers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10981
Reviewed By: soumith
Differential Revision: D9701999
Pulled By: apaszke
fbshipit-source-id: 3b6bec7b97e0444b2a93caa38d9b897f2e68c1b3
Summary:
Fixes #11663.
`TensorIterator` was replacing the op tensors with type casted tensors
which ended up producing side effects in binary ops like `a.float() * b`
where `a` and `b` are `LongTensor`s.
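A minimal illustration of the behavior being protected here (the dtypes shown follow the usual promotion rules; the point is that the original operands must stay untouched):
```python
import torch

a = torch.ones(4, dtype=torch.long)
b = torch.ones(4, dtype=torch.long)

out = a.float() * b
print(out.dtype)          # torch.float32
print(a.dtype, b.dtype)   # torch.int64 torch.int64 -- `a` and `b` are not modified
```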
colesbury ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11708
Differential Revision: D9834016
Pulled By: driazati
fbshipit-source-id: 4082eb9710b31dfc741161a0fbdb9a8eba8fe39d
Summary:
Often, we find ourselves looking at some long-running kernel or emit_nvtx range on an nvvp profile and trying to connect it to the offending line in a training script. If the op is in the forward pass that's easy: ops are enqueued explicitly from the Python side, so tracking it down with manual nvtx ranges supplemented by the built-in emit_nvtx ranges is straightforward. If the op is in the backward pass, it's much more difficult. From the Python side, all you can do is wrap loss.backward() in an nvtx range, and if you also use emit_nvtx, the automatic ranges provide only local information. Right now, the only consistent way to connect backward-pass kernels to their associated forward-pass lines of Python is to understand your script line by line, and know exactly where in the backward pass you are.
This PR augments the existing nvtx machinery to bridge the gap between forward and backward, allowing connection of backward-pass Function apply calls to the forward-pass operations that required/created those Functions.
The method is simple and surgical. During the forward pass, when running with emit_nvtx, the nvtx range for each function in VariableType is tagged with the current sequence number. During the backward pass, the nvtx range associated with each Function's operator() is tagged with that Function's stashed sequence number, which can be compared to "current sequence numbers" from the forward pass to locate the associated op.
Double-backward is not a problem. If a backward pass with create_graph = True is underway, the relationship between backward and double-backward is conceptually the same as the relationship between forward and backward: The functions in VariableType still spit out current-sequence-number-tagged ranges, the Function objects they create still stash those sequence numbers, and in the eventual double-backward execution, their operator() ranges are still tagged with the stashed numbers, which can be compared to "current sequence numbers" from the backward pass.
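A minimal sketch of how this is typically driven (run the script under `nvprof`/`nvvp`; the model and shapes are placeholders). Forward-pass ranges then carry the current sequence number, and backward-pass `Function` ranges carry the stashed number to match against:
```python
import torch
from torch.autograd import profiler

model = torch.nn.Linear(64, 64).cuda()
x = torch.randn(32, 64, device='cuda')

with torch.cuda.profiler.profile():
    with profiler.emit_nvtx():
        loss = model(x).sum()
        loss.backward()
```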
Minor caveats:
- The sequence number is thread-local, and many VariableType functions (specifically, those without a derivative explicitly defined in derivatives.yaml) don't create an associated function object (instead delegating that to sub-functions further down the call chain, perhaps called from within at::native functions that route back through VariableType by calling at::function_name). So the correspondence of stashed sequence numbers in Function operator() ranges with numbers in forward-pass ranges is not guaranteed to be 1 to 1. However, it's still a vast improvement over the current situation, and I don't think this issue should be a blocker.
- Feel free to litigate my use of stringstream in profiler.cpp. I did it because it was easy and clean. If that's too big a hammer, let's figure out something more lightweight.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10881
Differential Revision: D9833371
Pulled By: apaszke
fbshipit-source-id: 1844f2e697117880ef5e31394e36e801d1de6088
Summary:
This is causing codegen problems in caffe2, when we try to remove the circular Tensor/Type declarations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11673
Differential Revision: D9819341
Pulled By: gchanan
fbshipit-source-id: f2c2cd96e8a16f6de6aa4889e71b8a78e12e9256
Summary:
We'll have separate docs for the C++ frontend; right now this file is just misleading
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11703
Differential Revision: D9832847
Pulled By: goldsborough
fbshipit-source-id: 2e8b30ccf6b5cba9d0526e6261160f7c6211a35c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11597
We should always CHECK pointers which we plan to dereference
if they are inputs to the function. Nobody knows how the function will
be called in the future.
Reviewed By: yinghai
Differential Revision: D9800002
fbshipit-source-id: 7fd05f4717f2256d1b09a9e75475b12de6685b03
Summary:
…cuda())
While I was at it, I audited all other ways I know how we might get a CUDA
type from PyTorch and fixed more constructors which don't work.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11533
Differential Revision: D9775786
Pulled By: ezyang
fbshipit-source-id: cd07cdd375fdf74945539ec475a48bf08cbc0c17
Summary:
There's no reason they need to be in Type.h and this moves us along the path of not having circular dependencies (so we can get rid of TensorMethods.h).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11650
Reviewed By: ezyang
Differential Revision: D9812271
Pulled By: gchanan
fbshipit-source-id: 8b70db9a5eb0a332398ab2e8998eeaf7d2eea6d7
Summary:
This adds a small check in `Dirichlet` and `Categorical` `__init__` methods to ensure that scalar parameters are not admissible.
**Motivation**
Currently, `Dirichlet` throws no error when provided with a scalar parameter, but if we `expand` a scalar instance, it inherits the empty event shape from the original instance and gives unexpected results.
The alternative to this check is to promote `event_shape` to be `torch.Size((1,))` if the original instance was a scalar, but that seems to add a bit more complexity (and changes the behavior of `expand` in that it would affect the `event_shape` as well as the `batch_shape` now). Does this seem reasonable? cc. alicanb, fritzo.
```python
In [4]: d = dist.Dirichlet(torch.tensor(1.))
In [5]: d.sample()
Out[5]: tensor(1.0000)
In [6]: d.log_prob(d.sample())
Out[6]: tensor(0.)
In [7]: e = d.expand([3])
In [8]: e.sample()
Out[8]: tensor([0.3953, 0.1797, 0.4250]) # interpreted as events
In [9]: e.log_prob(e.sample())
Out[9]: tensor(0.6931) # wrongly summed out
In [10]: e.batch_shape
Out[10]: torch.Size([3])
In [11]: e.event_shape
Out[11]: torch.Size([]) # cannot be empty
```
Additionally, based on review comments, this removes `real_vector` constraint. This was only being used in `MultivariateNormal`, but I am happy to revert this if we want to keep it around for backwards compatibility.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11589
Differential Revision: D9818271
Pulled By: soumith
fbshipit-source-id: f9bbba90ed6f04e0b5bdfa169e70ca20b280fc74
Summary:
This PR:
- adds a `.expand` method for `TransformedDistribution` along the lines of #11341.
- uses this method to simplify `.expand` in distribution classes that subclass off of `TransformedDistribution`.
- restores testing of `TransformedDistribution` fixtures.
- fixes some bugs wherein we were not setting certain attributes in the expanded instances, and adds tests for `.mean` and `.variance` which use these attributes.
There are many cases where users directly use `TransformedDistribution` rather than subclassing off it. In such cases, it seems rather inconvenient to have to write a separate class just to define a `.expand` method. The default implementation should suffice in these cases.
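A small sketch of the default `.expand` on a directly-constructed `TransformedDistribution` (the log-normal here is just an example):
```python
import torch
import torch.distributions as dist
from torch.distributions.transforms import ExpTransform

base = dist.Normal(torch.zeros(3), torch.ones(3))
log_normal = dist.TransformedDistribution(base, [ExpTransform()])

expanded = log_normal.expand(torch.Size([5, 3]))
print(expanded.batch_shape)       # torch.Size([5, 3])
print(expanded.sample().shape)    # torch.Size([5, 3])
```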
cc. fritzo, vishwakftw, alicanb
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11607
Differential Revision: D9818225
Pulled By: soumith
fbshipit-source-id: 2c4b3812b9a03e6985278cfce0f9a127ce536f23
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11576
Previously, they were spattered throughout the codebase.
We now follow this convention:
- LegacyTypeDispatch gives you Type
- Context gives you TypeExtendedInterface
- Tensor::type() gives you Type
- at::getType() gives you TypeExtendedInterface
I change some sites to use getType() over type().
Reviewed By: SsnL
Differential Revision: D9790187
fbshipit-source-id: 5e2577cb590a5bbf5df530f3763d3b3c0b4625ca
Summary:
This adds tests in tests/test_distributions.py to ensure that all methods of `Distribution` objects are jittable.
I've replaced a few samplers with jittable versions (a small sketch of the pattern follows this list):
- `.uniform_()` -> `torch.rand()`
- `.exponential_()` -> `-(-torch.rand()).log1p()`
- `.normal_()` -> `torch.normal(torch.zeros(...), torch.ones(...), ...)`
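A minimal sketch of one of these replacements (the exponential case), just to show the pattern of swapping an in-place sampler for an expression the JIT can trace:
```python
import torch

shape = (4,)
u_inplace = torch.empty(shape).exponential_()    # in-place sampler, not jittable
u_jittable = -(-torch.rand(shape)).log1p()       # same distribution via inverse CDF of torch.rand
```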
Some jit failures remain, and are marked in test_distributions.py
- `Cauchy` and `HalfCauchy` do not support sampling due to missing `.cauchy_()`
- `Binomial` does not support `.enumerate_support()` due to `arange` ignoring its first arg.
- `MultivariateNormal`, `LowRankMultivariateNormal` do not support `.mean`, `.entropy`
- [x] Currently some tests fail (I've skipped those) due to unavailability of `aten::uniform` and `aten::cauchy` in the jit. Can someone suggest how to add these? I tried to add declarations to `torch/csrc/ir.cpp` and `torch/csrc/passes/shape_analysis.cpp`, but that resulted in "Couldn't find operator" errors.
- [x] There are still lots of `TracerWarning`s that something doesn't match something. I'm not sure whether these are real.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11560
Differential Revision: D9816327
Pulled By: apaszke
fbshipit-source-id: 72ec998ea13fc4c76d1ed003d9502e0fbaf728b8
Summary:
Currently, torch.norm() runs sequentially on the CPU. This PR parallelizes and vectorizes torch.norm() on the ATen CPU path, providing roughly two orders of magnitude of performance improvement.
Performance was benchmarked on a Xeon Skylake 8180 (2×28 cores @ 2.5GHz), using the following script:
```python
import torch
from time import time
count = 1000
size = 1000*1000
def test_norm(p=2):
a = torch.randn(size)
tstart = time()
for i in range(count):
torch.norm(a, p)
tend = time()
print("norm on size %d tensor p = %d: %f s" % (size, p, (tend-tstart)))
for p in range(4):
test_norm(p)
```
without this optimization,
```
(intel-pytorch) [mingfeim@mlt-skx065 unit_tests]$ python test_norm.py
norm on size 1000000 tensor p = 0: 1.071235 s
norm on size 1000000 tensor p = 1: 1.069149 s
norm on size 1000000 tensor p = 2: 1.068212 s
norm on size 1000000 tensor p = 3: 69.735312 s
```
and with this optimization,
```
(pytorch-tf) [mingfeim@mlt-skx053 unit_tests]$ python test_norm.py
norm on size 1000000 tensor p = 0: 0.127507 s
norm on size 1000000 tensor p = 1: 0.011867 s
norm on size 1000000 tensor p = 2: 0.011907 s
norm on size 1000000 tensor p = 3: 0.014470 s
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11565
Differential Revision: D9804484
Pulled By: ezyang
fbshipit-source-id: 52899f30ac26139d00684d07edfb47cb9b25d871
Summary:
Previously, we would pretty much assume that all floating point tensors do require grad, which might result in some unnecessary compute.
I don't really like the fact that `TensorType` uses `tensor.is_variable() && tensor.requires_grad()` to infer the value of `requires_grad`, but changing constants to keep variables turns out to be pretty hard. I got halfway there, but it would still need some more work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11586
Reviewed By: ezyang
Differential Revision: D9813648
Pulled By: apaszke
fbshipit-source-id: 77f77756d18ff7632fca3aa68ce855e1d7f3bdb8
Summary:
I'm reading the doc of `torch.nn.functional.pad` and it looks a bit confusing to me. Hopefully this PR makes it clearer.
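For reference, a tiny example of the padding-order convention the doc describes: the pairs in `pad` apply starting from the last dimension and moving backwards.
```python
import torch
import torch.nn.functional as F

x = torch.ones(1, 3, 4, 5)    # (N, C, H, W)
y = F.pad(x, (1, 2, 3, 4))    # (left, right, top, bottom): last dim first
print(y.shape)                # torch.Size([1, 3, 11, 8])
```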
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11623
Differential Revision: D9818255
Pulled By: soumith
fbshipit-source-id: 4f6b17b0211c6927007f44bfdf42df5f84d47536
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11659
This is less error-prone and less code.
Reviewed By: smessmer
Differential Revision: D9814536
fbshipit-source-id: 028510e31e2fa7a9fa11c1398b0743c5cd085dd5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11657
Previously, we had a constructor in TensorImpl for every constructor in Tensor.
This was unnecessary and wordy: Tensor is the user-visible class, so it deserves
the constructors, but TensorImpl is internal and doesn't need it. So
I replaced TensorImpl with a single, Storage accepting constructor, and then
rewrote Tensor to use that constructor.
Reviewed By: jerryzh168
Differential Revision: D9813742
fbshipit-source-id: 7501b54fe5f39180f1bc07573fd7c1640b0f4e89
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11656
The mis-capitalization really sticks in my craw. I know why (we
already have a static function named GetDeviceType), but let's
name it differently.
```
codemod -d . --extensions cc,cpp,cu,cuh,h,py,hpp,TARGETS GetDevicetype device_type
```
Reviewed By: jerryzh168
Differential Revision: D9813544
fbshipit-source-id: fe462f4bc40b03e74921f8cf5ebd9cfc52e7e636
Summary:
The isCompleted function is changed to be non-const to accommodate
setting some internal status on the work object in the case of
completion. Previously, it was only checking a member field, but for the
MPI backend it calls MPI_Test to poll for completion of an asynchronous
request.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11630
Reviewed By: SsnL
Differential Revision: D9808008
Pulled By: pietern
fbshipit-source-id: 18b70825b1fb4d561a552fa75e9475a522852cd4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11396
std::move and std::forward in C++11 aren't constexpr (they are in C++14).
This caused a build issue orionr was working on.
It should be fixed by this diff
Reviewed By: orionr
Differential Revision: D9724805
fbshipit-source-id: 0d9047dce611385d659cc71a6c04cc7a6a40a5ae
Summary:
Requires https://github.com/onnx/onnx/pull/1377
This PR makes it so that slices with dynamic boundary values can be exported from pytorch and run in caffe2 via ONNX.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11255
Differential Revision: D9790216
Pulled By: jamesr66a
fbshipit-source-id: 6adfcddc5788df4d34d7ca98341077140402a3e2
Summary:
Previously, because of some setup.py logic, `ninja` caching of the `generate_code.py` build step was broken. This resulted in `generate_code.py` running on every single build, regardless of whether its inputs changed.
This updated logic fixes the input caching.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11644
Reviewed By: orionr
Differential Revision: D9814348
Pulled By: soumith
fbshipit-source-id: 2012960908d0f600488d410094095cfd72adc34f
Summary:
This also removes the usage of torch.onnx.symbolic_override in instance_norm. Fixes #8439.
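A minimal export sketch of the case this covers (file name and shapes are placeholders):
```python
import torch
import torch.nn as nn

model = nn.InstanceNorm2d(3, affine=True)
x = torch.randn(1, 3, 8, 8)
torch.onnx.export(model, x, "instance_norm.onnx")  # exported through the standard symbolic
```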
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10792
Differential Revision: D9800643
Pulled By: li-roy
fbshipit-source-id: fa13a57de5a31fbfa2d4d02639d214c867b9e1f1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11642
This is just a preparatory change to help with future
refactoring:
- I want to reduce the number of includes that tensor_impl.h
depends on, but
- I need to keep tensor.h providing all Caffe2 headers, because
users may be relying on tensor.h transitively providing those
headers.
Introducing a level of indirection lets me do both at the same time.
Reviewed By: jerryzh168
Differential Revision: D9810823
fbshipit-source-id: 8dfaac4b8768051a22898be8fcaf787ecc57eb13
Summary:
Before this PR it would warn that "dropout is non deterministic and can
cause problems when checking trace", so I disabled the trace checking.
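For context, a minimal sketch of the kind of workaround used here, assuming the `check_trace` flag on `torch.jit.trace`:
```python
import torch
import torch.nn.functional as F

def f(x):
    # dropout is nondeterministic, so a traced replay would not match exactly
    return F.dropout(x, p=0.5, training=True)

x = torch.randn(4, 4)
traced = torch.jit.trace(f, x, check_trace=False)
```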
cc zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11639
Differential Revision: D9812493
Pulled By: zou3519
fbshipit-source-id: fab86928a5fba8b218b47543533aaf7c82a10b4a
Summary:
Arg parser allowed additional positional args to be parsed into keyword-only params.
Fixes a couple cases:
- The positional argument happens to be of the right type, and it just works silently. Now, we fail as expected.
- The positional argument fails later down the line. Now, we fail at the appropriate time and get a better error message.
Pre-fix:
```
>>> torch.cuda.LongTensor((6, 0), 1, 1, 0)
tensor([6, 0], device='cuda:1')
```
Post-fix:
```
>>> torch.cuda.LongTensor((6, 0), 1, 1, 0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: new() received an invalid combination of arguments - got (tuple, int, int, int), but expected one of:
* (torch.device device)
* (torch.Storage storage)
* (Tensor other)
* (tuple of ints size, torch.device device)
* (object data, torch.device device)
```
Pre-fix:
```
>>> a = torch.tensor(5)
>>> a.new_zeros((5,5), 0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: new_zeros(): argument 'dtype' (position 2) must be torch.dtype, not int
```
Post-fix:
```
>>> a = torch.tensor(5)
>>> a.new_zeros((5,5), 0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: new_zeros() takes 1 positional argument but 2 were given
```
Fixes #8351.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10499
Differential Revision: D9811093
Pulled By: li-roy
fbshipit-source-id: ce946270fd11b264ff1b09765db3300879491f76
Summary:
In order to comply with Python's rules on implicit casting of
non-booleans to booleans, this PR removes implicit casting in favor of
explicit casts via `bool()`
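A small sketch of what the explicit form looks like in a script function (the function itself is just an example):
```python
import torch

@torch.jit.script
def relu_or_neg(x):
    # explicit bool() cast instead of relying on implicit tensor-to-bool conversion
    if bool(x.sum() > 0):
        return x
    return -x
```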
cc zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11503
Differential Revision: D9780869
Pulled By: driazati
fbshipit-source-id: c753acaca27f4e79dddf424c6b04674f44a6aad9
Summary:
- Just a simple fix to support `fill_`
- And a fix for indexing in `pytorch-complex`
Differential Revision: D9804061
Pulled By: ezyang
fbshipit-source-id: 631129b3fa220a9670770b3766f14a8e03633bdf
Summary:
Add guards against using sparse tensor by checking the conversion from IValue -> PyObject & PyObject -> IValue.
This diff also changes the behavior of constant propagation to not run python ops even if all of their inputs are constant, because of possible mutation of global state. This came up in trying to run get_sparse(), and I'm including it here to make it easier to land.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11550
Differential Revision: D9804712
Pulled By: eellison
fbshipit-source-id: 9fe7daf721c6d6e48df4925c0f9c775873bcdc77
Summary:
Clean up some generated tests now that we have nice new features like varargs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11403
Differential Revision: D9800545
Pulled By: wanchaol
fbshipit-source-id: e9973b113f78dc38cf99a81b6ede3fa3485f1cfa
Summary:
* Many ops in the LSTM part of the model don't have implementations in ideep/mkl, and it doesn't make sense to copy back and forth for the few available ops because the majority of the RNN will run on CPU
* Thus the strategy is to enable mkl only for the resnet18 part of the model, then switch to the default CPU engine for the LSTM part
* The net may contain some external_inputs falsely added during ONNX->Caffe2 conversion. A canary in the service shows their existence could lead to a service crash (presumably because these blobs somehow get shared between threads). They're now manually removed, which seems to be enough to avoid the crash.
Reviewed By: viswanathgs
Differential Revision: D8888763
fbshipit-source-id: da7761bcb7d876ff7bbb6640ae4b24712c0b1de6
Summary:
After discussions in #11584, this is a new PR for just the test skip and hgemm integration.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11593
Differential Revision: D9798527
Pulled By: ezyang
fbshipit-source-id: e2ef5609676571caef2f8e6844909fe3a11d8b3e
Summary:
I am working on unifying the C++ extensions and C++ API, and one constraint for this is that we will want to be able to build the C++ API without cereal, since we won't want to ship it with the Python `torch` package.
For this I introduce a `TORCH_WITH_CEREAL` option to CMake. If on, the C++ API will be built with cereal and thus serialization support. If off, serialization functions will throw exceptions, but the library will otherwise still compile the same. __This option is on by default, so for regular C++ API users nothing will change__. However, from C++ extensions, we'll be able to turn it off. This effectively means we won't be searching for any cereal headers from C++ API headers, which wouldn't be installed in the Python package.
ebetica ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11498
Differential Revision: D9784803
Pulled By: goldsborough
fbshipit-source-id: 5d0a1f2501993012d28cf3d730f45932b483abc4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11470
In order to reduce build sizes, we are identifying files that can be split up into smaller units, allowing us to only include the ops we need.
Reviewed By: orionr, ajtulloch
Differential Revision: D9725819
fbshipit-source-id: def1074a33dffe99bd6a7e6e48aa9e5be3d04a6a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11587
To help debug the issue in T33295362, we add some checks in the function.
Possible crashing sites in `GetTensorInfo`:
1. tc is nullptr, which is checked.
2. tc->capacity_nbytes() hits nullptr; this is unlikely because storage is not a pointer and the computation of capacity_nbytes doesn't involve pointers. It's numel * itemsize().
3. tc->ExtractDeviceOption hits nullptr. One possibility is that raw_data() is nullptr, because tc->ExtractDeviceOption will use it. This is checked.
4. Tensor itself, which is not a reference. This is also checked.
Reviewed By: salexspb
Differential Revision: D9793484
fbshipit-source-id: 3fc72746fc310a23ae45553bbe0d269a4b9edb72
Summary:
Documents the `AnyModule` class in the C++ API.
Also changed the API to be friendlier by default. Calling `AnyModule::forward` used to return an `AnyModule::Value` which you had to call `.get<T>()` on to cast to a concrete type. I changed the name of that `forward` method to `any_forward` and instead made `forward` templated on a `ReturnType` template parameter which you can supply to do the `.get<T>` cast for you automatically. I default this parameter to `torch::Tensor` so that it can often be omitted. So where you used to have to write
```cpp
any_module.forward(...).get<int>();
any_module.forward(...).get<torch::Tensor>();
```
you now write
```cpp
any_module.forward<int>(...);
any_module.forward(...);
```
ebetica ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11580
Differential Revision: D9798626
Pulled By: goldsborough
fbshipit-source-id: 060b4ea28facaffc417f53b80b846a9dff9acb73
Summary:
This eliminates the need for any heuristics regarding stack size limits.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11534
Differential Revision: D9779866
Pulled By: resistor
fbshipit-source-id: 96753eead7904bbdc2869fb01f7bd42141032347
Summary:
This PR contains a C++ implementation of weight norm. The user-side exposure of weight norm through torch.nn.utils.weight_norm is unchanged.
If running on the GPU, and the norm is requested over the first or last dimension of the weight tensor, the forward pass is carried out using the fused kernels I wrote for our Fairseq GTC hero run, which offer superior performance to primitive ops and superior numerical stability when running in FP16. In the common case that the backward pass is not itself constructing a graph (ie not attempting to set up double backward) the backward pass will be carried out using another fused kernel. If the backward pass is constructing a graph, an alternate code path is taken, which does the math using differentiable primitive ops. In this way, the implementation allows double backward, even if the fused kernel was used in forward (although in this case, you don't benefit from the performance and stability of the fused backward kernel).
If running on the CPU, or if norming over an interior dim, the forward pass is carried out using double-differentiable primitive ops.
Figuring out how to generate all the right plumbing for this was tricky, but it was a fun experience learning how the autogenerator works and how the graph is constructed. Thanks to colesbury for useful guidance on this front.
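For reference, the unchanged user-side API that ends up dispatching to this implementation (layer sizes are arbitrary):
```python
import torch
import torch.nn as nn

# reparameterizes `weight` as g * v / ||v|| along `dim`
layer = nn.utils.weight_norm(nn.Linear(20, 40), name='weight', dim=0)
x = torch.randn(8, 20)
layer(x).sum().backward()
print(layer.weight_g.shape, layer.weight_v.shape)  # torch.Size([40, 1]) torch.Size([40, 20])
```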
I do have a few lingering questions:
- Should I unify my return statements (ie by default-constructing Tensors outside if blocks and using operator= within)?
- What is the significance of `non_blocking` when calling e.g. `auto norms = saved_norms.to(saved_g.type().scalarType(), non_blocking=True/False);`? I am currently omitting `non_blocking`, so it defaults to False, but I didn't see any associated synchronizes on the timeline, so I'm wondering what it means.
- Is there an "official" mapping from at::ScalarTypes to corresponding accumulate types, as there are for the PODs + Half in [AccumulateType.h](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/AccumulateType.h)? I looked for an equivalent mapping for ScalarTypes, didn't find one, and ended up rigging it myself (` at::ScalarType AccType = g.type().scalarType() == at::ScalarType::Half ? at::ScalarType::Float : g.type().scalarType();`).
- Are sparse tensors a concern? Should I include another check for sparse tensors in the `_weight_norm` entry point, and send those along the fallback CPU path as well?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10842
Differential Revision: D9735531
Pulled By: ezyang
fbshipit-source-id: 24431d46532cf5503876b3bd450d5ca775b3eaee
Summary:
This changes the way module import works so that when a module
is reloaded in python it becomes a ScriptModule and not a _C.ScriptModule
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11552
Differential Revision: D9782751
Pulled By: zdevito
fbshipit-source-id: 9576850b75494b228ce3def94c0d371a4a44b11d
Summary:
Also adds two additional tests that check for memory leaks while the relevant graph executors are alive:
- (minimal test): Create a ScriptModule, keep it alive, and test that it does not leak memory while it is alive
- (large test) Do MNIST training with a traced MNIST module and test that no memory is leaked while the traced module (with graph executor) is alive
cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11544
Reviewed By: apaszke
Differential Revision: D9778479
Pulled By: zou3519
fbshipit-source-id: 2d6cdea81dd1264f2c0396b662f70fdafecb3647
Summary:
Fixes:
```
/bin/ld: warning: libnccl.so.1, needed by /data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so, not found (try using -rpath or -rpath-link)
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclAllReduce'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclBcast'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclCommInitAll'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclGetErrorString'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclReduceScatter'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclAllGather'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclReduce'
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11575
Differential Revision: D9789956
Pulled By: ezyang
fbshipit-source-id: 63e48763cc233be9d137cec721b239159b511a24
Summary:
This PR:
1. Documents `BatchNorm`,
2. Makes a number of API changes after reconsidering some quirks:
1. The default value for the `stateful` parameter used to be `false`, but the most common usage of `BatchNorm` out of the wild is certainly stateful, and the default in Python is also statefulness. So we change the default to stateful.
2. The `pure_forward` function used to use the internal running mean and variance variables instead of the ones supplied to that function call when `stateful` was true, which certainly seems odd. When you call `pure_forward` you would certainly expect the values you pass explicitly to be used. This is now fixed.
3. Adds tests for `BatchNorm`, finally.
ebetica apaszke ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11484
Reviewed By: pjh5
Differential Revision: D9779618
Pulled By: goldsborough
fbshipit-source-id: 59ba760e085c01454b75644b24b22317b688e459
Summary:
- Incorporates the MKL addition by mingfeima. Thank you! (but all errors are my own)
- Native CPU implementation: defer to matrix multiplication for
small batches and parallelize over batch dimension for large
batches.
- Add bmm test for CUDA just to be sure.
This is a partial fix for #10661, getting down to a factor ~5.
Considerable overhead is incurred for the setup in einsum. It might
be more efficient to eventually define optimized contraction
functions for arbitrary and multiple dimensions.
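A small sketch of the kind of einsum that reduces to a batched matrix multiply on this path (sizes are arbitrary):
```python
import torch

a = torch.randn(64, 128, 32)
b = torch.randn(64, 32, 256)

out = torch.einsum('bij,bjk->bik', a, b)     # dispatches to a batched matmul
print(out.shape)                             # torch.Size([64, 128, 256])
print(torch.allclose(out, torch.bmm(a, b)))  # True
```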
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11292
Differential Revision: D9784941
Pulled By: ezyang
fbshipit-source-id: f6dded2c6f5e8f0461fb38f31f9a824992a58358
Summary:
Make the test work with CPU-only builds; this also fixes test failures that had been present for a long time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11567
Differential Revision: D9785740
Pulled By: teng-li
fbshipit-source-id: 61c43b758c1ee53117e30de8074583e6faea863a
Summary:
This makes torch.distributed work for CPU-only builds.
Also added one more CI test case to cover the MPI CPU build.
All CI tests should cover this change
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11513
Differential Revision: D9784546
Pulled By: teng-li
fbshipit-source-id: 0976a6b0fd199670926f0273e17ad7d2805e42e7
Summary:
Also, fix a performance bug in `ensureUnique`. Previously it formatted the warning string even though we weren't tracing, so all that work would *always* happen in the hot path and be for nothing.
A sample of how the new warnings look like:
```
tmp.py:4: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
int(x)
tmp.py:5: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
torch.tensor([1.])
tmp.py:6: TracerWarning: There are 2 live references to the data region being modified when tracing in-place operator add_. This might cause the trace to be incorrect, because all other views that also reference this data will not reflect this change in the trace! On the other hand, if all other views use the same memory, but are disjoint (e.g. are outputs of torch.split), this might still be safe.
torch.split(y, 2, dim=1)[0].add_(2)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11545
Differential Revision: D9782975
Pulled By: apaszke
fbshipit-source-id: 5b3abd31366e59c69e0b7ff278042b5563deb5a9
Summary:
This fixes the build when CuDNN was not found on the system.
From the `git blame`, it looks like the bug has been around for 2 years :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11562
Differential Revision: D9784589
Pulled By: soumith
fbshipit-source-id: b33153436dced0a503c9833cdf52f7093f3394b4
Summary:
This adds a Note on making experiments reproducible.
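For reference, a sketch of the sort of settings such a note typically covers (the authoritative recommendations are in the note itself):
```python
import random
import numpy as np
import torch

random.seed(0)
np.random.seed(0)
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```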
It also adds Instructions for building the Documentation to `README.md`. Please ping if I missed any requirements.
I'm not sure what to do about the submodule changes. Please advise.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11329
Differential Revision: D9784939
Pulled By: ezyang
fbshipit-source-id: 5c5acbe343d1fffb15bdcb84c6d8d925c2ffcc5e
Summary:
Ping ezyang
This addresses your comment in #114. Strangely, when running the doc build (`make html`) none of my changes are actually showing, could you point out what I'm doing wrong?
Once #11329 is merged it might make sense to link to the reproducibility note everywhere.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11434
Differential Revision: D9751208
Pulled By: ezyang
fbshipit-source-id: cc672472449564ff099323c39603e8ff2b2d35c9
Summary: The PyTorch C++ API has `torch.nn.init` equivalents that the RNNG can use to initialize the state of its StackRNNs. This gets rid of the `fanInOut_` methods on `Parser` and tidies up `xavierInitialState` a little.
Reviewed By: wowitsmrinal
Differential Revision: D9472595
fbshipit-source-id: c202116f32383d3b4bba064c2c0d2656311e1170
Summary:
This PR does two things:
1. Replaces the implementation of the `Dropout` module with a call to the ATen function,
2. Replaces `Dropout2d` with a new `FeatureDropout` module that shall take the place of `Dropout2d` and `Dropout3d`. I contemplated calling it `Dropout2d` and making `Dropout3d` an alias for it, but similar to our decision for `BatchNorm{1,2,3}d` (c.f. https://github.com/pytorch/pytorch/pull/9188), we can deviate from Python PyTorch in favor of the ideal-world solution, which is to have a single module, since both actually just call `feature_dropout`.
I also replaced the implementation of `dropout3d` with a call to `dropout2d` in Python. The code is the same and it's easier for developers to parse than having to manually match the tokens to make sure it's really 100% the same code (which it is, if I matched the tokens correctly).
ebetica ezyang SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11458
Differential Revision: D9756603
Pulled By: goldsborough
fbshipit-source-id: fe847cd2cda2b6da8b06779255d76e32a974807c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11520
Previously, we had Type which was a catch all interface for all
functions and methods we could possibly want to do dynamic dispatch
on. However, we want to check in a non-autogenerated Tensor class
to ATen/core, and to do this, we must also check in a non-autogenerated
Type class which we can do dispatch on. In principle, we could
put the full Type interface in ATen/core, but this would be
a bad developer experience, since any time you add a new free
function, you'd have to regenerate the checked in Type header.
For a better dev experience, we split Type into a two parts,
Type, which will be checked in (though not in this diff), and
TypeExtendedInterface, which will NOT be checked in. Type contains
just enough methods to let Tensor be defined, and leaves the
rest to TypeExtendedInterface.
Some complications:
- We (very unfortunately) have overloaded virtual methods. Because
of C++'s rules, we cannot move one overload without doing some
extra work to make sure that overload in a superclass and an
overload in a subclass resolve together. I've chosen to resolve
this problem simply by moving ALL overloads of a method which
occurs in Tensor to Type.
- There are some places where we take a type() object and call
a method on it, which is not a Tensor base method. I've eliminated
some where possible, but in other cases calling the method on type
is the ONLY way to invoke it; in that case, I've just inserted
a cast. Further refactoring is necessary.
Reviewed By: gchanan
Differential Revision: D9771708
fbshipit-source-id: c59d39fe919cd6f42be6dca699d474346ea3c614
Summary:
The previous error was caused by mpi_test not depending on MPI_CXX_LIBRARIES. This might solve the problem.
Not tested locally - waiting for CI test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11416
Reviewed By: mingzhe09088
Differential Revision: D9771694
Pulled By: Yangqing
fbshipit-source-id: 53e7b4f64eadc88313bc4dd9b8e3f7931cda6e91
Summary:
This works around #11535 by avoiding `arange(n, out=x)` and `eye(n, out=x)` in `torch.distributions`. I've confirmed that the `.enumerate_support()` methods are now jittable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11542
Differential Revision: D9777805
Pulled By: apaszke
fbshipit-source-id: fa38f2f1acfc0a289f725fd8c92478573cfdbefb
Summary: Printing for complex numbers requires loading and storing between `Py_complex` and `std::complex`. This patch aims to support this for the plugin.
Differential Revision: D9771808
Pulled By: ezyang
fbshipit-source-id: 024865f1945d63ddb5efc775a35438c8ea06408e
Summary:
This whitelists train/eval functions in script modules, and tests that nested nn.Modules still work.
This also changes the code for calling python functions from script to allow non-tensor inputs/outputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11505
Differential Revision: D9765466
Pulled By: zdevito
fbshipit-source-id: 1177bff931324422b69e18fa0bbaa82e3c98ec69
Summary:
ezyang delivering my promise to you :)
Basically, now aten tests can use gtest as part of our test harness unification effort. I also converted one test (atest.cpp) to show how one can do this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11429
Reviewed By: ezyang
Differential Revision: D9762934
Pulled By: Yangqing
fbshipit-source-id: 68ec3a748403c6bd88399b1e756200985a4e07e3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11413
LengthsTileOp was implemented using a sequence of device memcopies initiated on the CPU. This was very slow. I changed it to use a kernel. TUM benchmark QPS improved from 13k QPS to 20k QPS as a result.
Reviewed By: manojkris, xianjiec
Differential Revision: D9724988
fbshipit-source-id: 2f98c697730982734d7c6a26d0b6967310d49900
Summary:
Provide a TensorAccessor-like interface for CUDA as discussed in #8366.
Compared to TensorAccessor
- the CUDATensorAccessor copies the sizes and strides while on the host (I didn't implement a host indexing function, though) to enable transfer to the device; on the device, `[]` works like for TensorAccessors,
- instantiation is from TensorAccessors in order to allow using `.accessor<..>`. The drawback is that you cannot use `auto` for the variable declaration, but the alternative would be a cuda-specific `.accessor`-like function,
- there is a PtrTraits argument to enable `__restrict__`,
Example for the intended use:
```
...
template <typename scalar_t>
__global__ void
apply_homography_2d_kernel(cuda::CUDATensorAccessor<scalar_t, 4> dest_a,
cuda::CUDATensorAccessor<scalar_t, 4> src_a,
cuda::CUDATensorAccessor<float, 2> transform) {
...
}
template <typename scalar_t>
Tensor apply_homography_2d_template(Tensor& res, const Tensor& image, const Tensor& transform) {
...
cuda::CUDATensorAccessor<scalar_t, 4> image_a(image.accessor<scalar_t, 4>());
cuda::CUDATensorAccessor<scalar_t, 4> res_a(res.accessor<scalar_t, 4>());
cuda::CUDATensorAccessor<float, 2> transform_a(transform.accessor<float, 2>());
auto stream = at::cuda::getCurrentCUDAStream();
apply_homography_2d_kernel<scalar_t>
<<<grid, block, 0, stream>>>(res_a, image_a, transform_a);
return res;
}
...
```
I could use a hint where to put a test for this (e.g. doing a plain vanilla matrix multiplication with a custom kernel) and comparing with the aten mm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11373
Differential Revision: D9735573
Pulled By: ezyang
fbshipit-source-id: 482b218a0d514e19a8b692bbc77c0e37082cfded
Summary: Considering these increase the size of the message stack, I didn't touch the code outside `ATen/native`
Differential Revision: D9754283
Pulled By: soumith
fbshipit-source-id: 04198ec4fd0c4abae09eeba92c493a783408537a
Summary:
This PR adds the "merge to master" step before the build step in CircleCI, so that all PR commits are built against master instead of against the PR's branch. Note that all PRs still need to rebase to master to pick up this new config, so it won't apply to old PR branches retroactively.
To check in CI: make sure it's performing the git merge to master appropriately in "Merge Onto Master" step.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11443
Differential Revision: D9775628
Pulled By: yf225
fbshipit-source-id: 8083db6b098d234a44ae4481f40a486e9906f6f8
Summary:
Disable all CircleCI jobs until we are ready to move forward with them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11523
Differential Revision: D9774462
Pulled By: yf225
fbshipit-source-id: c5724e71eb68bac4df958b4f7bcc380050668b3c
Summary:
Need to link CUDA statically for benchmarking purpose.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10596
Reviewed By: llyfacebook
Differential Revision: D9370738
Pulled By: sf-wind
fbshipit-source-id: 4464d62473e95fe8db65b0bd3b301f262bf269bf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11486
I discovered these by narrowing the interface on Type, and then
fixing call sites outside of core plumbing code which depended
on these methods being provided.
Reviewed By: cpuhrsch
Differential Revision: D9757935
fbshipit-source-id: 3abda0c98919a448a326a757671d438964f6909f
Summary:
I noticed warnings from within pybind11 being shown when building C++ extensions. This can be avoided by including non-user-supplied headers with `-isystem` instead of `-I`
I hope this works on Windows.
soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11459
Differential Revision: D9764444
Pulled By: goldsborough
fbshipit-source-id: b288572106078f347f0342f158f9e2b63a58c235
Summary:
This speeds up incremental builds by doing the following changes:
- Uses `rsync` instead of `cp` (when `rsync` is found) which is a bit smarter in doing "maybe copy"
- Introduces a `rebuild` mode which does not rerun `cmake` in `build_pytorch_libs.sh`.
*Note: `rebuild` should only be used if you don't add/remove files to the build, as `cmake` is not rerun*
Current no-op rebuild speedup:
- 1m 15s -> 20s
There are some lingering bugs. No-op rebuilds rerun `cmake` for two rebuilds (likely because the cmake logic depends on the install folder, hence kicking off a rebuild).
So what you see
```
python setup.py rebuild develop # first time - ~5 mins
python setup.py rebuild develop # second time - ~3 mins
python setup.py rebuild develop # third time - ~2 mins
python setup.py rebuild develop # fourth time - ~20 seconds
python setup.py rebuild develop # fifth time - ~20 seconds
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11487
Differential Revision: D9769087
Pulled By: soumith
fbshipit-source-id: 20fbecde33af6426149c13767e8734fb3be783c5
Summary:
ATen has had a separate build target in the past, but with our move to a root-level CMakeLists.txt file this makes less sense and is harder to maintain. Also, as we blend code between Caffe2 and ATen this will become even less maintainable.
Talked to ezyang about this, but also cc zdevito, Yangqing, and soumith. If this is too difficult, I will revert, but want to see if we can simplify for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11488
Differential Revision: D9770266
Pulled By: orionr
fbshipit-source-id: c7ba52a1676d84e2d052dad4c042b666f49451cd
Summary:
This adds a `.expand` method for distributions that is akin to the `torch.Tensor.expand` method for tensors. It returns a new distribution instance with batch dimensions expanded to the desired `batch_shape`. Since this calls `torch.Tensor.expand` on the distribution's parameters, it does not allocate new memory for the expanded distribution instance's parameters.
e.g.
```python
>>> d = dist.Normal(torch.zeros(100, 1), torch.ones(100, 1))
>>> d.sample().shape
torch.Size([100, 1])
>>> d.expand([100, 10]).sample().shape
torch.Size([100, 10])
```
We have already been using the `.expand` method in Pyro in our [patch](https://github.com/uber/pyro/blob/dev/pyro/distributions/torch.py#L10) of `torch.distributions`. We use this in our models to enable dynamic broadcasting. This has also been requested by a few users on the distributions slack, and we believe will be useful to the larger community.
Note that currently, there is no convenient and efficient way to expand distribution instances:
- Many distributions use `TransformedDistribution` (or wrap another distribution instance, e.g. `OneHotCategorical` uses a `Categorical` instance) under the hood, or have lazy parameters. This makes it difficult to collect all the relevant parameters, broadcast them and construct new instances.
- In the few cases where this is even possible, the resulting implementation would be inefficient since we will go through a lot of broadcasting and args validation logic in `__init__.py` that can be avoided.
The `.expand` method allows for a safe and efficient way to expand distribution instances. Additionally, this bypasses `__init__.py` (using `__new__` and populating relevant attributes) since we do not need to do any broadcasting or args validation (which was already done when the instance was first created). This can result in significant savings as compared to constructing new instances via `__init__` (that said, the `sample` and `log_prob` methods will probably be the rate determining steps in many applications).
e.g.
```python
>>> a = dist.Bernoulli(torch.ones([10000, 1]), validate_args=True)
>>> %timeit a.expand([10000, 100])
15.2 µs ± 224 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit dist.Bernoulli(torch.ones([10000, 100]), validate_args=True)
11.8 ms ± 153 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
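For illustration, here is a minimal sketch of the `__new__`-based expansion described above for a single concrete distribution; the attribute handling is simplified and is not the actual `torch.distributions` implementation:
```python
import torch
from torch.distributions import Normal

def expand_normal(d, batch_shape):
    # Allocate an instance without running __init__, so no re-broadcasting
    # or argument validation happens; then expand the existing parameters.
    new = Normal.__new__(Normal)
    new.loc = d.loc.expand(batch_shape)      # expand() returns views, no new storage
    new.scale = d.scale.expand(batch_shape)
    super(Normal, new).__init__(torch.Size(batch_shape), validate_args=False)
    return new

d = Normal(torch.zeros(100, 1), torch.ones(100, 1))
print(expand_normal(d, (100, 10)).sample().shape)  # torch.Size([100, 10])
```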
cc. fritzo, apaszke, vishwakftw, alicanb
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11341
Differential Revision: D9728485
Pulled By: soumith
fbshipit-source-id: 3b94c23bc6a43ee704389e6287aa83d1e278d52f
Summary:
This enabled `torch.einsum` both in tracing and in script mode. It's used all over Pyro at the moment, and is needed for any use of the JIT in there.
Fixes #11157.
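A minimal sketch of the kind of script function this enables (the list-of-operands form is assumed here; the exact syntax accepted in script at the time may differ):
```python
import torch

@torch.jit.script
def outer(x, y):
    # einsum is now recognized by the script compiler / tracer
    return torch.einsum('i,j->ij', [x, y])

print(outer(torch.arange(3.), torch.arange(4.)).shape)  # torch.Size([3, 4])
```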
zdevito fritzo neerajprad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11506
Differential Revision: D9764787
Pulled By: apaszke
fbshipit-source-id: 9b5251b9e7c5897034602bd07ff67b425d33326c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11418
Several improvements that aim to make the APIs more straightforward to use:
- Get rid of the helper methods subgraph and nonTerminal. Users should now create an NNMatchGraph directly via the graph's createNode and createEdge APIs.
- Get rid of the operatorSubgraph helper method.
- The invertGraphTraversal flag applies to both the match graph and the scanned graph. This allows users to create the match graph in the same direction as the scanned graph, thus reducing confusion.
- Additional parameters of matchNode (count, includeInSubgraph, nonTerminal) are removed from the constructors and moved into setter methods. (We no longer enforce that MatchNode is immutable, but this helps improve code clarity.)
- Tests are updated to reflect the changes.
Follow-up changes:
- Possibly clean up the tests further. This change aims to minimally modify the unit tests.
- Add a validity check that enforces the current limitation of the match graph (single source node) and throws if the match graph does not satisfy the criteria.
- Have the single source node be detected automatically so callers just need to pass in the matchGraph instead of the source node reference.
Differential Revision: D9732565
fbshipit-source-id: ae8320e2bc89b867f6bb4b1c1aad635f4b219fa1
Summary:
The old `torch.distributed` will go to `torch.distributed.deprecated`
The old DDP will go to `torch.nn.parallel.deprecated`
Now `torch.nn.parallel.DDP` will use c10d DDP
Now `torch.distributed` will use C10d frontend API
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11405
Reviewed By: pietern
Differential Revision: D9733733
Pulled By: teng-li
fbshipit-source-id: d6a3f3e73f8d3a7fcb1f4baef53c78063b8cbb08
Summary:
Add flags for LMDB and LevelDB, default `OFF`. These can be enabled with
```
USE_LMDB=1 USE_LEVELDB=1 python setup.py build_deps
```
Also add a flag to build Caffe2 ops, which is default `ON`. Disable with
```
NO_CAFFE2_OPS=1 python setup.py build_deps
```
cc Yangqing soumith pjh5 mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11462
Reviewed By: soumith
Differential Revision: D9758156
Pulled By: orionr
fbshipit-source-id: 95fd206d72fdf44df54fc5d0aeab598bff900c63
Summary:
Skip torch tests as well when NO_TEST=1 environment variable is set. Also remove the separate ATen code path for not being built with Caffe2, since it will always be built with Caffe2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11415
Reviewed By: soumith
Differential Revision: D9758179
Pulled By: orionr
fbshipit-source-id: e3e3327364fccdc57a703aeaad8c4f30452973fb
Summary:
There's a bunch of legacy code where people are explicitly instantiating Variable, and these call-sites have thus far been untraceable (appearing as prim::Constant nodes with the tensor value at the time of tracing). This makes it so that the new variable inherits the traced Value* from the tensor it's being constructed from
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11463
Differential Revision: D9756529
Pulled By: jamesr66a
fbshipit-source-id: da99c6a7621957a305f2699ec9cb9def69b1b2d7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10974
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10291
This new operator will do the following:
Given a LENGTHS vector and n_splits, output a "split" LENGTHS vector where:
1. Each length in input vector is split into n_splits values (thus output vector should have LENGTHS.size(0) * n_splits elements)
2. The new lengths in output should be evenly split, and if the length is not divisible by n_splits, then order new values in descending order. (e.g. n_splits = 3, length = 5 -> 2 2 1)
3. If n_splits > some element in the array, its split elements will contain 0s. (e.g. n_splits = 3, length = 2 -> 1 1 0)
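A small Python sketch of the splitting rule described above (plain Python, not the Caffe2 operator itself):
```python
def split_lengths(lengths, n_splits):
    # Each input length becomes n_splits values that sum to it, as evenly as
    # possible with larger values first; lengths shorter than n_splits pad with 0.
    out = []
    for length in lengths:
        base, rem = divmod(length, n_splits)
        out.extend([base + 1] * rem + [base] * (n_splits - rem))
    return out

assert split_lengths([5], 3) == [2, 2, 1]
assert split_lengths([2], 3) == [1, 1, 0]
```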
Reviewed By: bddppq, chocjy
Differential Revision: D9013119
fbshipit-source-id: 82bf3371ec08c41fc3379177f0007afc142e0d84
Summary:
This PR is stacked on https://github.com/pytorch/pytorch/pull/10610, and only adds changes in one file `.jenkins/pytorch/test.sh`, where we now build the custom op tests and run them.
I'd also like to take this PR to discuss whether the [`TorchConfig.cmake`](https://github.com/pytorch/pytorch/blob/master/cmake/TorchConfig.cmake.in) I made is robust enough (we will also see in the CI) orionr Yangqing dzhulgakov what do you think?
Also ezyang for CI changes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10611
Differential Revision: D9597627
Pulled By: goldsborough
fbshipit-source-id: f5af8164c076894f448cef7e5b356a6b3159f8b3
Summary:
Many constructors like `torch.zeros` or `torch.randn` didn't support
size tracing correctly which is fixed by this pass. Same issue has been
fixed in legacy tensor constructors.
Additionally, new tensor constructors, which do not participate in
tracing (most notably `torch.tensor`, `torch.as_tensor` and
`torch.from_numpy`) raise a warning when they are used.
Finally, entering a traceable operation disables the tracing in its body.
This is needed because
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11288
Reviewed By: ezyang
Differential Revision: D9751183
Pulled By: apaszke
fbshipit-source-id: 51444a39d76a3e164adc396c432fd5ee3c8d5f7f
Summary:
NCCL1 uses `int` as its numerical type for fields like `count`, which makes broadcasting tensors larger than `2 << 31 - 1` impossible, and raises the opaque error `invalid arguments`. NCCL2 greatly increases the limit on many platforms by using `size_t`. This patch statically detects this type, and raises properly if the broadcast tensor exceeds the limit.
No test because I don't think our test suite should broadcast big tensors.
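Conceptually, the check amounts to something like the following Python sketch (the actual check is done in C++ against the detected NCCL count type):
```python
NCCL1_MAX_COUNT = 2 ** 31 - 1  # NCCL1 uses a C `int` for the element count

def check_broadcast_size(tensor):
    # Raise a clear error instead of NCCL's opaque "invalid arguments".
    if tensor.numel() > NCCL1_MAX_COUNT:
        raise RuntimeError(
            "Broadcast tensor has %d elements, which exceeds the NCCL1 limit of %d"
            % (tensor.numel(), NCCL1_MAX_COUNT))
```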
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11466
Differential Revision: D9754753
Pulled By: SsnL
fbshipit-source-id: 73506450cae047e06b5b225b39efdb42d5d26685
Summary:
Normalizing by the world size before the reduction is less likely to cause overflow in FP16 training.
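A hand-rolled sketch of the idea (not the DistributedDataParallel internals; `model` and `world_size` are placeholders):
```python
import torch.distributed as dist

def allreduce_grads(model, world_size):
    for p in model.parameters():
        if p.grad is not None:
            p.grad.div_(world_size)  # normalize first, keeping values small in FP16
            dist.all_reduce(p.grad)  # sum of pre-divided grads = averaged grad
```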
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11109
Differential Revision: D9594708
Pulled By: myleott
fbshipit-source-id: 93ab53cb782ee1cbe1264e529b333490a0940338
Summary:
I'm 80% sure that this fixes the math bug. But I can't repro locally so I don't know.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11472
Differential Revision: D9755328
Pulled By: SsnL
fbshipit-source-id: 130be664d3c6ceee3c0c166c1a86fc9ec3b79d74
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11294
The Tensor(ptr, retain) constructor is error prone and circumvents the intrusive_ptr safety.
This diff removes that and pushes the responsibility to callers.
Step by step, manual refcounting can be pushed back and possibly eliminated in the end.
Reviewed By: ezyang
Differential Revision: D9663476
fbshipit-source-id: 7f010e5e47b137a9575960201c5bf5d552c5c2f5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11260
This is needed to make something like this work:
intrusive_ptr<TensorImpl, UndefinedTensorImpl> a = make_intrusive<SparseTensorImpl>(...);
Reviewed By: ezyang
Differential Revision: D9652089
fbshipit-source-id: 19c65e98460ccb27bc69e36d7e558cb9d6e67615
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11258
The two intrusive_ptr constructors in Tensor can be combined into one implementation that does both, moving and copying.
Reviewed By: ezyang
Differential Revision: D9652088
fbshipit-source-id: 5efca02654ba305c99c20bbeb83551469d17a51d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11238
- when moving an IValue, free the old value instead of keeping it allocated
- making classes final
- moving std::string
- making ConstantList const
Reviewed By: ezyang
Differential Revision: D9644700
fbshipit-source-id: ab7228368e4f00f664ba54e1242b0307d91c5e7e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11167
Narrow the Blob API as preparation for merging Blob/IValue
- get rid of templated IsType and Operator::InputIsType / OutputIsType
- Use 'using' instead of 'typedef' for DestroyCall (just for readability)
Reviewed By: ezyang
Differential Revision: D9623916
fbshipit-source-id: 952f0b0cf5a525094b02e8d2798dd57a56a9e1d8
Summary:
Checking assertExportImport for all of the generated test jit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10982
Differential Revision: D9636935
Pulled By: eellison
fbshipit-source-id: f3f1ce77d454848098f2ac7e0fa18bf8564890be
Summary:
`Process.start()` actually takes some time, as it needs to start a
process and pass the arguments over via a pipe. Therefore, we
only add a worker to the self.workers list after it has started, so
that we do not call `.join()` on a worker that never started; otherwise,
if the program dies before a worker starts and `__del__` tries to join it, we get:
AssertionError: can only join a started process.
Example trace when such error happens:
```py
[unrelated]
File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 500, in __iter__
return _DataLoaderIter(self)
File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 292, in __init__
w.start()
File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/process.py", line 112, in start
self._popen = self._Popen(self)
File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
return Popen(process_obj)
File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
self._launch(process_obj)
File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 70, in _launch
self.pid = os.fork()
KeyboardInterrupt
Exception ignored in: <function _DataLoaderIter.__del__ at 0x7fa704d5aa60>
Traceback (most recent call last):
File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 398, in __del__
self._shutdown_workers()
File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 392, in _shutdown_workers
w.join()
File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/process.py", line 139, in join
assert self._popen is not None, 'can only join a started process'
AssertionError: can only join a started process
```
No test because hard to reliably trigger.
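A simplified sketch of the ordering change (placeholder `worker_loop`/`num_workers`, not the actual DataLoader code):
```python
import multiprocessing

def start_workers(worker_loop, num_workers):
    workers = []
    for i in range(num_workers):
        w = multiprocessing.Process(target=worker_loop, args=(i,))
        w.daemon = True
        w.start()          # may be interrupted before returning
        workers.append(w)  # only track workers that actually started, so join() is safe
    return workers
```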
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11432
Reviewed By: ezyang
Differential Revision: D9735430
Pulled By: SsnL
fbshipit-source-id: a8912d9bb4063f210d6236267b178173810e2351
Summary:
as discussed with ezyang and slayton58 , this might be a nice convenience to be able to use code in extensions just as in ATen.
also split off `tracing_state.h` from `torch/jit/tracer.h` (fixes #11204) to be able to use the utility functions
pytorchbot it's not a jit patch per se.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11425
Differential Revision: D9735556
Pulled By: ezyang
fbshipit-source-id: 466c92bbdb1d7d7a970eba1c26b7583fe9756139
Summary:
A recent build regression is that we need a system GoogleTest for builds to pass.
This was because, when building with Gloo, gloo is trying to build its own tests, which look for system gtest [here](https://github.com/facebookincubator/gloo/blob/master/cmake/Dependencies.cmake#L72-L80) (because we're not using the full cmake build and making it aware of third_party/GoogleTest, but instead we are building it isolated using tools/build_pytorch_libs.sh).
Traditionally, we didn't ask Gloo to build its tests, but because we added `-DBUILD_TEST=1` by default to all builds (in refactoring variable names), we accidentally started asking Gloo to build its tests.
This PR overrides the Gloo flags and asks it to not build tests (like it used to).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11431
Differential Revision: D9736387
Pulled By: soumith
fbshipit-source-id: 59e84edae780123b793bdaea5fd9ac46156cd0af
Summary:
This PR parallels `masked_fill` on CPU, currently it runs in sequential on CPU.
the following script is used to benchmark and verify this PR. On Xeon skylake 8180 (2 sockets * 28 cores),
it runs `4.20` sec without the PR and `0.11` sec with the PR.
```python
import torch
import random
from time import time
size = 10 * 1000 * 1000
count = 100
def test_masked_fill():
    dst = torch.randn(size)
    dst_ = dst.clone()
    mask = torch.rand(size).mul(2).floor().byte()
    val = random.random()
    tstart = time()
    for i in range(count):
        dst.masked_fill_(mask, val)
    tend = time()
    print("masked_fill_: %f" % (tend - tstart))
    for i in range(size):
        if mask[i]:
            if dst[i] != val:
                print("fail")
        else:
            if dst[i] != dst_[i]:
                print("fail1")
    print("test_masked_fill: PASS")
test_masked_fill()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11359
Differential Revision: D9735578
Pulled By: ezyang
fbshipit-source-id: d437ad7c6dace1910d0c18d6d9ede80efb44fae4
Summary:
Added AVX optimizations for pdist using Vec256. This brings single threaded performance up to speed with scipy, but the current implementation greatly hurts performance without AVX enabled. Is there a way to special case out AVX on dispatch and call the non Vec256 code? Or is the way I used Vec256 completely wrong?
Single threaded comparison to scipy
============================
This is the time to compute the pdist of a 2048 x 2048 float matrix with only one thread for various values of p between torch and scipy. p = 3 is the code path for arbitrary p, and so is much slower than the other values.
p | torch | scipy
-----|-----------|------
0 | 6.27 s ± 393 ms | 7.23 s ± 498 ms
1 | 5.49 s ± 201 ms | 43.4 s ± 1.09 s
2 | 5.74 s ± 474 ms | 53.8 s ± 3.52 s
∞ | 5.59 s ± 292 ms | 47.4 s ± 2.03 s
3 | really slow | gave up
Result by AVX support
================
This is the time to compute the distance and gradient of a 2048 x 2048 float matrix with all threads by AVX support. `before` is the old code, `default` is no AVX support, etc. Interestingly the AVX optimizations provided a great benefit over the old unoptimized code, but drastically hurt performance when compiled without AVX optimizations. p = 3 is the code path for arbitrary p, and so is much slower than the other values.
Results for p = 0
----------------
avx | dist | grad
----|------|-----
before | 514 ms ± 87.5 ms | 191 µs ± 35 µs
default | 3.47 s ± 183 ms | 201 µs ± 24.6 µs
avx | 123 ms ± 18.2 ms | 281 µs ± 130 µs
avx2 | 103 ms ± 11.4 ms | 216 µs ± 74.4 µs
Results for p = 1
----------------
avx | dist | grad
----|------|-----
before | 426 ms ± 35 ms | 6.21 s ± 187 ms
default | 2.6 s ± 123 ms | 5.62 s ± 273 ms
avx | 104 ms ± 6.37 ms | 833 ms ± 44.3 ms
avx2 | 106 ms ± 3.59 ms | 924 ms ± 86.2 ms
Results for p = 2
-----------------
avx | dist | grad
----|------|-----
before | 425 ms ± 45.4 ms | 6.31 s ± 125 ms
default | 3.04 s ± 187 ms | 3.55 s ± 242 ms
avx | 110 ms ± 3.66 ms | 896 ms ± 21.8 ms
avx2 | 113 ms ± 4.68 ms | 934 ms ± 25.2 ms
Results for p = ∞
------------------
avx | dist | grad
----|------|-----
before | 501 ms ± 39.5 ms | 6.64 s ± 321 ms
default | 2.15 s ± 92.9 ms | 8.43 s ± 355 ms
avx | 104 ms ± 5.52 ms | 835 ms ± 36.7 ms
avx2 | 100 ms ± 3.41 ms | 864 ms ± 67 ms
Results for p = 3
-----------------
avx | dist | grad
----|------|-----
before | 22.6 s ± 413 ms | 11.1 s ± 242 ms
default | 24.9 s ± 1 s | 11.2 s ± 293 ms
avx | 2.69 s ± 148 ms | 5.63 s ± 88.4 ms
avx2 | 2.48 s ± 31.8 ms | 5.61 s ± 114 ms
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11230
Differential Revision: D9735503
Pulled By: erikbrinkman
fbshipit-source-id: a9da619249e4ca2625b39ca1ca7f5543c3086bfb
Summary:
If pybind is built with cmake and installed, we should use its config file instead of the Findpybind11 shipped with caffe2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11423
Differential Revision: D9735557
Pulled By: ezyang
fbshipit-source-id: 28a39e579fa045060aa1a716e5fd7dbcf7b89569
Summary:
Fixes the issue discussed in #10838. `hidden_size` should be the last dimension regardless if we're in ONNX or PyTorch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11368
Differential Revision: D9734814
Pulled By: soumith
fbshipit-source-id: 7f69947a029964e092c7b88d1d79b188a417bf5f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11420
Surprisingly tricky! Here are the major pieces:
- We grow an even more ludicrous macro
AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_EXCEPT_COMPLEX_HALF
which does what it says on the tin. This is because I was
too lazy to figure out how to define the necessary conversions
in and out of ComplexHalf without triggering ambiguity problems.
It doesn't seem to be as simple as just Half. Leave it for
when someone actually wants this.
- Scalar now can hold std::complex<double>. Internally, it is
stored as double[2] because nvcc chokes on a non-POD type
inside a union.
- overflow() checking is generalized to work with complex.
When converting *to* std::complex<T>, all we need to do is check
for overflow against T. When converting *from* complex, we
must check (1) if To is not complex, that imag() == 0
and (2) for overflow componentwise.
- convert() is generalized to work with complex<->real conversions.
Complex to real drops the imaginary component; we rely on
overflow checking to tell if this actually loses fidelity. To get
the specializations and overloads to work out, we introduce
a new Converter class that actually is specializable (see the sketch after this list).
- Complex scalars convert into Python complex numbers
- This probably fixes complex tensor printing, but there is no way
to test this right now.
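A plain-Python sketch of the conversion/overflow rules in the bullets above (not the actual ATen convert()/overflow() code):
```python
def convert_from_complex(z, to_type):
    # Converting complex -> real drops the imaginary part; the overflow check
    # is what reports the loss of fidelity when imag() != 0.
    if to_type is complex:
        return complex(z)
    if z.imag != 0:
        raise OverflowError("nonzero imaginary component cannot be represented")
    return to_type(z.real)

print(convert_from_complex(3 + 0j, float))  # 3.0
# convert_from_complex(3 + 2j, float)       # would raise OverflowError
```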
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Reviewed By: cpuhrsch
Differential Revision: D9697878
Pulled By: ezyang
fbshipit-source-id: 181519e56bbab67ed1e5b49c691b873e124d7946
Summary:
vishwakftw Your patch needed some updates because the default native function dispatches changed from `[function, method]` to `[function]`. The CI was run before that change happened so it still shows green, but the internal test caught it.
I did some changes when rebasing and updating so I didn't just force push to your branch. Let's see if this passes CI and internal test. If it does, let me know if you want me to force push to your branch or use this PR instead.
Note to reviewers: patch was already approved at #10068 .
cc yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11421
Differential Revision: D9733407
Pulled By: SsnL
fbshipit-source-id: cf2ed293bb9942dcc5158934ff4def2f63252599
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11331
In the previous commit, we added a bare-bones LegacyTypeDispatch in ATen/core.
This is not sufficient for the use cases we need: we not only need to be able to
get a Type, but we also need to be able to *initialize* the Types if its the first time
we have retrieved a CPU/CUDA/Complex type. I hemmed and hawed about how
to do this; the strategy this PR takes is to introduce a new "hooks" interface
specifically for initializing CPU/CUDA/Complex (which still lives in Context). We then
move all "user-friendly" functions to LegacyTypeDispatch.
Here were some other options which I considered, but don't work:
- Assume that Type is already initialized, because we only intend to call Type
from Tensor methods, where we already have a Tensor. This does not work
because Caffe2 created tensors will not have gone through the standard
Type codepath, and will have skipped initialization.
- Move CUDAHooks and ComplexHooks to ATen/core. Besides being sucky,
this isn't even a complete fix, because I still need to initialize CPU hooks
(so you *still* need another hooks interface).
Reviewed By: cpuhrsch
Differential Revision: D9666612
fbshipit-source-id: ac7004b230044b67d13caa81fdfaf3c6ab915e3f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11274
We don't want to put all of Context into ATen/core, but one
particular part cannot be avoided: the type registry, because
implementations of TensorMethods will need to get a Type,
and then do a virtual call on it.
I needed to do a little bit of (temporary) footwork to get this
in without also moving Type, because unique_ptr<Type> expects
to be able to see the destructor of Type (but it's forward declared
right now). So instead I put the destructor as an explicit functor. We
can get rid of this once Type actually moves in ATen/core
Reviewed By: cpuhrsch
Differential Revision: D9657449
fbshipit-source-id: 940931493bf4f1f6a8dad03f34633cacdd63dd0b
Summary:
Set the default timeout to 300 seconds to be safe. There used to be no timeout in THD.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11409
Differential Revision: D9731709
Pulled By: teng-li
fbshipit-source-id: 0ce011dcca507cbf063176ad4995405c77dd0cdd
Summary:
Currently the gradient is copied into .grad if .grad is None. This PR aims to remove the copy when it is not absolutely needed.
It is generally an improvement of speed and memory usage. And here is a case it may help a lot:
Normally, people do optimizer.zero_grad() every minibatch before backward. It will translate into a memset, and later a point-wise add.
When there is some large weight in the network, one optimization people can always do is to set parameter.grad to None instead of calling zero_grad(). This will remove the memset and change the point-wise add to a memcpy.
Here is the result of running the following script on a V100 GPU. It is 100 iterations of forward/backward/zero_grad on a single embedding of 1-billion-word-benchmark size.
`Zero grad: 2.123847723007202`
`None grad: 1.3342866897583008`
With the backend change of this PR, the unnecessary memcpy is removed, thus further speed up is achieved.
`Zero grad: 2.124978542327881`
`None grad: 0.4396955966949463`
[benchmark.txt](https://github.com/pytorch/pytorch/files/2341800/benchmark.txt)
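For context, a minimal sketch of the `.grad = None` pattern discussed above (the tiny model and inputs are placeholders):
```python
import torch

model = torch.nn.Linear(4, 2)
inputs = torch.randn(8, 4)

# Instead of optimizer.zero_grad() (memset now, point-wise add during backward),
# drop the gradients so backward can hand its buffer to .grad directly.
for p in model.parameters():
    p.grad = None

loss = model(inputs).sum()
loss.backward()  # .grad is repopulated without the extra copy this PR removes
```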
Some details on the code change:
.detach() is used because we need to get rid of new_grad being a view without copy data. This should be safe in first-order only mode.
data need to be contiguous, otherwise `grad_variable.data() += new_grad.data();` below will fail.
Only the last variable that has reference to the temp gradient will grab its buffer.
ngimel, mcarilli and mruberry helped on finalizing this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11165
Differential Revision: D9728874
Pulled By: soumith
fbshipit-source-id: b8fb822a2dff6e812bbddd215d8e384534b2fd78
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11247
Previously, the default for a declaration in native_functions.yaml
was ['function', 'method'], i.e., generate both a method and
function for every binding. We now believe this is inappropriate:
the majority of new kernels added to PyTorch should live as
free functions, NOT methods. Thus, we change the default accordingly.
I also took the opportunity to de-method some "internal" functions
that had a leading underscore. While, strictly speaking, this is a
BC breaking change, I believe it is highly unlikely anyone was using
these directly.
Reviewed By: yf225
Differential Revision: D9648570
fbshipit-source-id: 8b94647b824e0899d6d18aa5585aaedc9d9957d2
Summary:
This is mainly to pick up change 20074be19a to avoid polluting the CMAKE_DEBUG_POSTFIX variable. cc orionr.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11388
Reviewed By: orionr
Differential Revision: D9720931
Pulled By: Yangqing
fbshipit-source-id: 18a60d0409e74316f74d364f4fe16bf0d0198413
Summary:
Moves the complex registration code into an out-of-line C++ extension to de-noise the test_cpp_extensions.py file. Let's keep it nice and tidy so we can point our users at it for usage examples.
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11397
Differential Revision: D9725335
Pulled By: goldsborough
fbshipit-source-id: 290618f2ee711b1895cdb8f05276034dfe315c6d
Summary:
~~This PR fixes #8525 by renaming `split_with_sizes` to `split` so that 2 `aten::split` ops are
generated (previously `aten::split(self, int, int)` and `aten::split_with_sizes(self, int[], int)` were generated)~~
~~`split_with_sizes` was made in PR #5443, but I don't see a reason for it to have
a different name than `split` rather than just overload `split`.~~
This PR fixes #8525 by adding `register_special_ops.cpp` to mirror the Python dispatching from `split` to `split` and `split_with_sizes` in [tensor.py](https://github.com/pytorch/pytorch/blob/master/torch/tensor.py#L279).
It also fixes #8520 by adding an `int[]` wherever it sees `torch.Size`.
In a follow up PR this could also be used to fix some of the other `unknown builtin op` test errors.
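For reference, the two dispatch targets the new registration mirrors (standard `torch.split` behavior):
```python
import torch

x = torch.arange(10)
chunks = torch.split(x, 2)       # int argument  -> aten::split
parts = torch.split(x, [4, 6])   # list of ints  -> aten::split_with_sizes
print([c.numel() for c in chunks])  # [2, 2, 2, 2, 2]
print([p.numel() for p in parts])   # [4, 6]
```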
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11051
Differential Revision: D9582443
Pulled By: driazati
fbshipit-source-id: d27201f85937d72e45e851eaa1460dd3dd1b61a9
Summary:
This seems to be causing different versions of OpenMPI to be picked up
by different parts of the build. It's not a good practice to include absolute
paths anyway, so let's try removing it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11386
Reviewed By: teng-li
Differential Revision: D9724349
Pulled By: pietern
fbshipit-source-id: 3dfef91c81f2e97e5125284aff9e7e98f8761917
Summary:
Continuing pjh5's work to remove FULL_CAFFE2 flag completely.
With these changes you'll be able to also do something like
```
NO_TEST=1 python setup.py build_deps
```
and this will skip building tests in caffe2, aten, and c10d. By default the tests are built.
cc mingzhe09088 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11321
Reviewed By: mingzhe09088
Differential Revision: D9694950
Pulled By: orionr
fbshipit-source-id: ff5c4937a23d1a263378a196a5eda0cba98af0a8
Summary:
In addition to documentation, this cleans up a few error message formats.
It also adds infra to find which operators are supported by the JIT automatically, which is then used in the generation of the docs.
The wording and formatting of the docs is not yet polished, but having this will allow our document writers to make faster progress.
Followup PRs will polish the docs and fix formatting issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11357
Differential Revision: D9721277
Pulled By: zdevito
fbshipit-source-id: 153a0d5be1efb314511bcfc0cec48643d78ea48b
Summary:
Add a barrier() to wait for all PGs to be created before destroying them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11391
Differential Revision: D9727383
Pulled By: teng-li
fbshipit-source-id: 689d62c978e642b68f4949dcf29982e34869ada4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11382
We found this cudnn bug in S163230 that causes accuracy loss. We fixed this in D9601217, but due to the reimplementation of spatialBN the fix was overwritten. Let's land this fix again.
Reviewed By: kuttas
Differential Revision: D9702347
fbshipit-source-id: 11547e9edaf7b2ba7f4aa7263ffb4f0281bbf078
Summary:
The next function I'm moving to C++ is `sync_params`. It is stacked on top of https://github.com/pytorch/pytorch/pull/9729, so some changes will go away when it lands and I rebase.
I also split code into a `.h` and `.cpp` file for better code organization.
pietern apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9805
Differential Revision: D9688604
Pulled By: goldsborough
fbshipit-source-id: 4467104d3f9e2354425503b9e4edbd59603e20a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11336
Move `context_base.h` header to `ATen/core` and the implementations are in `caffe2/core/context_base.cc`
Reviewed By: ezyang
Differential Revision: D9670493
fbshipit-source-id: ce5bf2b3b4c80e9b62819f4332ce68af82720055
Summary:
This PR cleans up the `at::Tensor` class by removing all methods that start with an underscore in favor of functions in the `at::` namespace. This greatly cleans up the `Tensor` class and makes it clearer what is the public and non-public API.
For this I changed `native_functions.yaml` and `Declarations.cwrap` to make all underscore methods `variant: function` (or add such a statement to begin with), and then fixed all code locations using the underscore methods.
ezyang colesbury gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11152
Differential Revision: D9683607
Pulled By: goldsborough
fbshipit-source-id: 97f869f788fa56639c05a439e2a33be49f10f543
Summary:
Add the gpu kernel version.
The parallelism I went with performs poorly when there are a large number of vectors that are all short, as I don't allocate the thread pool to wrap in that case.
Test Plan
---------
```
python -m unittest test_torch.TestTorch.test_pdist_{empty,scipy} test_nn.TestNN.test_pdist{,_zeros,_empty_row,_empty_col,_cpu_gradgrad_unimplemented,_cuda_gradgrad_unimplemented} test_jit.TestJitGenerated.test_nn_pdist
```
Current performance specs are a little underwhelming; I'm in the process of debugging.
size | torch | torch cuda | scipy
-----|-------|------------|------
16 x 16 | 9.13 µs ± 3.55 µs | 9.86 µs ± 81.5 ns | 15.8 µs ± 1.2 µs
16 x 1024 | 15 µs ± 224 ns | 9.48 µs ± 88.7 ns | 88.7 µs ± 8.83 µs
1024 x 16 | 852 µs ± 6.03 µs | 7.84 ms ± 6.22 µs | 4.7 ms ± 166 µs
1024 x 1024 | 34.1 ms ± 803 µs | 11.5 ms ± 6.24 µs | 273 ms ± 6.7 ms
2048 x 2048 | 261 ms ± 3.5 ms | 77.5 ms ± 41.5 µs | 2.5 s ± 97.6 ms
4096 x 4096 | 2.37 s ± 154 ms | 636 ms ± 2.97 µs | 25.9 s ± 394 ms
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11102
Differential Revision: D9697305
Pulled By: erikbrinkman
fbshipit-source-id: 2b4f4b816c02b3715a85d8db3f4e77479d19bb99
Summary:
This is so that TensorImpl does not have to depend on Tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11337
Differential Revision: D9684421
Pulled By: gchanan
fbshipit-source-id: d2af93420ca6d493429c251cfe5a34e9289c4484
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11273
This one might strike you as a bit surprising, but it's necessary
to expose this interface in ATen/core, because we need to be
able to get a true Variable type from Variable tensors, and
to do that we need to go through the hooks interface.
Reviewed By: gchanan
Differential Revision: D9656548
fbshipit-source-id: 28bb5aee6ac304e8cd5fa1e4c65452c336647161
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11270
Still need to deduplicate this with caffe2/core/registry.h,
but this will be a bit tricky because the current formulation
of the macro is namespace sensitive (i.e., the macro for classes
defined in at:: namespace won't work if you call from caffe2::
namespace).
Reviewed By: gchanan
Differential Revision: D9654871
fbshipit-source-id: 2207d1f2cc6d50bd41bf64ce0eb0b8523b05d9d9
Summary:
After submitting PR #9726, PR #10581 created a different CUDAEvent class. The CUDAEvent proposed in #9726 was similar to the c10d::CUDAEvent class with additional testing and functionality. In particular, it was movable but not copyable. The CUDAEvent created by #10581 is refcounted and copyable. This PR retains the refcounting of the latter PR while fixing several bugs, adding tests, and extending the functionality to support testing and usage like in PR #8354. In particular, this PR:
- Adds set_device() to CUDAContext
- Adds three CUDAEvent tests to stream_test.cpp
- Fixes three bugs:
- Refcounting was broken. Destroying any of the RAIIs holding a particular CUDAEvent would destroy the event UNLESS it was the last RAII (the check was backwards).
- Moving an event would cause a segfault.
- Events were not destroyed on the device they were created on. See PR #9415 (pietern)
- Adds the happened() and recordOnce() functions
- Changes the record() functions to not be const
- Adds additional assertions to verify correctness
This PR does not:
- Make c10d use the ATen CUDAEvent (this is appropriate for a separate PR)
Whether events should be refcounted is an interesting question. It adds some atomic operations and makes event creation eager. Making events movable but not copyable (like the c10d events) avoids these costs and allows events to be lazily constructed. Lazy construction is preferable when working with containers (like std::array or std::vector) and because the event's device can be set automatically to the first stream it's recorded on. With eager construction the user is required to understand that events have a device and acquire the device of the stream the event will be recorded on upfront. This can be seen here:
542aadd9a7/aten/src/ATen/native/cudnn/RNN.cpp (L1130-L1132)
and that file is the only one which currently uses the ATen CUDAEvent.
Refcounting does allow single writer multi-reader scenarios, although these scenarios can be also be supported by providing indirect access to the underlying CUDAEvent. I believe all current and planned usage scenarios do not require refcounting, and if desired I can update this PR to remove refcounting and make the ATen event movable but not copyable like the c10d event. I think not refcounting is preferable because it can improve performance, ease usability, and simplify the code (as seen with two of the above bugs).
I have decided to separate this from PR #8354 since while it's required for PR #8354 the changes are, clearly, of independent interest. PR #8354 has a new dependency on this one, however. I am closing PR #9726 in favor of this PR.
apaszke ezyang pietern
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11293
Differential Revision: D9665836
Pulled By: soumith
fbshipit-source-id: a1513fa4f9761e2f304d126e402f6b6950e1c1d2
Summary:
This adds an optional `expand=True` kwarg to the `distribution.enumerate_support()` method, to get a distribution's support without expanding the values over the distribution's `batch_shape`.
- The default `expand=True` preserves the current behavior, whereas `expand=False` collapses the batch dimensions.
e.g.
```python
In [47]: d = dist.OneHotCategorical(torch.ones(3, 5) * 0.5)
In [48]: d.batch_shape
Out[48]: torch.Size([3])
In [49]: d.enumerate_support()
Out[49]:
tensor([[[1., 0., 0., 0., 0.],
[1., 0., 0., 0., 0.],
[1., 0., 0., 0., 0.]],
[[0., 1., 0., 0., 0.],
[0., 1., 0., 0., 0.],
[0., 1., 0., 0., 0.]],
[[0., 0., 1., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 1., 0., 0.]],
[[0., 0., 0., 1., 0.],
[0., 0., 0., 1., 0.],
[0., 0., 0., 1., 0.]],
[[0., 0., 0., 0., 1.],
[0., 0., 0., 0., 1.],
[0., 0., 0., 0., 1.]]])
In [50]: d.enumerate_support().shape
Out[50]: torch.Size([5, 3, 5])
In [51]: d.enumerate_support(expand=False)
Out[51]:
tensor([[[1., 0., 0., 0., 0.]],
[[0., 1., 0., 0., 0.]],
[[0., 0., 1., 0., 0.]],
[[0., 0., 0., 1., 0.]],
[[0., 0., 0., 0., 1.]]])
In [52]: d.enumerate_support(expand=False).shape
Out[52]: torch.Size([5, 1, 5])
```
**Motivation:**
- Currently `enumerate_support` builds up tensors of size `support + batch_shape + event_shape`, but the values are *repeated* over the `batch_shape` (adding little in the way of information). This can lead to expensive matrix operations over large tensors when `batch_shape` is large (see the example above), often leading to OOM issues. We use `expand=False` in Pyro for message passing inference, e.g. when enumerating over the state space in a Hidden Markov Model. This creates sparse tensors that capture the markov dependence, and allows for the possibility of using optimized matrix operations over these sparse tensors. `expand=True`, on the other hand, will create tensors that scale exponentially in size with the length of the Markov chain.
- We have been using this in our [patch](https://github.com/uber/pyro/blob/dev/pyro/distributions/torch.py) of `torch.distributions` in Pyro. The interface has been stable, and it is already being used in a few Pyro algorithms. We think that this is more broadly applicable and will be of interest to the larger distributions community.
cc. apaszke, fritzo, alicanb
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11231
Differential Revision: D9696290
Pulled By: soumith
fbshipit-source-id: c556f8ff374092e8366897ebe3f3b349538d9318
Summary:
This actually ended up being a lot more involved than I thought. The basic
problem is that in some of our build environments, thread local state is not
supported. The correct way to test if this is the case is using the
(undocumented) CAFFE2_FB_LIMITED_MOBILE_CAPABILITY macro.
On mobile, OptionGuard is not available, and you have to do everything
by hand. There's a static_assert that checks if you accidentally use
OptionGuard in this case and gives you a better error message.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11244
Reviewed By: gchanan
Differential Revision: D9646190
fbshipit-source-id: cf4016f79b47705a96ee9b6142eb34c95abb2bd4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11323
If you do pass it this, you'll get a pointer to
UndefinedTensor; probably not what you want!
Reviewed By: Yangqing
Differential Revision: D9676205
fbshipit-source-id: 0bd3c22c2c40ac2958f95fc7a73b908af291cf22
Summary:
We need to remove nomnigraph from the list of public libraries in order to support libtorch extensions. Easiest way to do this is to include it into the Caffe2 source like all other caffe2/core/ code.
However, because the headers are in a different place, we need to include them for linked libraries (pybind, tests, etc).
On an upside, this means that nomnigraph is now default hidden visibility too.
FYI peterjc123 xkszltl goldsborough bwasti Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11303
Reviewed By: pjh5
Differential Revision: D9694932
Pulled By: orionr
fbshipit-source-id: 5db3eb20bc5ddc873ce9151236b74663fbb33ed8
Summary:
* purge hcSPARSE now that rocSPARSE is available
* integrate a custom hcc and HIP
* hcc brings two important compiler fixes (fixes hundreds of unit tests)
* HIP brings a smart dispatcher that allows us to avoid a lot of static_casts (we haven't yet removed the automatic static_casts but this catches some occurrences the script did not catch)
* mark 5 unit tests as skipped that have regressed w/ the new hcc (we don't know yet what is at fault)
* optimize bitonic sort - the comparator is always an empty struct - therefore passing it by value saves at least 3 bytes. It also removes an ambiguity around passing references to `__global__` functions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11198
Differential Revision: D9652340
Pulled By: ezyang
fbshipit-source-id: f5af1d891189da820e3d13b7bed91a7a43154690
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10888
Add a CUDA version of SpatialBNOp and also optimize SpatialBN on CPU
Reviewed By: houseroad
Differential Revision: D9512435
fbshipit-source-id: 6f828c88d56d30dc9a2f98a297a161c35cc511b1
Summary:
Fixed a few bugs that were not tested in the c10d frontend APIs, including
get_rank, get_world_size, and destroy_process_group of a given group.
These APIs are added to the CI tests.
Also added all the group related tests, including full-group, and partial groups (existing ones), since both will hit different code paths.
Also removed experimental APIs for c10d initially used in DDP, since we don't use them anymore.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11318
Reviewed By: pietern
Differential Revision: D9675896
Pulled By: teng-li
fbshipit-source-id: a2eac2c57933effa2d139855f786e64919a95bfc
Summary:
On the way to #10774
This PR adds advanced indexing with tensors.
The approach is to desugar advanced indexing into an at::index op.
This is exactly how normal pytorch does it.
[(I used this code as reference)](https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/python_variable_indexing.cpp)
Supporting sequences is a little tricky because JIT script doesn't have
an easy way to turn arbitrary n-dimensional python lists into a tensor
(it would be easy if we supported `torch.tensor`), so that'll come
in a future PR.
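A minimal sketch of the kind of script function this enables (exact coverage at the time of the PR may differ):
```python
import torch

@torch.jit.script
def adv_index(x, idx):
    # Tensor indexing is desugared into an aten::index call in the graph.
    return x[idx]

x = torch.arange(6).reshape(2, 3)
print(adv_index(x, torch.tensor([1, 0])))  # rows swapped
```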
cc jamesr66a zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10862
Differential Revision: D9659449
Pulled By: zou3519
fbshipit-source-id: 56d293720d44c0fd27909e18327ab3985ddfced6
Summary:
In #9466 I got rid of storage views and eliminated all places where
they were used... OR SO I THOUGHT. In actuality, under certain
conditions (specifically, if you trained a CUDA multiprocessing model
shared over CUDA IPC and then serialized your parameters), you could
also serialize storage slices to the saved model format. In #9466,
I "fixed" the case when you loaded the legacy model format (really,
just unshared the storages--not strictly kosher but if you aren't
updating the parameters, shouldn't matter), but NOT the modern model format, so
such models would fail.
So, I could have applied the legacy model format fix too, but
hyperfraise remarked that he had applied a fix that was effectively
the same as unsharing the storages, but it had caused his model to
behave differently. So I looked into it again, and realized that
using a custom deleter, I could simulate the same behavior as old
storage slices. So back they come.
In principle, I could also reimplement storage views entirely using
our allocators, but I'm not going to do that unless someone really
really wants it.
Fixes #10120.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11314
Reviewed By: ailzhang
Differential Revision: D9671966
Pulled By: ezyang
fbshipit-source-id: fd863783d03b6a6421d6b9ae21ce2f0e44a0dcce
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11190
As discussed with Alexander Sidorov, params_bytes refers to the number of bytes we're reading for parameters, not the size of parameters. They only differ for sparse operators.
Reviewed By: mdschatz
Differential Revision: D9628635
fbshipit-source-id: 9e2aed0cf59388928dc69b8534cf254f0347c9c8
Summary:
This is an experimental build on top of what orionr and mingzhe09088 built.
Essentially, the idea is that we will need separate *_API versions for different shared libraries. If this theory is right, I'll try to clean up the design a bit and document it properly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11266
Reviewed By: orionr
Differential Revision: D9682942
Pulled By: Yangqing
fbshipit-source-id: c79653199e67a1500c9174f39f8b0357324763f3
Summary:
We shouldn't use system Eigen in any cases when building with setup.py. If people want to use system Eigen (not from third_party) they can build with CMake for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11334
Reviewed By: pjh5
Differential Revision: D9689450
Pulled By: orionr
fbshipit-source-id: baf616b9f195692942151ad201611dcfe7d927ba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11098
Added a test for testing CPU version across multiple devices.
Reviewed By: enosair, BIT-silence
Differential Revision: D9584520
fbshipit-source-id: 0d8c85e6d402bc7b34d5f8f16ef655ff9b61b49e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11338
The `min_` and `max_` values of the filler are in `double` format, but when we are filling a specific type of tensor, their values can exceed the type limits, resulting in a crash. This diff checks the type limits first, and if `min_`/`max_` is out of the limits, it clips it.
Reviewed By: highker
Differential Revision: D9684455
fbshipit-source-id: 6da98a03c57f3296abaddc7c5cfc1c836c611eb0
Summary:
This will allow users to set customized timeout option for the store.
Tested by my own debug print to make sure that C++ actually used the timeout
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11265
Differential Revision: D9666164
Pulled By: teng-li
fbshipit-source-id: 4eb6441783da106a3fd59b95457e503e83e4640f
Summary:
This lets you compile builtin functions from C++ without having a dependence on Python
```cpp
auto module = torch::jit::compile(R"JIT(
def my_script_method(x, y):
    return torch.relu(x) + y
)JIT");
IValue result = module->run_method("my_script_method", 1, 2);
```
goldsborough zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10847
Differential Revision: D9543461
Pulled By: driazati
fbshipit-source-id: 6160dae094030ca144a0df93cb9f26aa78c8cf27
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11315
Rename unit test files to make them consistent with the fb cpp style guideline: "The unittest for MyFoo.cpp should be named MyFooTest.cpp."
Reviewed By: yinghai
Differential Revision: D9671519
fbshipit-source-id: 44ed6794f6e479d190916db8064eee692e3ad876
Summary:
1. Add documentation to Linear and improve documentation for RNNs
2. Fix preprocessing in C++ docs by adding correct include path
3. Make myself and ebetica codeowner of docs/cpp to improve development speed
ebetica ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11313
Differential Revision: D9683615
Pulled By: goldsborough
fbshipit-source-id: 84ea32f9ea6b4060744aabbf5db368776a30f0b5
Summary:
Turns out that a net.type explicitly set to '' is not acceptable to CreateNet,
but an unset net.type is acceptable.
Fix that in this diff. Also, this is related to T33613083.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11286
Reviewed By: Maratyszcza, wat3rBro
Differential Revision: D9659920
Pulled By: harouwu
fbshipit-source-id: d68f24b754e18e1121f029656d885c48ab101946
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11291
In S163230, we've found that the CuDNN 7 upgrade causes an accuracy drop when training convolution networks such as ResNeXt-101 (~0% accuracy) and video R(2+1)D (65 --> 63%).
Our current theory for this accuracy loss is the new "CUDNN_BATCHNORM_SPATIAL_PERSISTENT" mode in the spatialBN operator. In Caffe2, we've made this mode the default. According to the CuDNN manual (https://fburl.com/z996mr13), this mode may introduce some limitation on the input data range and cause overflow (which outputs NaN). NaN is probably not the case, because we're seeing a few percent of accuracy drop but not gradient explosion or failure. However, this "performance-optimized" code path may introduce accuracy loss (which is not caught by our unit test case because the input data range is [-0.5, 0.5]).
Reviewed By: kuttas, stephenyan1231
Differential Revision: D9601217
fbshipit-source-id: 73c2690c19cb1f02ea4e5e2200f50128df4f377b
Summary:
this is a fix that's needed for building extensions with a
pre-packaged pytorch. Consider the scenario where
(1) pytorch is compiled and packaged on machine A
(2) the package is downloaded and installed on machine B
(3) an extension is compiled on machine B, using the downloaded package
Before this patch, stage (1) would embed absolute paths to the system
installation of mkl into the generated Caffe2Config.cmake, leading to
failures in stage (3) if mkl was not at the same location on B as on
A. After this patch, only a reference to the wrapper library is
embedded, which is re-resolved on machine B.
We are already using a similar approach for cuda.
Testing: built a package on jenkins, downloaded locally and compiled an extension. Works with this patch, fails without.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11298
Differential Revision: D9683150
Pulled By: anderspapitto
fbshipit-source-id: 06a80c3cd2966860ce04f76143b358de15f94aa4
Summary:
Now that we're building everything together, making all distributed flags conditional on USE_DISTRIBUTED being set.
cc pietern cpuhrsch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11221
Reviewed By: Yangqing
Differential Revision: D9664267
Pulled By: orionr
fbshipit-source-id: a296cda5746ad150028c97160f8beacba955ff73
Summary:
Fixes #8560.
Unblocks #10715.
The assert (nDim <= uncompressedDims) was being triggered for a scalar
tensor because we compute nDim to be 1 for a scalar tensor but
uncompressedDim = 0.
This PR changes it so that we compute nDim to be 0 for a scalar tensor. This
works because indexing in a kernel depends on nDim. If nDim = 0, then
offset is always 0, which is what we want.
Some other (small) changes were necessary to make this work:
- One cannot define a 0-length array `IndexType arr[0]` so the code
guards against that
- Needed to change some of the maxTensorInfoSize logic to handle the
case when uncompressedDim == 0.
cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10952
Differential Revision: D9544607
Pulled By: zou3519
fbshipit-source-id: 2b873f47e2377125e1f94eb1b310a95cda51476c
Summary:
Distributed Data Parallel CPU module for c10d. This is basically the same code as Distributed Data Parallel CPU module for THD, since c10d now has the exact same front-end interface as torch.distributed.
We will keep both in the first release and remove the THD one once c10d is stable enough.
Tests are fully covered, just as with THD.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11168
Differential Revision: D9674963
Pulled By: teng-li
fbshipit-source-id: ecf52a7189374ca7930c2be305218167fdd822a7
Summary:
Linting `torch/csrc/` (non-recursive) and `torch/csrc/autograd` (non-recursive).
Fixed things like:
- `typedef` vs `using`
- Use `.empty()` instead of comparing with empty string/using `.size() == 0`
- Use range for loops instead of old style loops (`modernize-`)
- Remove some `virtual` + `override`
- Replace `stdint.h` with `cstdint`
- Replace `return Type(x, y)` with `return {x, y}`
- Use boolean values (`true`/`false`) instead of numbers (1/0)
- More ...
ezyang apaszke cpuhrsch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11050
Differential Revision: D9597505
Pulled By: goldsborough
fbshipit-source-id: cb0fb4793ade885a8dbf4b10484487b84c64c7f2
Summary: Closing the gap a bit on API, allowing users to go NetDef -> nomnigraph -> NetDef in python now
Reviewed By: duc0
Differential Revision: D9670495
fbshipit-source-id: 6497518ffc05a186deb0d657e06317980d39ddd5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11256
- in deleteNode method, remove optional deleteEdge flag as it's not used
- in deleteEdge method, remove optional removeRef flag as it's not used
- in replaceNode method, remove optional newHead_ parameter as it's not used - also simplifying the implementation by just calling replaceInEdges and replaceOutEdges
- remove importNode & importEdge as they're not in use
- add getEdgeIfExists that is like getEdge() but returns nullptr instead of throwing when the edge does not exist
- reduce verbosity in the basic graph unit test and add more test cases for ReplaceEdges
Differential Revision: D9650913
fbshipit-source-id: 6c18b37bef0d2abe1b57fb4fc47bfdbcee387694
Summary:
I'm setting up an automatic sync job for cppdocs and need two fixes to the cpp docs config:
1. Right now the cppdocs use the `torch` package to figure out the version. For C++ docs all I really need from the built package are the generated Tensor.h and Functions.h files. I can actually generate those directly via `aten/src/ATen/gen.py`, so I can skip building PyTorch altogether and save 10 minutes in the sync job! For this I need to avoid using the torch package in the docs.
2. Internal proxy issues prevent using the git link for sphinx_rtd_theme. We can just use the pip package for the cppdocs (not for the normal PyTorch docs)
soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11300
Differential Revision: D9667193
Pulled By: goldsborough
fbshipit-source-id: 5567e0b3d3bdce03f5856babdb4ff76bcee91846
Summary:
This PR adds all PyTorch and Caffe2 job configs to CircleCI.
Steps for the CircleCI mini-trial:
- [ ] Make sure this PR passes Jenkins CI and fbcode internal tests
- [x] Approve this PR
- [ ] Ask CircleCI to turn up the number of build machines
- [ ] Land this PR so that the new `.circleci/config.yml` will take effect
Several Caffe2 tests are flaky on CircleCI machines and hence skipped when running on CircleCI. A proper fix for them will be worked on after a successful mini-trial.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11264
Differential Revision: D9656793
Pulled By: yf225
fbshipit-source-id: 7832e90018f3dff7651489c04a179d6742168fe1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11254
Previously we used DeviceType in caffe2.proto directly, but it's an `enum` and has implicit conversion to int, which does not have type safety, e.g. we have to explicitly check that a device type is valid in event.h:
```
template <int d>
struct EventCreateFunctionRegisterer {
  explicit EventCreateFunctionRegisterer(EventCreateFunction f) {
    static_assert(d < MaxDeviceTypes, "");
    Event::event_creator_[d] = f;
  }
};
```
at::DeviceType is an `enum class`; it does not have implicit conversion to int and provides better type safety guarantees. In this diff we have done the following refactor (taking CPU as an example):
1. caffe2::DeviceType → caffe2::DeviceTypeProto
2. caffe2::CPU → caffe2::PROTO_CPU
3. caffe2::DeviceType = at::DeviceType
4. caffe2::CPU = at::DeviceType::CPU
codemod -d caffe2/caffe2 --extensions h,cc,cpp 'device_type\(\), ' 'device_type(), PROTO_'
+ some manual changes
In short, after this diff, in c++, caffe2::CPU refers to the at::DeviceType::CPU and the old proto caffe2::CPU will be caffe2::PROTO_CPU.
On the Python side, we have a temporary workaround that aliases `caffe2_pb2.CPU = caffe2_pb2.PROTO_CPU` to make the change easier to review; this will be removed later.
Reviewed By: ezyang
Differential Revision: D9545704
fbshipit-source-id: 461a28a4ca74e616d3ee183a607078a717fd38a7
Summary:
Persistent rnns provide much better performance on V100 with half input data for a variety of cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11248
Differential Revision: D9665687
Pulled By: ezyang
fbshipit-source-id: 2bd09a7eb1f5190aadb580977b0ba956e21a7dd5
Summary:
- In Python 2, use of `/` (regardless of int/float/Tensor) causes a compiler error if
`from __future__ import division` is not imported in the file.
- The / operator is universally set to do "true" division for integers
- Added a `prim::FloorDiv` operator because it is used in loop unrolling.
If users use '/' in Python 2 without importing from `__future__`, the error
occurs when building the JIT AST.
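For illustration, a minimal sketch of the semantics described above (hypothetical snippet; in Python 3 the `__future__` import is a no-op):
```python
from __future__ import division  # required in Python 2 before '/' can be compiled

import torch

@torch.jit.script
def halve(x):
    # '/' performs true division, matching Python 3 semantics
    return x / 2

print(halve(torch.tensor([3.0, 4.0])))  # tensor([1.5000, 2.0000])
```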
cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11016
Differential Revision: D9613527
Pulled By: zou3519
fbshipit-source-id: 0cebf44d5b8c92e203167733692ad33c4ec9dac6
Summary:
The existing tests had every rank run send to every other rank and only
then switch to recv mode. This only works if the send operations are
non-blocking and the passed tensors are immediately copied to some kind
of send buffer. Instead, every send must be matched with a recv on the
other side, because from the API perspective they may block.
E.g. imagine a 1GB tensor being sent to every other rank. It can only go
through if there is a recv on the other side, or it will deadlock.
This change reflects this in the send/recv unit tests.
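As a rough sketch of the matched pattern (assuming `dist.init_process_group` has already been called on every rank; names are illustrative):
```python
import torch
import torch.distributed as dist

def exchange(rank, world_size):
    # Pair every send with a recv on the peer; order by rank so that two
    # peers never both block in send at the same time.
    for other in range(world_size):
        if other == rank:
            continue
        payload = torch.full((4,), float(rank))
        buf = torch.zeros(4)
        if rank < other:
            dist.send(payload, dst=other)
            dist.recv(buf, src=other)
        else:
            dist.recv(buf, src=other)
            dist.send(payload, dst=other)
```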
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11275
Differential Revision: D9658197
Pulled By: pietern
fbshipit-source-id: fb6a3fc03b42343a9dfeed0def30d94914e76974
Summary:
Found these when compiling the new master with gcc 7.3
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11257
Differential Revision: D9656612
Pulled By: SsnL
fbshipit-source-id: 7acb19e13204c010238dab7bc6973cc97b96f9a4
Summary:
This PR adds a .travis.yml check for our C++ documentation. The goal is to avoid any documentation/comments in our C++ code that would break the doxygen output and possibly ruin the C++ documentation site (currently https://pytorch.org/cppdocs).
For this, we:
1. Run doxygen and record any warnings,
2. Filter out some known bogus warnings,
3. Count the remaining warnings,
4. Fail the check if (3) is non-zero.
soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11124
Differential Revision: D9651011
Pulled By: goldsborough
fbshipit-source-id: 30f776d23bb6d6c482c54db32828b4b99547e87b
Summary:
Allows multiplication of e.g. numpy.float32 with tensors.
This came up with #9468
If you want this, I'll add tests after the other patch is done (doing so now would conflict, so I prefer to wait).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9659
Differential Revision: D8948078
Pulled By: weiyangfb
fbshipit-source-id: c7dcc57b63e2f100df837f70e1299395692f1a1b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10874
Fixes the log message "WARNING:data_workers:Warning, data loading lagging behind: name=0" where the queue size is reported instead of the source name
Reviewed By: panshen1, Novitial
Differential Revision: D9506606
fbshipit-source-id: 03717cfa9b991afb335ef877378afa3b52fd8f22
Summary:
`__repr__` currently fails for distributions with lazy attributes in PyTorch master, throwing a `KeyError`. This fixes the issue.
**Additionally:**
- Added `logits` to `arg_constraints` for distributions that accept either `probs` or `logits`. This is both to have `__repr__` display the `logits` param when available, and to be able to do validation checks (e.g. NaN checks) when the logit parametrization is used. fritzo, alicanb - I think there were reasons why we had not done so in the first place, but I am unable to recall now. It passes all the tests, but let me know if there is something that I am missing at the moment.
- There are certain distributions, e.g. `OneHotCategorical`, which won't show any parameters because they use a `categorical` instance under the hood and neither `logits` nor `probs` from `arg_constraints` is present in the instance's `__dict__`. This isn't addressed in this PR.
cc. vishwakftw, fritzo, nadavbh12, apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11263
Differential Revision: D9654959
Pulled By: apaszke
fbshipit-source-id: 16f5b20243fe8e2c13e9c528050d4df0b8ea6e45
Summary:
This PR adds a hooks interface for registering types for complex
scalar types, and a sample implementation of the hook in
test_cpp_extensions.
The hook registration is patterned off of the existing CUDA hooks.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11216
Differential Revision: D9654840
Pulled By: ezyang
fbshipit-source-id: 7b97646280d584f8ed6e14ee10a4abcd04cf2987
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11127
It's invalid to capture `predicate` by reference as it's a local variable. Capture it by value instead.
Differential Revision: D9600115
fbshipit-source-id: 92e0130d0a74908380b75ade5c3492df49e25941
Summary:
Also, make `torch.isclose` work with integral tensors and refactor `_check_trace` a bit.
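For example (a small sketch on a current build; equal integers compare as close under the default tolerances):
```python
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([1, 2, 4])
# Element-wise closeness now also works for integral tensors.
print(torch.isclose(a, b))  # tensor([ True,  True, False])
```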
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11246
Differential Revision: D9652701
Pulled By: apaszke
fbshipit-source-id: fb0bdbfd1952e45e153541e4d471b423a5659f25
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11123
This adds an operator that fills a tensor with uniform(min, max) values.
The implementation uses the fp32 generator and converts the result to fp16.
If performance becomes an issue, we could resort to intrinsics.
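A minimal numpy sketch of the approach (illustrative only, not the Caffe2 operator itself):
```python
import numpy as np

def uniform_fill_fp16(shape, vmin, vmax, rng=np.random):
    # Draw with the fp32 generator, then narrow the result to fp16.
    samples = rng.uniform(vmin, vmax, size=shape).astype(np.float32)
    return samples.astype(np.float16)

print(uniform_fill_fp16((2, 3), -1.0, 1.0))
```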
Reviewed By: jspark1105, chocjy
Differential Revision: D9598142
fbshipit-source-id: 5aeab99acf7c3596fa6c33611d9d2c484f7c1145
Summary:
Keep net type info when generating the model's complete net. This preserves the performance optimization option.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11032
Reviewed By: wat3rBro
Differential Revision: D9564125
Pulled By: harouwu
fbshipit-source-id: c6546af9b1d4ff5eddf6124e24a5da1b8baf47df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11215
I found these by deleting the implicit conversion of Type to
TensorOptions and then fixing sites. This isn't a complete
refactor, because I ran out of steam after fixing this many
and decided to keep the implicit conversion. Still, why
waste a perfectly good refactor?
Reviewed By: gchanan, cpuhrsch
Differential Revision: D9634750
fbshipit-source-id: 4d8fb778e13e6e24b888b1314a02709b2cb00b62
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11205
Our short term plan for supporting out of tree complex development requires an
external library to add a custom subclass of Type without access to the
code generation facilities in ATen. This commit reorganizes Type so
as to minimize the amount of boilerplate you have to write when making
a subclass of Type.
In particular, it:
- Creates new CPUTypeDefault/CUDATypeDefault classes, which you are
  intended to inherit from; they provide default CPU/CUDA implementations
  that are layout/dtype agnostic.
- Adds new getCPUAllocator() and getCUDAAllocator() functions, as
a more public API to get your hands on Allocator
- Adds allocator() and getDeviceFromPtr(), abstracting the device
specific parts of storage() methods; these methods are now
implemented in base TypeDefault.
- Delete the static typeString() method, which is now dead.
- Move is_cuda/is_sparse/is_distributed to TypeDefault.
Reviewed By: SsnL
Differential Revision: D9631619
fbshipit-source-id: 40b600d99691230e36e03eb56434c351cbc2aa3a
Summary:
Just pulling this out of https://github.com/pytorch/pytorch/pull/10611
Make sure we can support environments where libprotobuf isn't installed when we link protobuf locally.
cc goldsborough Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11161
Differential Revision: D9650282
Pulled By: orionr
fbshipit-source-id: 447b5e54cd2639973b4b10f58590d1c693a988d4
Summary:
Will use USE_DISTRIBUTED for both c10d and THD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11237
Differential Revision: D9647825
Pulled By: teng-li
fbshipit-source-id: 06e0ec9b5e2f8f38780fc88718f8499463e9e969
Summary:
This was lingering after #10731.
cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11240
Differential Revision: D9645437
Pulled By: pietern
fbshipit-source-id: d02c33354b094be3bb0872cf54a45721e20c4e7d
Summary:
This PR resolved the following compilation errors on devgpu:
/home/mingzhe0908/pytorch/build/lib/libcaffe2_gpud.so: undefined reference to `caffe2::CAFFE2_PLEASE_ADD_OPERATOR_SCHEMA_FOR_Tan()'
/home/mingzhe0908/pytorch/build/lib/libcaffe2_gpud.so: undefined reference to `caffe2::CAFFE2_PLEASE_ADD_OPERATOR_SCHEMA_FOR_MaxPool3D()'
....
The same error has been happening with caffe2 build with debug mode before build_caffe2 was removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11233
Reviewed By: orionr
Differential Revision: D9645527
Pulled By: mingzhe09088
fbshipit-source-id: 68a45aa7fd815cac41b7fd64cfd9838b3226345a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11060
Adding synthetic data generation to the filler.h file (the exact distribution to be replaced later on).
Reviewed By: highker
Differential Revision: D9417594
fbshipit-source-id: 5d66dfbcb254a5961c36b7d3a081332c7372dac7
Summary:
there are multiple views of the tensor live.
Also adds recording for copy_ because this is the critical in place
op where these views will cause LHS indexing to fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11129
Differential Revision: D9600195
Pulled By: zdevito
fbshipit-source-id: bfd8f5befa47377e36d704dbdb11023c608fe9a3
Summary:
TSIA. apaszke pointed out that it might be better to use the third party folder by default, since system Eigen may often be out of date and may not have the version we need to compile successfully.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11020
Differential Revision: D9562548
Pulled By: Yangqing
fbshipit-source-id: d8ab8a6ebe1f3d9eec638ef726cf5dc4dcf777b5
Summary:
Adding short circuit evaluation to AND and OR. The second expression of an AND or OR gets lifted into an if branch, which is conditionally evaluated.
BatchOps was using the expression `dims = dims1 or dims2`, where dims1 is often an empty tensor. This now throws an error, because dims1 gets cast to a boolean, and you can't convert an empty tensor to a scalar. It now matches the behavior of PyTorch in Python.
One thing that came up: if the second expression of an and/or in Python gets returned, it does not get coerced to a boolean.
`tensor == (False or tensor)`
`tensor == (True and tensor)`
We do not currently support this.
edit: wording
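A small sketch of the failure mode being matched (plain Python, illustrative values):
```python
import torch

dims1 = torch.tensor([])      # empty tensor
dims2 = torch.tensor([0, 1])

# `or` first converts dims1 to a bool, which fails for an empty tensor;
# scripted code now raises the same error instead of silently diverging.
try:
    dims = dims1 or dims2
except RuntimeError as err:
    print("cannot convert:", err)
```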
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11116
Differential Revision: D9618168
Pulled By: eellison
fbshipit-source-id: 93b202be2f222d41f85d38d9c95f04d1749e8343
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11189
Replaces it with an operator TensorOptions() method on
Type, reestablishing the implicit conversion. I originally
wanted to get rid of the implicit conversion entirely, but
there were a *lot* of use-sites, so I added it back to avoid
a huge codemod. In this patch, I only had to fix sites that
used the optional device_index API.
Reviewed By: cpuhrsch
Differential Revision: D9628281
fbshipit-source-id: 5fe2a68eefb77a3c9bb446f03a94ad723ef90210
Summary:
Example:
```sh
python run_test.py -i sparse -- TestSparse.test_factory_size_check -f
```
With this, the `--verbose` option is redundant (one can call `python run_test.py -- -v` instead of `python run_test.py -v`). But since this is (probably) a frequently used flag, I didn't remove the existing easier-to-use option.
cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11209
Differential Revision: D9632215
Pulled By: SsnL
fbshipit-source-id: ff522802da11ef0a0714578be46e4a44f6343d44
Summary:
We don't generate a corresponding Type implementations for them,
so this doesn't do anything at the moment.
We don't plan on supporting complex32 in the near future, but
it is added to reserve the name and number in case we do at
some point in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11173
Reviewed By: SsnL
Differential Revision: D9627477
Pulled By: ezyang
fbshipit-source-id: f49a44ab1c92d8a33130c249ac7b234f210a65e6
Summary:
In the state dict loading code, the error message referring to the shapes of the loaded parameters and of the parameters in the initialised model had its format arguments in the wrong order. Swapped them round to fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11200
Differential Revision: D9631160
Pulled By: SsnL
fbshipit-source-id: 03d9446303bd417fef67027b10d7a27de06486be
Summary:
Disables two of the unit tests in test_cuda that got introduced after test_cuda was enabled that fail on ROCm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11191
Differential Revision: D9628702
Pulled By: ezyang
fbshipit-source-id: 4c298c728f42bb43d39b57967aa3e44385980265
Summary:
This is necessary to allow us to use the complex header
which defines real (and is very sad if real is macro'ed).
We should also fix accreal, ureal, Real and REAL, but
only 'real' is the real blocker.
```
codemod -d aten/src/TH --extensions c,cc,cpp,cu,cuh,h,TARGETS,py,hpp '\breal\b' scalar_t
codemod -d aten/src/THC --extensions c,cc,cpp,cu,cuh,h,TARGETS,py,hpp '\breal\b' scalar_t
codemod -d aten/src/THNN --extensions c,cc,cpp,cu,cuh,h,TARGETS,py,hpp '\breal\b' scalar_t
codemod -d aten/src/THCUNN --extensions c,cc,cpp,cu,cuh,h,TARGETS,py,hpp '\breal\b' scalar_t
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11163
Reviewed By: SsnL
Differential Revision: D9619906
Pulled By: ezyang
fbshipit-source-id: 922cb3a763c0bffecbd81200c1cefc6b8ea70942
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11013
Previously, the parent class Type also contained a large number
of implementations, for things like broadcasting and native
functions that didn't need dispatch. We'd like to be able
to reference this interface from Tensor even when none of
these implementations are available.
To do this, we convert Type into a truly pure virtual interface,
and move all of the implementations to TypeDefault.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11181
Differential Revision: D9561478
Pulled By: ezyang
fbshipit-source-id: 13c49d80bc547551adf524b1cf1d691bfe311133
Summary:
* improve docker packages (install OpenBLAS to have at-compile-time LAPACK functionality w/ optimizations for both Intel and AMD CPUs)
* integrate rocFFT (i.e., enable Fourier functionality)
* fix bugs in ROCm caused by wrong warp size
* enable more test sets, skip the tests that don't work on ROCm yet
* don't disable asserts any longer in hipification
* small improvements
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10893
Differential Revision: D9615053
Pulled By: ezyang
fbshipit-source-id: 864b4d27bf089421f7dfd8065e5017f9ea2f7b3b
Summary:
This places all constants in the entry block of the graph, and de-duplicates them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10231
Differential Revision: D9601501
Pulled By: resistor
fbshipit-source-id: daa10ed8c99e9894830d6f3e5d65c8d3ab5ea899
Summary:
Previously when we had a slicing expression like `x[0:5, 0]`, where the sliced tensor was of size `5` in dimension 0, we would skip dispatching the actual slice call as an optimization.
This caused incorrect behavior under tracing, as we would not record the slice op and thus if we encountered an input with a different shape while running the trace, we would get incorrect results.
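A hedged sketch of the symptom (using the current `torch.jit.trace(fn, example_inputs)` form; shapes are illustrative):
```python
import torch

def take_first_five(x):
    return x[0:5, 0]

# The example input has exactly 5 rows, so the old optimization elided the
# slice; the resulting trace then returned all rows for larger inputs.
traced = torch.jit.trace(take_first_five, torch.rand(5, 3))
print(traced(torch.rand(8, 3)).shape)  # torch.Size([5]) once the slice is recorded
```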
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11156
Differential Revision: D9622252
Pulled By: jamesr66a
fbshipit-source-id: 822f2e8f01504e131f53bd9ef51c171c7913a7cc
Summary:
This makes it so `detach` and `detach_` are traceable and also adds a pass to erase them before ONNX export
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11038
Differential Revision: D9588038
Pulled By: jamesr66a
fbshipit-source-id: 263dd3147e24fcb0c716743f37fdb9f84c0015e7
Summary:
Will need to be accessible by caffe2
This also removes a bunch of unnecessary includes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11154
Reviewed By: ezyang
Differential Revision: D9618681
Pulled By: cpuhrsch
fbshipit-source-id: 838a87b75d9c3959e145fd5fca13b63bc5de7bd3
Summary:
```
In file included from third-party-buck/gcc-5-glibc-2.23/build/pybind11/889256a/include/pybind11/cast.h:13:0,
from third-party-buck/gcc-5-glibc-2.23/build/pybind11/889256a/include/pybind11/attr.h:13,
from third-party-buck/gcc-5-glibc-2.23/build/pybind11/889256a/include/pybind11/pybind11.h:43,
from caffe2/torch/csrc/utils/pybind.h:6,
from caffe2/torch/csrc/jit/pybind.h:5,
from caffe2/torch/csrc/jit/script/init.h:3,
from caffe2/torch/csrc/jit/script/init.cpp:1:
third-party-buck/gcc-5-glibc-2.23/build/pybind11/889256a/include/pybind11/pytypes.h:118:19: note: declared here
In file included from caffe2/torch/csrc/jit/pybind.h:12:0,
from caffe2/torch/csrc/jit/python_ir.cpp:4:
caffe2/torch/csrc/jit/pybind_utils.h: In function 'torch::jit::IValue torch::jit::argumentToIValue(const torch::jit::FunctionSchema&, size_t, pybind11::handle)':
caffe2/torch/csrc/jit/pybind_utils.h:138:226: warning: 'pybind11::str pybind11::detail::object_api<Derived>::str() const [with Derived = pybind11::detail::accessor<pybind11::detail::accessor_policies::str_attr>]' is deprecated: Use py::str(obj) instead [-Wdeprecated-declarations]
```
apaszke zdevito ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11107
Differential Revision: D9598040
Pulled By: goldsborough
fbshipit-source-id: 4a055353ac08d54a2bbca49573ff099310de3666
Summary:
ATen's doc/ folder is manually maintained and can thus cause confusion with the generated file. We now have proper online documentation for ATen, which is superior to ATen doc/. Let's delete ATen/doc.
ezyang apaszke soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11158
Differential Revision: D9618782
Pulled By: goldsborough
fbshipit-source-id: 0ef14f84947601a0589aa4a41e5c8619783426fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11035
Instead, inline its definition into Tensor. We need
to do this so we can avoid needing to call getType() from
TensorImpl.
Reviewed By: cpuhrsch
Differential Revision: D9564516
fbshipit-source-id: 19fdaa2b93419e21572b9916714aee4165cb3390
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11031
The eventual plan is to get rid of TensorImpl::type()
entirely; but first we need a function to call.
Reviewed By: cpuhrsch
Differential Revision: D9564206
fbshipit-source-id: b59a9ccfaed44199f185eff392835cec89ccda8e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11023
I'd like TensorOptions to not know anything about Context, so I can
move it to ATen/core without pulling in Context. To do this, the
type() method has to go, since it consults the context to get a Type.
Reviewed By: cpuhrsch
Differential Revision: D9562467
fbshipit-source-id: 61a18a76eb042a5e70b64b963501e9d68c25d4f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11144
We can move them now that TensorMethods no longer references Tensor.
Reviewed By: cpuhrsch
Differential Revision: D9613800
fbshipit-source-id: 99ad1dd7d77eb319000769230b7016294cf1980f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11027
Using swap() as a primitive, copy and move assignment become much easier.
Reviewed By: ezyang
Differential Revision: D9563753
fbshipit-source-id: e74faf39b596f097de758bfe038639565807040a
Summary:
This completely removes BUILD_CAFFE2 from CMake. There is still a little bit of "full build" stuff in setup.py that enables USE_CUDNN and BUILD_PYTHON, but otherwise everything should be enabled for PyTorch as well as Caffe2. This gets us a lot closer to full unification.
cc mingzhe09088, pjh5, ezyang, smessmer, Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8338
Reviewed By: mingzhe09088
Differential Revision: D9600513
Pulled By: orionr
fbshipit-source-id: 9f6ca49df35b920d3439dcec56e7b26ad4768b7d
Summary:
Added MPI group support.
This makes all previous MPI group test cases pass.
Also, relaxed the MPI thread level support by serializing different PGs' MPI ops. This is required.
The build is fixed too.
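A minimal sketch of what group support enables (assuming `dist.init_process_group(backend="mpi")` has been called on every rank; values are illustrative):
```python
import torch
import torch.distributed as dist

# Build a subgroup of ranks 0 and 1; collectives can then run on it.
group = dist.new_group(ranks=[0, 1])
t = torch.ones(1)
if dist.get_rank() in (0, 1):
    dist.all_reduce(t, group=group)
```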
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11128
Differential Revision: D9602188
Pulled By: teng-li
fbshipit-source-id: 1d618925ae5fb7b47259b23051cc181535aa7497
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11101
I'd like to invert the dependency between Tensor and TensorOptions
(such that Tensor includes TensorOptions); to do this, I'd prefer
there to not be a Tensor constructor. Eventually, all references
of Tensor will disappear from TensorOptions.h
Reviewed By: cpuhrsch
Differential Revision: D9585627
fbshipit-source-id: dd4a28b2c06b1e55f629762915f03c2b6c34d840
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11122
these changes add fixes to device.cc that are appropriate to create the intra-device-copies for opencl
Reviewed By: bwasti
Differential Revision: D9553292
fbshipit-source-id: e59f17916b5df30a504adee0718f9cecfe28f35a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11021
We can now store a boolean saying if we want a Variable or not,
and context can use VariableHooks to get a VariableType if we
request one.
Reviewed By: cpuhrsch
Differential Revision: D9562312
fbshipit-source-id: 84653cd789622764132252406a5ea1a83eee3360
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11096
To discourage willy-nilly use, and make it clearer that it
is not a Variable
Reviewed By: cpuhrsch
Differential Revision: D9583699
fbshipit-source-id: 4fbde0c01ae3deb2c7ef8c125a9028f089b203ae
Summary:
Generate serialized test inputs/outputs/backward graphs of tests inside `caffe2/python/operator_test` that call assertSerializedOperatorCheck(). Tests should be decorated with serialized_test.collect_tests.given_and_seeded to run hypothesis tests that are actually random plus a single fixed-seed hypothesis test.
To use:
1. Refactor your test to be a SerializedTestCase
1a. Decorate it with given_and_seeded
1b. Call testWithArgs in main
2. Run your test with -g to generate the output. Check it in.
3. Subsequent runs of the test without generating the output will check against the checked in test case.
Details:
Run your test with `python caffe2/python/operator_test/[your_test].py -g`
Outputs are in `caffe2/python/serialized_test/data`. The operator tests outputs are in a further subdirectory `operator_test`, to allow for other tests in the future (model zoo tests?)
Currently, we've only refactored weighted_sum_test to use this, but in the next diff, we'll refactor as many as possible. The directory structure may also change as usually there are multiple tests in a single file, so we may create more structure to account for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10594
Reviewed By: ezyang
Differential Revision: D9370359
Pulled By: ajyu
fbshipit-source-id: 2ce77389cd8bcc0255d3bccd61569833e545ede8
Summary:
**Review last commit only.** Stacked on top of #10949.
This commit fixes a number of issues connected to caching
differentiability status of graphs inside graph executors,
and changes the rules for optimization of differentiable subgraphs.
Previously every one of those was instantiated as a separate graph
executor, but now they are simply heavier-optimized graph regions,
and graph executors are only instantiated for their backward.
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10977
Differential Revision: D9600626
Pulled By: apaszke
fbshipit-source-id: dad09a0f586e396afbd5406319c1cd54fbb8a3d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11095
We used getType to mean a lot of things.
- getVariableTypeFromBaseType: given a base Type (non-Variable type)
compute the Variable Type which corresponds to it.
- getVariableType: like at::getType, but return the Variable type
rather than the plain type.
This rename makes it clearer at the use-site what things are what,
and will make a subsequent rename of at::getType easier.
Reviewed By: gchanan, cpuhrsch
Differential Revision: D9583630
fbshipit-source-id: 2667ec98e7607bc466920c7415a8c651fd56dfca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11080
- Add a new TensorOptions(Device, ScalarType) constructor,
which serves roughly the same role as getType used to.
We shouldn't get too wild with these constructors, but
since this particular one was widely used by getType,
it seems worth adding.
- Change DLPack DeviceType conversion to at::DeviceType,
rather than at::Backend. While I'm at it, add a few more
conversions that at::DeviceType understands.
- Add a new overload of from_blob which understands strides.
Reviewed By: gchanan, cpuhrsch
Differential Revision: D9578734
fbshipit-source-id: 28288ec053aae8765e23925ab91023398d632d6b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11077
getType now supports retrieving variable types, so make it clearer
when a getType function does NOT give you a variable type.
```
codemod -d . --extensions cc,cpp,cu,cuh,h getTypeOpt getNonVariableTypeOpt
```
Reviewed By: gchanan
Differential Revision: D9578398
fbshipit-source-id: 3ee502ac5c714849917f11ddc71de8eacfdaa9d3
Summary:
Operators like aten::chunk used to return a number of tensors, but
now return a list. To make it easier to do shape prop through
aten::chunk and fuse it, I've also introduced prim::ConstantChunk,
which behaves like the previous implementation (has a variable length
output list).
The downside of this PR is that the introduction of more lists to the IR causes the LSTM and MiLSTM graphs to be considered as non-differentiable by the graph executor. I verified that they are still optimized correctly, and my next patch (that changes how the specializations/differentiation works) will restore those.
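For illustration, a small script where the chunk count is constant (a sketch on a current build; the IR node itself is an internal detail):
```python
import torch

@torch.jit.script
def split_three(x):
    # A constant number of chunks lets the compiler use a fixed-length
    # unpacking of the returned list.
    a, b, c = torch.chunk(x, 3, dim=0)
    return a + b + c

print(split_three(torch.arange(6.0)))  # tensor([6., 9.])
```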
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10949
Reviewed By: zdevito
Differential Revision: D9556823
Pulled By: apaszke
fbshipit-source-id: 33e63b17fc7247cac6cfc05eb7eb9bf069b499ee
Summary:
Update to the latest observer usage syntax and add an example of HistogramObservers.
Reviewed By: jspark1105
Differential Revision: D6878439
fbshipit-source-id: c9521f2daecfc7f0c17de6a944dce58e568e3dbe
Summary:
How did we get so many uses of `NULL` again?
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11047
Differential Revision: D9566799
Pulled By: goldsborough
fbshipit-source-id: 83469f352ac69aa65bdaf1a1a21f922d892e0db3
Summary:
I've been seeing a lot of warnings about multiple declarations of this. Hopefully this fixes it.
cc Yangqing mingzhe09088 ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11045
Reviewed By: mingzhe09088
Differential Revision: D9582756
Pulled By: orionr
fbshipit-source-id: 6171485609a2f2f357d6e1c44e26b4ecfcdb4ce6
Summary:
This is needed because the JIT declares some custom autograd functions.
colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11082
Differential Revision: D9580456
Pulled By: apaszke
fbshipit-source-id: 6bf00c1188a20b2ee6ecf60e5a0099f8263ad55a
Summary:
This was done because it is surprising for a decorator to run a function
rather than wrap it, and it did not simplify the syntax for tracing modules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11069
Reviewed By: jamesr66a
Differential Revision: D9583192
Pulled By: zdevito
fbshipit-source-id: b914b7ab4c73c255086465a6576eef3a22de1e13
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11026
ezyang fixed a bug with moving or copying an intrusive_ptr into itself.
This diff adds test cases for it.
Reviewed By: ezyang
Differential Revision: D9563464
fbshipit-source-id: 3a3b3f681124730d2500b276c0135c3bba7875ae
Summary:
This PR creates a stream pool per issue #9646. When a new stream is requested, the device it's requested on lazily creates two pools, one low priority and one high priority, of 32 streams each. Streams are returned from these pools round-robin. That is, stream 0 is returned, then stream 1... then stream 31, then stream 0... This PR also takes the opportunity to clean up the stream API, reducing its complexity and verbosity.
Change notes:
- There are now 3 sets of streams per device, the default stream, the low priority streams, and the high priority streams. These streams live in lazily initialized pools and are destroyed on shutdown.
- All stream refcounting has been removed (the pools pattern replaces it).
- Setting a stream now sets it on its device. Streams are associated with a device and the previous
requirement to specify that device was unnecessary.
- There is no exposure for setting the flags on a stream. This may also seem like a regression but the flag was always set to cudaStreamNonBlocking.
- Streams are now low or high priority whereas previously the priority could be set with an integer. In practice, however, the range for priorities is -1 to 0 on the latest hardware. -1 is high priority, 0 is low priority (aka default priority). Low vs. high actually clarifies this behavior if people were trying to make finer separations. (E.g., if someone tried streams with priorities 0, 1, and 2, they would historically all have had priority 0, and the intended behavior would not be respected.) See the sketch after this list.
- Unused THCStream and THCState stream-related functions were removed.
- A new test of pooling behavior was added in stream_test.
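A minimal user-level sketch of the low/high priority distinction (assumes a CUDA device is available; the pools themselves are internal):
```python
import torch

if torch.cuda.is_available():
    low = torch.cuda.Stream(priority=0)    # low (default) priority
    high = torch.cuda.Stream(priority=-1)  # high priority
    x = torch.randn(1024, device="cuda")
    with torch.cuda.stream(high):
        y = x * 2                          # work enqueued on the high priority stream
    torch.cuda.current_stream().wait_stream(high)
```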
fyi: colesbury, apaszke, goldsborough
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9938
Reviewed By: SsnL
Differential Revision: D9569036
Pulled By: ezyang
fbshipit-source-id: 12ed673fe373170d0cf4d65cb570de016c53ee7d
Summary:
The warnings are erroneous as far as I can see,
so tweak things to avoid them. The (unsigned int) cast is
to avoid passing -1 to a size_t type. This was triggered
in gcc8's LTO build only, giving:
caffe2/aten/src/TH/generic/THTensor.cpp: In function ‘THFloatTensor_squeeze1d’:
lto1: error: ‘__builtin_memset’ specified size 18446744073709551608
exceeds maximum object size 9223372036854775807 [-Werror=stringop-overflow=]
In function ‘newImpl’,
inlined from ‘operator new’ at common/memory/OperatorOverride.cpp:86:23,
inlined from ‘allocate’ at third-party-buck/platform007/build/libgcc/include/c++/7.3.0/ext/new_allocator.h:111:0,
inlined from ‘allocate’ at third-party-buck/platform007/build/libgcc/include/c++/7.3.0/bits/alloc_traits.h:436:0,
inlined from ‘_M_allocate’ at third-party-buck/platform007/build/libgcc/include/c++/7.3.0/bits/stl_vector.h:172:0,
inlined from ‘_M_default_append’ at third-party-buck/platform007/build/libgcc/include/c++/7.3.0/bits/vector.tcc:571:0,
inlined from ‘resize’ at third-party-buck/platform007/build/libgcc/include/c++/7.3.0/bits/stl_vector.h:671:0,
inlined from ‘THTensor_resizeDim’ at caffe2/aten/src/TH/THTensor.hpp:123:0,
inlined from ‘THFloatTensor_squeeze1d.part.198’ at caffe2/aten/src/TH/generic/THTensor.cpp:429:0,
inlined from ‘THFloatTensor_squeeze1d’:
common/memory/OperatorOverride.cpp:86:23: error:
argument 1 value ‘18446744073709551608’ exceeds maximum object size 9223372036854775807 [-Werror=alloc-size-larger-than=]
void* ptr = malloc(size);
Reviewed By: soumith
Differential Revision: D9568621
fbshipit-source-id: 4569a4be897d669caa3f283f4b84ec829e8d77ad
Summary:
Also add single grad whitelist to the jit test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10782
Reviewed By: ezyang
Differential Revision: D9583378
Pulled By: erikbrinkman
fbshipit-source-id: 069e5ae68ea7f3524dec39cf1d5fe9cd53941944
Summary:
I've had `torch/lib/python3.6` show up as part of the build for some time now. It's not ignored which means I need to be extra careful about checking in files, or I end up with a thousand of them in my index.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11083
Differential Revision: D9580453
Pulled By: apaszke
fbshipit-source-id: 369e4fe87962696532d111b24f2a4a99b9572bf2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10826
Add strides, and make sure the strides are consistent with sizes, and is_contiguous, for all the Caffe2 functions.
is_contiguous means strides_[dim-1] = 1 and strides_[i] = strides_[i+1] * max(size_[i+1], 1);
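A small sketch of that rule (illustrative helper, not the Caffe2 code itself):
```python
def contiguous_strides(sizes):
    # strides[dim-1] = 1; strides[i] = strides[i+1] * max(sizes[i+1], 1)
    strides = [1] * len(sizes)
    for i in range(len(sizes) - 2, -1, -1):
        strides[i] = strides[i + 1] * max(sizes[i + 1], 1)
    return strides

print(contiguous_strides([2, 3, 4]))  # [12, 4, 1]
print(contiguous_strides([2, 0, 4]))  # [4, 4, 1] -- zero-sized dims count as 1
```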
Reviewed By: ezyang
Differential Revision: D9354480
fbshipit-source-id: 3643871b70f1111b7ffdd9fdd9fe9bec82635963
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10976
The app can run in Xcode with the benchmark metrics collected.
It can also run when building with buck.
Reviewed By: llyfacebook
Differential Revision: D9546755
fbshipit-source-id: 60ad0112946f8cf57138417f6838a58ed6d2c90f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11046
As suggested by jerryzh168, the temporary fix for a new constraint that was added in D9350686 is to remove this assert. Long term, jerryzh168 is going to work out a better way of handling this.
Reviewed By: jerryzh168
Differential Revision: D9566323
fbshipit-source-id: e4630c7cbe0cc68a084974ea7048654811fae01f
Summary:
Currently our `skipIfLapack` uses a try-catch block and regex-matches the error message, which is highly unreliable. This PR adds `hasLAPACK` and `hasMAGMA` to the ATen context, and exposes the flags to Python.
Also fixes a refcounting bug with `PyModule_AddObject`. The method steals a reference, but we didn't `Py_INCREF` in some places before calling it with `Py_True` or `Py_False`.
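A hedged sketch of gating a test on the new flags (assuming they are exposed as `torch._C.has_lapack` / `torch._C.has_magma`; test name is illustrative):
```python
import unittest
import torch

skipIfNoLapack = unittest.skipIf(not torch._C.has_lapack, "PyTorch built without LAPACK")

class TestLinalg(unittest.TestCase):
    @skipIfNoLapack
    def test_inverse(self):
        # Only runs when LAPACK support was compiled in.
        a = torch.randn(3, 3)
        self.assertEqual(torch.inverse(a).shape, a.shape)
```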
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11024
Differential Revision: D9564898
Pulled By: SsnL
fbshipit-source-id: f46862ec3558d7e0058ef48991cd9c720cb317e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11048
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10739
I wanted to assert that the blobs in the workspace of the new session after loading checkpoint are exactly the same as the blobs in the workspace of the old session before saving to a checkpoint.
But I found that when calling `task.get_step()`, a dummy task output blob, `task:output/ConstIntFill:0`, is added. A dummy net `task:output` is also added along with it. See https://fburl.com/937lf2yk
This makes it hard to assert "Equal", forcing me to assert "LessThan" or "GreaterThan".
Adding a dummy TaskOutput when the user specifies no TaskOutput is a hack.
The reason for this is that ZMQ socket can't send empty blob list.
As a result, if the Task on the Worker had no output,
the master would never stop waiting and hang forever. See https://fburl.com/rd7fhy6p and imagine `socket.recv(net, 0)`.
TaskOutput is at the user layer. The hack shouldn't be exposed to the user layer, polluting user workspaces.
Instead, we should move the creating of the dummy blob to some deeper layer,
and remove the dummy blob in the workspace afterwards to avoid polluting user workspaces.
After this change, the workaround becomes totally transparent and no side-effect to users.
Reviewed By: mraway
Differential Revision: D9566744
fbshipit-source-id: 18292dd64a6d48192c34034200a7c9811d2172af
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11052
Delete the test case for constructing a Predictor from a MetaNetDef since that constructor
has actually been deprecated. The broken PR is for constructing a predictor from a DB instance.
Reviewed By: highker
Differential Revision: D9566935
fbshipit-source-id: 5511883953a2d3f6eb0a4f1c5518a1bc4b3ffbdc
Summary:
Not subclassed except by Tensor. Also required to align further with
caffe2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11036
Reviewed By: ezyang
Differential Revision: D9565640
Pulled By: cpuhrsch
fbshipit-source-id: ff7203a2c95d3f3956282b4f2d8dda6c2b93f4a6
Summary:
Things like torch.zeros now appear in traces rather than constants.
To continue to support our current level of ONNX export, we run
constant prop to turn these back into constants where possible before
export.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10935
Differential Revision: D9527427
Pulled By: zdevito
fbshipit-source-id: 552a8bcc01b911251dab7d7026faafdd7a3c758a
Summary:
Initial version of `unique` supporting a `dim` argument.
As discussed in [this issue](https://github.com/pytorch/pytorch/issues/9997) I added the `dim` argument to `torch.unique` with the same behavior as [numpy](https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.unique.html).
Since the implementation is based on `std/thrust::unique`, the `tensor` always needs to be sorted. The `sorted` argument in `torch.unique` therefore has no effect, just as in the CUDA version of the plain `torch.unique`.
To check the performance and equal behavior between `torch.unique` and `np.unique`, I've used [this gist](https://gist.github.com/ptrblck/ac0dc862f4e1766f0e1036c252cdb105).
Currently we achieve the following timings for an input of `x = torch.randint(2, (1000, 1000))`:
(The values are calculated by taking the average of the times for both dimension)
| Device | PyTorch (return_inverse=False) | Numpy (return_inverse=False) | PyTorch (return_inverse=True) | Numpy (return_inverse=True) |
| --- | --- | --- | --- | --- |
| CPU | ~0.007331s | ~0.022452s | ~0.011139s | ~0.044800s |
| GPU | ~0.006154s | - | ~0.105373s | - |
Many thanks to colesbury for the awesome mentoring and the valuable advices on the general implementation and performance issues!
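A small usage sketch (current signature; values are illustrative):
```python
import torch

x = torch.tensor([[1, 2],
                  [3, 4],
                  [1, 2]])

# Unique rows; inverse maps each original row to its unique index.
values, inverse = torch.unique(x, dim=0, return_inverse=True)
print(values)   # tensor([[1, 2], [3, 4]])
print(inverse)  # tensor([0, 1, 0])
```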
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10423
Differential Revision: D9517289
Pulled By: soumith
fbshipit-source-id: a4754f805223589c2847c98b8e4e39d8c3ddb7b5
Summary: When conversion fails, dump more information to help fix up the netdef
Reviewed By: hyuen, yinghai
Differential Revision: D9558667
fbshipit-source-id: 8917cc61c9be6285697e4f8395a9dbc7135f618e
Summary:
1. Support ops needed for inference of Faster-RCNN/Mask-RCNN needed in Detectron, mostly direct fallbacks.
2. Use CPU device to hold 0-dim tensors and integer tensors in both fallback op and blob feeder, needed by Detectron models.
3. Ignore 0-dim tensor in MKL-DNN concat operator.
4. Generate dynamic library of Detectron module for CPU device.
This PR obsoletes #9164.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10157
Differential Revision: D9276837
Pulled By: yinghai
fbshipit-source-id: dc364932ae4a2e7fcefdee70b5fce3c0cee91b6f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10955
Add GPU version of HardSigmoid Op to Caffe2. Updated test file to
include GPU tests.
Reviewed By: enosair
Differential Revision: D9499353
fbshipit-source-id: fcb51902063d0c3e4b10354533a8a42cf827c545
Summary:
This probably fixes the logging test error that orionr is encountering - haven't tested locally but wanted to send out a PR to kick off CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10983
Reviewed By: ezyang
Differential Revision: D9552607
Pulled By: Yangqing
fbshipit-source-id: 9ac019031ffd9c03972144df04a836e5dcdafe02
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10920
Update the black box predictor and the related code to use the
constructor with PredictorConfig.
Reviewed By: highker
Differential Revision: D9516972
fbshipit-source-id: fbd7ece934d527e17dc6bcc740b4e67e778afa1d
Summary:
The PR includes:
(1) torch.distributed.c10d, which now includes the complete backward compatible frontend API for `torch.distributed`
(2) `env://` init method functionality
(3) Minor change to `test_distributed.py`, which is now a test for `torch.distributed.c10d`.
(4) The old `test_distributed.py` is now moved to `test_distributed_thd`
(5) Miscellaneous bug fixes.
(6) DDP CPU test is removed since c10d doesn't have this support yet, but this is a very easy test after moving DDP CPU's dependency to torch.distributed.c10d.
(7) CI config to test MPI, NCCL, and Gloo backend of c10d
**Now all the distributed test including c10d DDP can pass with the c10d frontend API**
TODO: (in a separate PR)
MPI subgroup support, once this is added, CI group test will be enabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10871
Differential Revision: D9554514
Pulled By: teng-li
fbshipit-source-id: fb686ad42258526c8b4372148e82969fac4f42dd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10987
some code style update to make it consistent with fb cpp style
Reviewed By: yinghai
Differential Revision: D9550130
fbshipit-source-id: 6aef9878676c08e7d384383c95e7ba8c5c9a1bce
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11003
Need an interface to rewrite the graph after the net is built and after adding gradient ops.
Reviewed By: aazzolini, harouwu
Differential Revision: D9557827
fbshipit-source-id: 2e082f0321c0776e488a29e18047d950948e7c37
Summary:
The goal here is to separate out the base Type into core; as it was done previously we need all derived Types to be defined when we compile the base Type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10947
Reviewed By: gchanan
Differential Revision: D9540025
Pulled By: ezyang
fbshipit-source-id: 49f0b5acb3c378348ef3a55780abb73e4ae27edd
Summary:
Fixes #10851
Speeds up profiling results dramatically.
For the following script:
```
import torch
import time
ITER = 2000
x = torch.randn(1, 1, requires_grad=True)
with torch.autograd.profiler.profile() as prof:
    y = x
    for i in range(ITER):
        y = 3 * y - 2 * y
    y.backward()
start = time.time()
print("Done running. Preparing prof")
x = str(prof)
print("Done preparing prof results")
end = time.time()
print("Elapsed: {}".format(end - start))
```
I get 7s before / 0.13s after these changes.
cc apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10969
Differential Revision: D9556129
Pulled By: zou3519
fbshipit-source-id: 26b421686f8a42cdaace6382567d403e6385dc12
Summary:
Breaking this out of https://github.com/pytorch/pytorch/pull/8338
mingzhe09088's fix of the docstrings for Windows builds. Unfortunately some versions of Windows seem to try and parse the `#` inside the string as a pre-processor declaration. We might need to change this to something else later, but want to get this landed first.
cc mingzhe09088 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10998
Reviewed By: mingzhe09088
Differential Revision: D9557480
Pulled By: orionr
fbshipit-source-id: c6a6237c27b7cf35c81133fd9faefead675a9f59
Summary:
Breaking out of https://github.com/pytorch/pytorch/pull/8338
This test fails once we start building with `-DUSE_GLOG=OFF` since the non-glog logging case doesn't support flushing or streaming to the right location. For now, we just disable this test in that case.
cc Yangqing mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10999
Reviewed By: mingzhe09088
Differential Revision: D9557488
Pulled By: orionr
fbshipit-source-id: 8b306f210411dfc8ccc404bdccf77ddcd36a4830
Summary:
PyTorch exporting tests and end-to-end cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10924
Reviewed By: Ac2zoom
Differential Revision: D9548210
Pulled By: houseroad
fbshipit-source-id: 2381d1ad92a4e07f97060eb65c9fd09f60ad3de6
Summary:
This is part of splitting Context from what needs to go in ATen/core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10951
Differential Revision: D9540369
Pulled By: gchanan
fbshipit-source-id: 73b0e8c4493785fbab368a989f46137c51f6ea0b
Summary:
Fix #10345, which only happens in the CUDA case.
* Instead of returning some random buffer, we fill it with zeros.
* update torch.symeig doc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10645
Reviewed By: soumith
Differential Revision: D9395762
Pulled By: ailzhang
fbshipit-source-id: 0f3ed9bb6a919a9c1a4b8eb45188f65a68bfa9ba
Summary:
This fixes multiple bugs in the handling of negative indices in both slicing and gather operations. These were uncovered by Elias Ellison's diff D9493614, which made it so that we actually emit negative indices when we see them in PyTorch code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10973
Reviewed By: jhcross
Differential Revision: D9546183
Pulled By: jamesr66a
fbshipit-source-id: 6cb0e84e8ad399e47e24a96c44025f644c17b375
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10739
I wanted to assert that the blobs in the workspace of the new session after loading checkpoint are exactly the same as the blobs in the workspace of the old session before saving to a checkpoint.
But I found that when calling `task.get_step()`, a dummy task output blob, `task:output/ConstIntFill:0`, is added. A dummy net `task:output` is also added along with it. See https://fburl.com/937lf2yk
This makes it hard to assert "Equal", forcing me to assert "LessThan" or "GreaterThan".
Adding a dummy TaskOutput when the user specifies no TaskOutput is a hack.
The reason for this is that ZMQ socket can't send empty blob list.
As a result, if the Task on the Worker had no output,
the master would never stop waiting and hang forever. See https://fburl.com/rd7fhy6p and imagine `socket.recv(net, 0)`.
TaskOutput is at the user layer. The hack shouldn't be exposed to the user layer, polluting user workspaces.
Instead, we should move the creating of the dummy blob to some deeper layer,
and remove the dummy blob in the workspace afterwards to avoid polluting user workspaces.
After this change, the workaround becomes totally transparent and no side-effect to users.
Reviewed By: mraway
Differential Revision: D9413150
fbshipit-source-id: 51aaf3201e26570b4fcf5738e9b9aa17c58777ac
Summary:
TODO: integrate into torch.onnx.export -- separate PR
*Problem:* We have a facility to trace PyTorch operations on Python code, but there are several failure modes where the trace is not representative of the actual underlying computation:
* The tracer encountered dynamic control flow
* Some computation escaped the tracer, and appeared as a Constant tensor node in the graph
* Some stateful function was traced, e.g. someone did an optimization in Python by memoizing function outputs
*Objective*: In an ideal world, this whole process would be automated and the user could trust that the system will magically capture the intended semantics from the program. Realistically speaking, we will likely have to settle for a human-in-the-loop error reporting system, allowing the user to identify problems and modify the source code to allow for tracing.
*Stage 1* (this PR): Output-level checking & graph diff. torch.jit.trace gains a kwarg 'check_inputs', which is a list of tuples of input arguments. We will iterate through the list and trace the function again for each set of check inputs. We'll also interpret the original trace with these inputs and compare output values and graphs, printing a diff of the graph if there is a difference.
Examples:
```
@torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(4, 5),)])
def foo(x):
    y = torch.arange(0, x.shape[0]).float()
    return x + y.unsqueeze(1)
```
```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Graphs differed across invocations!
Graph diff:
graph(%0 : Dynamic) {
- %1 : Dynamic = prim::Constant[value= 0 1 2 [ CPULongType{3} ]]()
? ^
+ %1 : Dynamic = prim::Constant[value= 0 1 2 3 [ CPULongType{4} ]]()
? +++ ^
%2 : int = prim::Constant[value=0]()
%3 : Dynamic = aten::_cast_Float(%1, %2)
%4 : int = prim::Constant[value=1]()
%5 : Dynamic = aten::unsqueeze(%3, %4)
%6 : int = prim::Constant[value=1]()
%7 : Dynamic = aten::add(%0, %5, %6)
return (%7);
}
Node diff:
- %1 : Dynamic = prim::Constant[value= 0 1 2 [ CPULongType{3} ]]()
? ^
+ %1 : Dynamic = prim::Constant[value= 0 1 2 3 [ CPULongType{4} ]]()
? +++ ^
Trace source location:
dank.py(5): foo
/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
dank.py(3): <module>
Check source location:
dank.py(5): foo
/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(281): check_trace
/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(408): wrapper
dank.py(3): <module>
ERROR: Tensor-valued Constant nodes differed in value across invocations. This often indicates that the tracer has encountered untraceable code.
Node:
%1 : Dynamic = prim::Constant[value= 0 1 2 [ CPULongType{3} ]]()
Source Location:
dank.py(5): foo
/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
dank.py(3): <module>
Comparison exception:
Not equal to tolerance rtol=1e-07, atol=0
(shapes (3,), (4,) mismatch)
x: array([0, 1, 2])
y: array([0, 1, 2, 3])
```
==
```
@torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(3, 4),)])
def foo(x):
    y = x.data
    return x + y
```
```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
ERROR: Tensor-valued Constant nodes differed in value across invocations. This often indicates that the tracer has encountered untraceable code.
Node:
%1 : Dynamic = prim::Constant[value=<Tensor>]()
Source Location:
dank.py(6): foo
/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
dank.py(3): <module>
Comparison exception:
Not equal to tolerance rtol=1e-07, atol=0
(mismatch 100.0%)
x: array([0.397137, 0.956105, 0.169478, 0.560292, 0.392568, 0.108441,
0.97645 , 0.34412 , 0.951246, 0.793061, 0.557595, 0.770245],
dtype=float32)
y: array([0.243178, 0.315964, 0.972041, 0.0215 , 0.927751, 0.457512,
0.951092, 0.97883 , 0.048688, 0.118066, 0.779345, 0.271272],
dtype=float32)
```
==
```
import torch

@torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(4, 4),)])
def foo(x):
    for _ in range(x.size(0)):
        x = torch.neg(x)
    return x
```
```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
ERROR: Graphs differed across invocations!
Graph diff:
graph(%0 : Dynamic) {
%1 : Dynamic = aten::neg(%0)
%2 : Dynamic = aten::neg(%1)
%3 : Dynamic = aten::neg(%2)
+ %4 : Dynamic = aten::neg(%3)
- return (%3);
? ^
+ return (%4);
? ^
}
```
==
```
import torch

def foo(x):
    if not hasattr(foo, 'cache'):
        foo.cache = torch.neg(x)
    return x + foo.cache

traced = torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(3, 4),)])(foo)
```
```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
ERROR: Graphs differed across invocations!
Graph diff:
graph(%0 : Dynamic) {
- %1 : Dynamic = aten::neg(%0)
+ %1 : Dynamic = prim::Constant[value=<Tensor>]()
%2 : int = prim::Constant[value=1]()
%3 : Dynamic = aten::add(%0, %1, %2)
return (%3);
}
Node diff:
- %1 : Dynamic = aten::neg(%0)
+ %1 : Dynamic = prim::Constant[value=<Tensor>]()
Trace source location:
test.py(5): foo
/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
test.py(8): <module>
Check source location:
test.py(6): foo
/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(281): check_trace
/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(408): wrapper
test.py(8): <module>
```
The following two examples show instances where program semantics are lost in the Python -> trace transformation, and repeated invocation does not give us useful debug information. Further design is underway for catching these scenarios.
```
import torch

@torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(3, 4),)])
def foo(x):
    for i in range(3):
        x[i, :] = torch.zeros(4)
    return x
```
```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
Exception:
Not equal to tolerance rtol=1e-07, atol=0
(mismatch 100.0%)
x: array([0.830221, 0.915481, 0.940281, 0.555241], dtype=float32)
y: array([0., 0., 0., 0.], dtype=float32)
```
==
```
import torch

@torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(5, 6),)])
def foo(x):
    x.view(-1).add_(-x.view(-1))
    return x
```
```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
Exception:
Not equal to tolerance rtol=1e-07, atol=0
(mismatch 100.0%)
x: array([0.734441, 0.445327, 0.640592, 0.30076 , 0.891674, 0.124771],
dtype=float32)
y: array([0., 0., 0., 0., 0., 0.], dtype=float32)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10841
Differential Revision: D9499945
Pulled By: jamesr66a
fbshipit-source-id: 1f842a32d0b0645259cc43b29700b86d99c59a45
Summary:
This PR adds argument checking for script method invocation from C++. For this I had to:
1. The schema of a method is currently not serialized in script modules, so we now store the function schema in the `doc_string` field of the ONNX proto. Upon loading of a serialized script module, we parse the schema into the structured C++ form and assign it to the loaded method,
2. Inside `Method::operator()`, we now verify the number and types of arguments.
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10872
Differential Revision: D9521219
Pulled By: goldsborough
fbshipit-source-id: 5cb3d710af6f500e7579dad176652c9b11a0487d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10929
Workspace class methods were missing on the Python side.
This enables writing the new checkpoint framework with more control over the workspace and a cleaner implementation.
Added
- ws.feed_blob(name, arr)
- ws.remove_blob(name)
Reviewed By: mraway
Differential Revision: D9486867
fbshipit-source-id: ea02d2e3a39d716a5a3da0482f57d4ac4c893763
Summary: Adds basic nomnigraph python bindings for quickly playing with the graphs.
Reviewed By: duc0
Differential Revision: D9441936
fbshipit-source-id: fd70f8ea279b28c766e40f124008800acd94bddd
Summary:
The previous NCCL all gather doesn't work as expected. This is a fully working async version. Tested on both C++ and Python Frontend.
Multi-node:
```
tengli@learnfair042:~/new_pytorch/pytorch/torch/lib/build/c10d/test$ TMPFILE="/private/home/tengli/temp/tengli-test" RANK=0 WORLD_SIZE=2 ./ProcessGroupNCCLTest
Multi-node world size: 2 rank: 0
Allreduce test successful
Broadcast test successful
Reduce test successful
Allgather test successful
tengli@learnfair117:~/new_pytorch/pytorch/torch/lib/build/c10d/test$ TMPFILE="/private/home/tengli/temp/tengli-test" RANK=1 WORLD_SIZE=2 ./ProcessGroupNCCLTest
Multi-node world size: 2 rank: 1
Allreduce test successful
Broadcast test successful
Reduce test successful
Allgather test successful
```
CI test:
```
test_set_get (__main__.FileStoreTest) ... ok
test_set_get (__main__.PrefixFileStoreTest) ... ok
test_set_get (__main__.PrefixTCPStoreTest) ... ok
test_allreduce_ops (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_ops (__main__.ProcessGroupGlooTest) ... ok
test_allgather_ops (__main__.ProcessGroupNCCLTest) ... ok
test_allreduce_ops (__main__.ProcessGroupNCCLTest) ... ok
test_broadcast_ops (__main__.ProcessGroupNCCLTest) ... ok
test_reduce_ops (__main__.ProcessGroupNCCLTest) ... ok
test_common_errors (__main__.RendezvousFileTest) ... ok
test_nominal (__main__.RendezvousFileTest) ... ok
test_common_errors (__main__.RendezvousTCPTest) ... ok
test_nominal (__main__.RendezvousTCPTest) ... ok
test_unknown_handler (__main__.RendezvousTest) ... ok
test_set_get (__main__.TCPStoreTest) ... ok
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10932
Differential Revision: D9542067
Pulled By: teng-li
fbshipit-source-id: 25513eddcc3119fd736875d69dfb631b10f4ac86
Summary:
Running `--accept` on a test doesn't tell you explicitly which sub-test is being updated; this PR fixes that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10559
Differential Revision: D9353977
Pulled By: driazati
fbshipit-source-id: a9d4014386ff0fe388a092f3dcf50f157e460f04
Summary:
Changes the approach for resolving builtin ops so that the following works
```
add = torch.add

@torch.jit.script
def foo(x):
    return add(x, x)
```
This handles cases where people alias torch and torch.nn.functional to shorter names (see the sketch below).
This works by building a table of id -> builtin name for the known builtin ops in torch, torch.nn.functional, and for any user-defined op created by accessing torch.ops.foo.bar.
This allows us to clean up many SugaredValue types in the compiler.
Notes:
* we now consider any attributes on python modules to be constants (e.g. math.pi and torch.double).
* fixes a bug where we incorrectly allowed attribute lookup on arbitrary python objects. It is now restricted to modules only.
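A minimal sketch (not taken from the PR itself) of the aliasing pattern this enables in script, assuming the usual `torch.jit.script` entry point:
```
import math
import torch
import torch.nn.functional as F

# F.relu resolves to a builtin op and math.pi to a module-attribute constant
# when compiled as part of the scripted function.
@torch.jit.script
def scaled_relu(x):
    return F.relu(x) * math.pi
```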
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10927
Differential Revision: D9527522
Pulled By: zdevito
fbshipit-source-id: 0280422af08b4b0f48f302766d5a9c0deee47660
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10804
Make ShareData and ShareExternalPointer create new storage when the old one is used by multiple tensors.
When we need to modify a field of the storage, we'll create a new storage instead.
Reviewed By: ezyang
Differential Revision: D9350686
fbshipit-source-id: 68d2b6b886b0367b0fc4fabfd55b9a480e7388ca
Summary:
Currently we assume that cudnn includes and libraries are found under the `CUDA_HOME` root, but this is not always true. We now support a `CUDNN_HOME`/`CUDNN_PATH` environment variable pointing to a directory with its own `/include` and `/lib64` folders.
This means cudnn extensions now also get support on the FAIR cluster.
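A hypothetical sketch of pointing a build at a standalone cudnn install (the path is made up; the `<CUDNN_HOME>/include` and `<CUDNN_HOME>/lib64` layout follows the summary above):
```
import os

# Set before building the extension so the build picks up the standalone cudnn.
os.environ["CUDNN_HOME"] = "/opt/cudnn"  # expects /opt/cudnn/include and /opt/cudnn/lib64
```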
soumith fmassa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10922
Differential Revision: D9526856
Pulled By: goldsborough
fbshipit-source-id: 5c64a5ff7cd428eb736381c24736006b21f8b6db
Summary:
Since we don't need `torch.autograd.Variable` anymore, I removed `torch.autograd.Variable` from `onnx.rst`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10810
Differential Revision: D9500960
Pulled By: zou3519
fbshipit-source-id: 1bc820734c96a8c7cb5d804e6d51a95018db8e7f
Summary:
More support for tuples has uncovered a bug in constant prop where it assumed it could create constant nodes for tuples, even though we cannot easily create a single prim::Constant to represent a tuple.
This fix checks when we cannot represent an IValue as a prim::Constant and then stops propagating the node.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10923
Reviewed By: orionr
Differential Revision: D9523417
Pulled By: zdevito
fbshipit-source-id: 745058c4388d9a5e0fc1553eaa2731e31bc03205
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10824
API additions:
- Tensor(c10::intrusive_ptr<TensorImpl,UndefinedTensor>&&)
- Tensor(const c10::intrusive_ptr<TensorImpl,UndefinedTensor>&)
- Tensor::operator=(Tensor&&) && (for completeness sake)
- TensorBase::unsafeGetTensorImpl()
- TensorBase::unsafeReleaseTensorImpl()
- TensorBase::getIntrusivePtr()
- TensorImpl::type_id()
- Tensor::set_data()
- Tensor::is_same(Tensor)
- Tensor::use_count()
- Tensor::type_id()
- Tensor::scalar_type()
- WeakTensor::is_same(WeakTensor)
- intrusive_ptr::weak_use_count()
- weak_intrusive_ptr::weak_use_count()
- c10::raw::intrusive_ptr::{incref,decref,make_weak}
- c10::raw::weak_intrusive_ptr::{incref,decref,lock}
API changes:
- Tensor::pImpl is no longer public (and now named tensor_impl_)
- Most methods accessed this way are now accessible on Tensor
maybe_zero_dim() and set_wrapped_number() being prominent exceptions
(they are now accessed through unsafeGetTensorImpl())
- Type is no longer friend of Tensor
- TensorBase::reset(TensorImpl*) is deleted
- TensorBase::reset(TensorImpl*, bool should_retain) is deleted
- TensorBase::swap(TensorBaseImpl&) is deleted; use std::swap instead
- TensorBase::get() is deleted; use unsafeGetTensorImpl() instead
- TensorBase::detach() is deleted; use unsafeReleaseTensorImpl() instead
- TensorBase::retain() is deleted; use _raw_incref() instead
- TensorBase::release() is deleted; use _raw_decref() instead
- WeakTensor lost most of its methods (it no longer inherits from
TensorBase)
- TensorImpl::storage() is now a const method
- Tensor(TensorBase) constructor removed, instead
we go through getIntrusivePtr(). I'm not sure about
this change; I happened to have accidentally removed the
TensorBase constructor and decided to fix call sites,
but I could go the other way.
- detail::set_data() is deleted; use Tensor::set_data() instead
- c10::raw_intrusive_ptr_target removed; use the functions in c10::raw instead.
(The reason for this change, is that it is invalid to cast an intrusive_ptr_target*
to a raw_intrusive_ptr_target* to take advantage of the methods. But there is
no reason the incref/decref methods shouldn't also work on intrusive_ptr_target;
it is primarily an API consideration. We can be more standards compliant by
keeping them as functions, which are universally applicable.)
- intrusive_ptr::reclaim() and weak_intrusive_ptr::reclaim() now work on
pointers of the NullType. (This counts as a bug fix, because the documentation
specified that pointers produced by release() are valid to reclaim(), and
a release() on a null intrusive_ptr produces the NullType::singleton())
Bug fixes:
- Dispatch code for mutable references incorrectly returned
a reference to a value argument (which would immediately
go out of scope). They now correctly return a tensor by
value.
- intrusive_ptr copy/move assignment did not work correctly when
an object was assigned to itself. We now check for this case and
no-op if so. (This bug manifested itself as a Tensor mysteriously
becoming an UndefinedTensor after lines of code like
'x = x.mul_(y)')
Other changes:
- The checked cast functions in Utils.h have now been
renamed and detemplatized into checked unwrap functions.
- Added type_id() and scalar_type() methods to Tensor
- pImpl is no longer public
- Documented what the && overloads are doing
- All occurrences of 'new TensorImpl' (and similar spellings, like 'new THTensor')
have been expunged. This is NO LONGER a valid way to create a new
tensor, and if you do this, upon your first incref, you will catch an ASSERT
failure saying that only tensors created by intrusive_ptr::release() are valid
to reclaim(). Use c10::make_intrusive instead in this situation.
- IValue is adjusted to use intrusive_ptr instead of Retainable, and all
other sub-classes of Retainable were modified to use intrusive_ptr.
When doing this, I had to make the constructors of sub-classes like
ConstantList public, so that c10::make_intrusive could invoke them. Fortunately,
if you incorrectly stack allocate a ConstantList, and then try to get an
intrusive_ptr to it, it will fail, as stack allocated ConstantLists have refcount 0.
- IValue very narrowly sidesteps the problem of handling NullType, as it
considers intrusive_ptr<TensorImpl> identical to intrusive_ptr<TensorImpl, UndefinedTensor>
which is not always true. This was always the case, but there's now a comment
explaining what's going on.
Some MSVC bugs were uncovered during the preparation of this patch.
They are documented as comments in the code.
Reviewed By: gchanan
Differential Revision: D9481140
fbshipit-source-id: 14a8ea0c231ed88b5715fb86d92730926f9f92fc
Summary:
The goal of this PR is to enable the MIOpen engine (for HIP devices) for the recurrent operator and also enable the corresponding unit test.
bddppq petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10840
Differential Revision: D9518980
Pulled By: bddppq
fbshipit-source-id: 214661e79a47c5dc6b712ef0fba986bd99db051f
Summary:
Previously, when tracing slicing & select, negative indices would get normalized, fixing the index to the size of the traced tensor. This change makes the behavior the same as script, so aten::select with negative indices is emitted.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10560
Differential Revision: D9493614
Pulled By: eellison
fbshipit-source-id: ce7a8bae59863723247208d86b9f2948051ccc6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10877
change default value of DeviceOption.numa_node_id to 0 and use has_numa_node_id() to check existence
Reviewed By: ilia-cher
Differential Revision: D9473891
fbshipit-source-id: 91ac6a152f445644691023110c93d20a3ce80d43
Summary:
* Fix the necessary pathways so that tuples and lists can be inputs to the script.
* prevent linear algebra functions from being run in shape prop because
they frequently will error out for nonsense data.
* favor schema-driven python input conversion where possible.
remaining cases where we directly create Stacks without schema are
only for debugging
* Make the error messages when calling script/trace functions more pythonic
* Simplify FlattenTuples -- now that tuples are supported we can choose to only flatten tuples when needed. This may have to be revisited pending onnx test results, but is necessary for making tuple io work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10812
Differential Revision: D9477982
Pulled By: zdevito
fbshipit-source-id: ed06fc426e6ef6deb404602a26c435a7fc40ea0c
Summary:
The scalar situation has gotten a lot better and now we can
remove all instances of FIXME_zerol().
cc zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10900
Differential Revision: D9514206
Pulled By: zou3519
fbshipit-source-id: e4e522f324126c5454cd6de14b832d2d1f6cb0ce
Summary:
PackedSequence is never supposed to be created by users, but unfortunately some community repos are already doing this (e.g., [here](7c191048ce/torchmoji/model_def.py (L218-L229))). A change we made broke the calling pattern `PackedSequence(data=x, batch_sizes=y)`. This patch adds back support for it.
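A minimal sketch of the (discouraged) direct-construction pattern this patch restores; the concrete tensors are made up for illustration:
```
import torch
from torch.nn.utils.rnn import PackedSequence

data = torch.randn(5, 3)               # packed rows across all time steps
batch_sizes = torch.tensor([2, 2, 1])  # batch size at each time step, summing to 5
packed = PackedSequence(data=data, batch_sizes=batch_sizes)
```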
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9864
Differential Revision: D9011739
Pulled By: SsnL
fbshipit-source-id: 0e2012655d7f4863ec54803550df30874ec35d75
Summary:
Please review the expects carefully to make sure there are no regressions. I tried to go over them one by one when they changed, but it's sometimes easy to miss finer details.
Summary of changes:
- Renamed `TensorType` to `CompleteTensorType`. Added a new `TensorType` which records only the scalar type, number of dimensions, and device of a value. The rationale behind the rename is to encourage people to use `CompleteTensorType` less, as most passes will only have limited information available. To make the transition easier, `complete_type->cast<TensorType>()` works, which makes our passes work with both kinds of specialization if they don't need the extra detail.
- Renamed `ArgumentSpec` to `CompleteArgumentSpec`. Added a new `ArgumentSpec`, which matches argument only at the level of the new `TensorType`.
- Shape analysis can process graphs with both `CompleteTensorType` and `TensorType`.
- The fuser heavily relied on full shape information being available. Now, we simply try to fuse the largest possible graphs, and have to do run-time checks to make sure they match the code we generate. If they don't, we fall back to regular interpretation. The shape checks are implemented using an optimized method exploiting algebraic properties of shapes with broadcasting, and the relations of broadcasting with pointwise ops. A full written proof of correctness of the shape checking algorithm is included in a comment in `graph_fuser.cpp`.
zdevito ezyang mruberry ngimel csarofeen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10844
Differential Revision: D9498705
Pulled By: apaszke
fbshipit-source-id: 0c53c2fcebd871cc2a29c260f8d012276479cc61
Summary: Update all the caller for the new interface
Reviewed By: highker
Differential Revision: D9323167
fbshipit-source-id: a39335ceb402db0719f5f2314085ba9a81380308
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10239
Make Conv + BN fusion also work for 3D convolutions
Reviewed By: duc0
Differential Revision: D9176314
fbshipit-source-id: 6604aa569c5c3afdb4480a5810890bc617e449c4
Summary:
This disables the symbolic override hacks and makes tracing emit the recently added ATen ops for RNNs (`aten::lstm`, `aten::gru`, ...). I managed to reuse pretty much all of the translation code for their symbolics.
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10638
Differential Revision: D9385830
Pulled By: apaszke
fbshipit-source-id: ff06ef7b1ae7c3b7774825e0991bc3887e1ff59b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10759
Adding a basic registry pattern to pybindstate so that we can have separate 'cc' files register module updates. This is substantially cleaner than using multiple pybind modules (which have been known to cause bugs)
Reviewed By: bddppq
Differential Revision: D9441878
fbshipit-source-id: af9e9e98385e92b58ca50e935678328c62684d8e
Summary:
**Summary**: This PR is a followup of mruberry's https://github.com/pytorch/pytorch/pull/9318/. It tries to achieve the following:
- Specializing std common math functions for `at::Half` type.
- Create `CUDANumerics.cuh` to contain necessary parts from `THCNumerics.cuh`.
- Update `THCNumerics.cuh` with new usage and comments to demonstrate the best practice for developers and hence, making way for its deprecation.
- Remove legacy/redundant code path.
- Remove unused CUDA HALF macros (see separate PR https://github.com/pytorch/pytorch/pull/10147)
**Comments**: `CUDANumerics.cuh` contains mathematical functions that are either not in the std namespace or are specialized for compilation with CUDA NVCC or CUDA NVRTC. This header is derived from the legacy `THCNumerics.cuh`. Following are some rationale behind why some functions were kept while others were removed:
- All arithmetic can now be done in ATen using binary cuda kernel or CUDA tensor pointwise apply (check https://github.com/pytorch/pytorch/pull/8919 and `CUDAApplyUtils`). `at::Half` comparisons rely on implicit conversion to float.
- Functions that are c/c++ standard compliant, have been specialized for user defined types, for instance, the std namespace has been opened up for `at::Half`, that defines math function definitions for `at::Half`. Check `Half-inl.h`
- Some standard-compliant functions are specialized here for performance reasons. For instance, `powi` is used for `pow` calculation on integral types. Moreover, `abs`, `isinf`, `isnan` are specialized to save one API call vs. when used with std, although this is subject to change depending on whether we really care about saving one API call.
- Numeric limits such as `max/min` are removed since they call standard defines. Moreover, the numeric limits for `at::Half` are present in `Half-inl.h`. I understood that HIP has some issues with `std::numeric_limits`, and this is the related GitHub issue I found: https://github.com/ROCm-Developer-Tools/HIP/issues/374. AlexVlx mentions that the issue can be avoided by launching `std::numeric_limits` in `__device__`. Since we are launching lambdas with device contexts, I don't see why `std::numeric_limits` wouldn't compile on HIP if launched with a device context within a kernel, unless I am not aware of the real reason why max/min was there in THCNumerics in the first place. (I haven't ever tried a build with HIP.)
Here are some reference PRs that was handy in refactoring TH into ATen:
- https://github.com/pytorch/pytorch/pull/6786
- https://github.com/pytorch/pytorch/pull/5475
- https://github.com/pytorch/pytorch/pull/9401
- https://github.com/pytorch/pytorch/pull/8689
- https://github.com/pytorch/pytorch/pull/8919
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10301
Differential Revision: D9204758
Pulled By: soumith
fbshipit-source-id: 09f489c1656458c02367b6cd31c3eeeca5acdc8a
Summary:
This is along the way of removing Tensor as a member of the tagged union in Scalar. This simplifies ordering dependencies, because currently Scalar and Tensor both depend on each other (so we introduce a TensorBase). Also, this API isn't particularly useful publicly: we can't autograd through Scalars, so you still need a Tensor overload basically everywhere anyway.
I'm undecided what the final API should be here. We could keep a Tensor constructor on Scalar, but have it generate a local scalar; this is convenient but given this API used to be non-synchronizing, it may not be the best.
For now, I'm just using _local_scalar, which is clear, although we should get rid of the prefix _ if that's the API we intend to promote.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10852
Reviewed By: ezyang
Differential Revision: D9496766
Pulled By: gchanan
fbshipit-source-id: 16f39b57536b9707132a5a4d915650c381bb57db
Summary:
The schema_ field is a private and internal cache for nodes, and no
methods meant to manipulate it should be publicly visible. This call
wasn't even necessary at its call site, since removeInput will reset the
schema by itself.
zdevito jamesr66a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10822
Reviewed By: zdevito
Differential Revision: D9498683
Pulled By: apaszke
fbshipit-source-id: 42e1743e3737cb7d81f88e556204487d328c0e47
Summary:
When matching schema, first try to match without adding TensorToNum conversions. Then make another pass where TensorToNum conversions are allowed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10180
Differential Revision: D9438153
Pulled By: eellison
fbshipit-source-id: 80541b5abd06e9d4187e89dda751f44dab6f58c5
Summary:
Since ONNX opset version >5, Reshape changed semantics to take a shape tensor as input instead of relying on the `shape` attribute to decide what shape to reshape to. The ONNXIFI op has been postponing this change as some of the backends, such as TensorRT, were not ready. Now that the backends have adopted these semantics, we can remove the legacy mode and output opset version 7 ONNX models.
This change also flushes out some bugs and new requirements:
- Convert shape info into an int64 tensor
- Fix a bug where we output the shape tensor in the mapped workspace instead of the original workspace
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10848
Reviewed By: houseroad
Differential Revision: D9495121
Pulled By: yinghai
fbshipit-source-id: a6f44a89274c35b33fae9a429813ebf21d9a3d1a
Summary:
Currently on PyTorch AMD, memory accesses on the TensorInfo struct contained in the Operators passed into the kernelPointwiseApply kernel lead to hangs on the HCC runtime. Permuting the argument order such that the operator is first alleviates this issue and the kernel hangs disappear.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10829
Reviewed By: ezyang
Differential Revision: D9492561
Pulled By: Jorghi12
fbshipit-source-id: d0f0e2ab7180e55846db909f2744b8c8b110205e
Summary:
We no longer use nanopb in PyTorch (or Caffe2) so removing. All protobuf manipulation should go through standard protobuf, which is statically linked inside libcaffe2.so by default.
cc zdevito pjh5 ezyang Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10772
Reviewed By: pjh5
Differential Revision: D9465894
Pulled By: orionr
fbshipit-source-id: 8cdf9f1d3953b7a48478d381814d7107df447201
Summary:
In prep for making FULL_CAFFE2 default, users shouldn't be required to have protobuf installed.
cc pjh5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10771
Reviewed By: pjh5
Differential Revision: D9474458
Pulled By: orionr
fbshipit-source-id: 3e28f5ce64d125a0a0418ce083f9ec73aec62492
Summary:
This is a small part of the effort to remove Tensor as a tagged member in Scalar because it is inconsistent with how we normally do overloads.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10828
Differential Revision: D9485049
Pulled By: gchanan
fbshipit-source-id: 103f5cc03bb7775cd2d3a0a5c0c5924838055f03
Summary:
Part of #10774.
This PR does the following:
- Support ast.ExtSlice in the frontend. This is done by returning a
list of ast.Index and ast.Slice.
- Support multidimensional indexing with ints and slices
The general approach is to desugar multidimensional indexing into
at::slice, at::select operations. This is exactly how normal pytorch
does indexing (by desugaring it into at::slice, at::select, and other ops).
I used [this code](https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/python_variable_indexing.cpp) as reference.
We should be able to copy the rest of this to implement the missing
indexing features in script (indexing with ellipses, tensors, sequences, etc).
After I'm done implementing the missing indexing features in future prs, I can try to
templatize python_variable_indexing.cpp so that it can work with both JIT
script and normal pytorch indexing, but right now I'm not sure if that's
a good idea or not.
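A minimal sketch of the multidimensional int/slice indexing this enables in script (assuming the standard `torch.jit.script` decorator):
```
import torch

# The int index desugars to at::select and the slice to at::slice.
@torch.jit.script
def corner(x):
    return x[0, 1:3]
```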
cc zdevito jamesr66a apaszke wanchaol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10787
Differential Revision: D9481402
Pulled By: zou3519
fbshipit-source-id: 78c9fa42771a037d157879e23e20b87401cf1837
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10797
A few operators enforce in-place output (e.g., running mean/var for SpatialBN). Functional right now doesn't follow the inplace_enforced_ rules in OpSchema, and therefore RunNetOnce() will fail on OpSchema->Verify(). Edit the output_names in Functional following the rules so the check passes.
Reviewed By: jerryzh168
Differential Revision: D9470582
fbshipit-source-id: 168efeccecc32184bd1d02f3fefe8e61faa4e0f4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10835
The last diff on the constructor caused a performance regression on cold runs.
This one tries to fix that.
Reviewed By: highker
Differential Revision: D9489617
fbshipit-source-id: a77c2e2c903a73e2ad9806b4f9c209cdb751442f
Summary:
Added PrefixStore support.
This will make groups backward compatible.
Tests are included too.
```
tengli@devfair033:~/new_pytorch/pytorch/torch/lib/build/c10d/test$ ./FileStoreTest
Using temporary file: /tmp/testoglRl4
Using temporary file: /tmp/testepZIpB
Test succeeded
tengli@devfair033:~/new_pytorch/pytorch/torch/lib/build/c10d/test$ ./TCPStoreTest
Test succeeded
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10762
Differential Revision: D9484032
Pulled By: teng-li
fbshipit-source-id: 85754af91fe3f5605087c4a2f79ae930a9fd1387
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10605
Make isSubgraphMatch return a subgraph and a map from MatchNodes to graph nodes in the result, which makes it easier to write graph fusion logic. Also include some more helper methods for the NN subgraph matcher.
Reviewed By: bwasti
Differential Revision: D9374931
fbshipit-source-id: 3a273295eec81a43027ec3a9e835d27f00853df9
Summary:
apaszke recently ported RNNs from Python into ATen, which means we can replace our implementation in the C++ API (written by ebetica) with the ATen implementation, which cleans up a lot of code (+99, -323). Thanks apaszke!
I also added the `bidirectional` and `batch_first` options to the C++ API RNN options, just because why not.
apaszke ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10761
Differential Revision: D9443885
Pulled By: goldsborough
fbshipit-source-id: b6ef7566b9ced2b2f0b2e1f46c295b6f250c65a8
Summary:
* first integration of MIOpen for batch norm and conv on ROCm
* workaround a ROCm compiler bug exposed by elementwise_kernel through explicit capture of variables in the densest packing
* workaround a ROCm compiler bug exposed by having `extern "C" __host__` as a definition and just `__host__` in the implementation through the hipify script
* use fabs() in accordance with C++11 for double absolute, not ::abs() which is integer-only on ROCm
* enable test_sparse set on CI, skip tests that don't work currently on ROCm
* enable more tests in test_optim after the elementwise_bug got fixed
* enable more tests in test_dataloader
* improvements to hipification and ROCm build
With this, resnet18 on CIFAR data trains without hang or crash in our tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10612
Reviewed By: bddppq
Differential Revision: D9423872
Pulled By: ezyang
fbshipit-source-id: 22c0c985217d65c593f35762b3eb16969ad96bdd
Summary:
Things like `zeros(1,2,3, dtype=torch.int)` are now supported in the script by altering tryMatchSchema to auto-construct the list `[1,2,3]` when it sees inlined members of the list as the last positional arguments.
I suggest reading the commits individually, since the first two incrementally change how we do tryMatchSchema to get it ready for adding vararg list conversion, while the third actually does the modification.
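A minimal sketch of the new call style inside script (the function name is made up for illustration):
```
import torch

# The trailing positional arguments 1, 2, 3 are auto-collected into the size list [1, 2, 3].
@torch.jit.script
def make_zeros():
    return torch.zeros(1, 2, 3, dtype=torch.int)
```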
Closes #10632, closes #8516.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10250
Differential Revision: D9478235
Pulled By: zdevito
fbshipit-source-id: 0c48caf7a6184e463d9293d97015e9884758ef9c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10779
The idea is to let classes opt-in to providing these methods
by default.
Reviewed By: jerryzh168
Differential Revision: D9466076
fbshipit-source-id: b6beee084cc71d53ce446cdc171d798eeb48dc12
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10766
Added a `Workspace::ForEach(...)` API for accessing the global set of
existing Workspace instances. This is used in the signal handler to print blob
info on the thread receiving a fatal signal.
Reviewed By: mraway
Differential Revision: D9147768
fbshipit-source-id: a94d0b5e6c88390a969ef259ecb8790173af01a4
Summary:
This seems to save a few percent in binary size in libcaffe2_gpu.so, but
the effect may not be real. In fact, deleting some functions can cause
the binary size to increase (perhaps due to alignment issues).
cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10707
Differential Revision: D9409009
Pulled By: colesbury
fbshipit-source-id: 282931e562e84e316a33ac6da4788c04c2984f08
Summary:
To prepare THCState for refactoring into ATen, this PR removes unused THCState code paths. In particular, it:
- Removes the UVA Allocator
- Removes the THDefaultDeviceAllocator
- Respects the 1 BLAS and 1 sparse handle per device reality
- Removes kernel p2p access
- Removes setting p2p access
- Removes the GCHandler code path
- Removes many unused THCState_... functions
- Removes THCThreadLocal.h/.cpp
It does not change the preexisting external behavior of any remaining function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9735
Differential Revision: D9438558
Pulled By: SsnL
fbshipit-source-id: dde9acbec237a18bb6b75683e0526f7ff1c9a6ea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10509
This diff enables CUDA implementation of LARS operator in caffe2.
Reviewed By: enosair
Differential Revision: D9318356
fbshipit-source-id: 365b9f01e3afd4d9d3ba49155e72e728119f40c5
Summary:
When 0-sized dimension support is added, we expect an empty sparse tensor to be a 1-dimensional tensor of size `[0]`, with `sparseDims == 1` and `denseDims == 0`. Also, we expect the following invariants to be preserved at all times:
```
_sparseDims + _denseDims = len(shape)
_indices.shape: dimensionality: 2, shape: (_sparseDims, nnz)
_values.shape: dimensionality: 1 + _denseDims. shape: (nnz, shape[_sparseDims:])
```
This PR fixes various places where the invariants are not strictly enforced when 0-sized dimension support is enabled.
Tested and `test_sparse.py` passes locally on both CPU and CUDA with the `USE_TH_SIZE_ZERO_DIM` flag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9279
Differential Revision: D8936683
Pulled By: yf225
fbshipit-source-id: 12f5cd7f52233d3b26af6edc20b4cdee045bcb5e
Summary:
This uses zou3519's new `torch.broadcast_tensors()` #10075 to make `Categorical.log_prob()` and the `*Normal.__init__()` methods jittable. Previously `.log_prob()` was failing due to calls to `torch._C._infer_size()` with errors like
```
def log_prob(self, value):
if self._validate_args:
self._validate_sample(value)
> value_shape = torch._C._infer_size(value.size(), self.batch_shape) if self.batch_shape else value.size()
E RuntimeError: expected int at position 0, but got: Tensor
```
After this change I'm able to jit many more of Pyro's tests.
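For reference, a small sketch of `torch.broadcast_tensors`, the jittable call used here in place of the `_infer_size`-based shape computation:
```
import torch

a = torch.randn(3, 1)
b = torch.randn(1, 4)
a_b, b_b = torch.broadcast_tensors(a, b)  # both now have shape (3, 4)
```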
Reviewed By: ezyang
Differential Revision: D9477487
Pulled By: apaszke
fbshipit-source-id: 5f39b29c6b8fa606ad30b02fefe2dfb618e883d6
Summary:
When emitting if Branches, check that the types on each value returned are equivalent. As with reassignment of values, tensors are not forced to be the same shape or subtype.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10281
Differential Revision: D9466566
Pulled By: eellison
fbshipit-source-id: 746abdeb34a0f68806b8e73726ad5003b536911c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9483
The interface is updated to accept the config to construct the predictor.
Reviewed By: highker
Differential Revision: D8872999
fbshipit-source-id: 3ca54d644970823fc33c0ade9a005e12f52e2b24
Summary:
This makes the pybind version of the MPI process group work. The issue is that the tensor list won't remain in scope for the MPI worker thread, so we pass the vector by value instead.
Also added a recv_anysource pybind to make it work. The front-end API will wrap one level up with an int for this function, so taking a tensor should be the easiest way for now.
Also added an abort pybind and fixed the flaky test.
```
tengli@devfair033:~/new_pytorch/pytorch/torch/lib/build/c10d/test$ mpirun -np 8 ProcessGroupMPITest
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10606
Differential Revision: D9474393
Pulled By: teng-li
fbshipit-source-id: cca236c333656431e87d0d3573eeae9232c598b0
Summary:
Augassign (i.e., `x += 1`) gets desugared to an assignment of a binop (`x = x + 1`).
Right now we assert that the RHS of the binop is a tensor,
but it really doesn't have to be because we support scalar/scalar ops and also
list-list ops (i.e., `[1, 2] + [2, 3]`).
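A minimal sketch of a now-accepted augassign on a non-tensor value (the annotation style is just for illustration):
```
import torch

@torch.jit.script
def bump(n: int):
    n += 1   # desugars to n = n + 1; the RHS is a scalar, not a tensor
    return n
```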
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10730
Differential Revision: D9465110
Pulled By: zou3519
fbshipit-source-id: 7b118622701f09ce356aca81b8db743d9611097b
Summary:
Multiple failing external and internal CI signals were ignored when this commit
was landed. goldsborough please fix the test failures and resubmit this change as a
new PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10785
Reviewed By: ezyang
Differential Revision: D9466791
Pulled By: jamesr66a
fbshipit-source-id: b260e93bac95d05fd627c64e620b6aefb5045949
Summary:
ONNX doesn't support this. Instead, flatten the inputs to the ListConstruct op and inline them into the subsequent usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10713
Differential Revision: D9458508
Pulled By: jamesr66a
fbshipit-source-id: 0b41e69320e694bb2f304c6221864a39121e4694
Summary:
I included "legacy" includes in the old spots for Backend, Generator, Layout; it seemed unlikely that the other ones had direct user includes.
This is another step on the path to move Type/Tensor to ATen/core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10740
Reviewed By: ezyang
Differential Revision: D9435888
Pulled By: gchanan
fbshipit-source-id: 89f4f0f445d4498a059d3a79069ba641b22bbcac
Summary:
Don't regex against strings that may have come from the backtrace.
Better to just not regex at all.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10702
Reviewed By: ezyang
Differential Revision: D9406154
Pulled By: jsrmath
fbshipit-source-id: 9b17abee2a6e737a32c05f1e3963aef4b6638a47
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10053
Tensor in Pytorch 1.0 will have
Tensor -> TensorImpl -> Storage -> StorageImpl
In this diff we split Storage from Tensor in order to align with this design.
We'll have Tensor -> Storage -> StorageImpl after this diff
Reviewed By: ezyang
Differential Revision: D9384781
fbshipit-source-id: 40ded2437715a3a2cc888ef28cbca9a25b1d5350
Summary:
I've tested locally that this works to build static and non-static binaries with and without CUDA.
In terms of ongoing testing, I am working on incorporating this into the release package generation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10754
Differential Revision: D9457423
Pulled By: anderspapitto
fbshipit-source-id: aa1dcb17c67c0f0c493a9cf93aca4a6e06b21666
Summary: The code in Operator::SyncDevice had some duplicate logic and using FinishDeviceComputation sufficed in this case.
Reviewed By: yinghai
Differential Revision: D9348288
fbshipit-source-id: d8d874bab491e6d448fcd5fa561a8b99d502753b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10362
This diff implements a manual export from PyText's CRF module to the caffe2 CRF layer.
Note that most of the changes in caffe2/python/crf.py are just formatting changes, the only relevant change is the new class CRFUtils.
Reviewed By: hikushalhere
Differential Revision: D9234126
fbshipit-source-id: 1a67d709034660e8b3d5ac840560b56de63e3f69
Summary:
```
Use intrusive_ptr in Storage; replace unique_ptr<Storage> with Storage
This patch does two major changes:
- It replaces the use of Retainable in Storage with a new implementation
based on intrusive_ptr. This will be necessary because Caffe2 will
be using this class to implement intrusive_ptrs, and we need to
line these up for the merge. One good thing about the new implementation is
that the default copy/move constructors/assignment operators and destructor
work automatically, instead of needing to be hardcoded into Storage/Tensor.
- It replaces all places where we returned std::unique_ptr<Storage> with
Storage, collapsing an unnecessary double indirection that is no longer
necessary now that we have correctly working copy/move constructors.
I didn't initially want to do step (2), but it was very important to
eliminate all bare uses of new Storage and new StorageImpl, and this making
the API change was the most straightforward way to do this.
HOW TO FIX YOUR CODE IN THE NEW API
- You no longer need to dereference the result of tensor.storage() to pass
it to set. So, instead of:
x.set_(*y.storage());
just write:
x.set_(y.storage());
- If you were accessing methods on StorageImpl via the pImpl() method, you
must use the dot operator to run pImpl(). Even better; just drop pImpl,
we now have method forwarding. So, instead of:
storage->pImpl()->data();
just do:
storage->data();
// storage.pImpl()->data() works too but is not as recommended
- storage->getDevice() is no more; instead use storage->device().index()
MISC CODE UPDATES
- retain, release, weak_retain, weak_release and weak_lock are now
reimplemented using the "blessed API", and renamed to make it
clearer that their use is discouraged.
- nvcc OS X and general OS X portability improvements to intrusive_ptr
- A new comment in intrusive_ptr describing how stack allocated
intrusive_ptr_targets work differently than heap allocated ones
from c10::make_intrusive
CAVEAT EMPTOR
- THStorage_weakRetain used to work on strong pointers, but it NO LONGER
works with intrusive_ptr. You must reclaim the strong pointer into a
real strong pointer, construct a weak pointer from it, and then release
the strong and weak pointers. See StorageSharing.cpp for an example.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10488
Reviewed By: gchanan
Differential Revision: D9306134
Pulled By: ezyang
fbshipit-source-id: 02d58ef62dab8e4da6131e1a24834a65c21048e2
Summary:
The optimized code for `linear()` which uses `addmm` when a bias is given was duplicated three times in the ATen and the C++ API. Let's just have `at::linear` and use that everywhere.
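A small sketch of the addmm fast path that `linear()` takes when a bias is given (shapes are made up):
```
import torch

x, w, b = torch.randn(2, 3), torch.randn(4, 3), torch.randn(4)
fast = torch.addmm(b, x, w.t())  # bias + x @ w^T in one call
assert torch.allclose(fast, torch.nn.functional.linear(x, w, b))
```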
apaszke ezyang (who mentioned this in #10481)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10755
Differential Revision: D9443881
Pulled By: goldsborough
fbshipit-source-id: a64862d1649b5961043d58401625ec267d97d9f3
Summary:
zdevito et al came to the conclusion that the ONNX spec does not mandate the widening conversion of integral types when serializing tensor data into raw_data, as opposed to serializing the data into int32_data. PyTorch recently made this change in the export code, which caused import in caffe2 to break because it did not match semantics. This fixes that
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10718
Differential Revision: D9423712
Pulled By: jamesr66a
fbshipit-source-id: 479fbae67b028bf4f9c1ca1812c2c7b0c6cccd12
Summary:
Fixes `__getattr__` to adhere to its Python API contract, and wraps the `range()` call in a list since it no longer returns one in Python 3.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10525
Reviewed By: ezyang
Differential Revision: D9441360
Pulled By: tomdz
fbshipit-source-id: d489c0e7cefecc4699ca866fd55ddbfa629688d4
Summary:
This PR adds support for using custom ops in ScriptModules, the last step for our custom op strategy. You can now write
```
import torch
torch.ops.load_library('libcustom_ops.so')

class Model(torch.jit.ScriptModule):
    def __init__(self):
        super(Model, self).__init__()

    @torch.jit.script_method
    def forward(self, input):
        return torch.ops.custom.op(input) + 1

model = Model()
model.forward(torch.ones(5)) # Works
model.save("model.pt") # Works
model = torch.jit.load("model.pt") # Works
```
You can then load the `model.pt` in C++ and execute its `forward` method!
Missing for this was the fact that the script compiler didn't know to convert `ops.custom.op` into a `BuiltinFunction` which then emits a function call. For this I came up with the following strategy inside `torch/csrc/jit/script/init.cpp`:
1. When we access `torch.ops`, we return a `CustomOpValue` (subclass of `PythonValue`), whose purpose is only to return a `CustomOpNamespaceValue` (subclass of `PythonValue`) whenever something under it is accessed.
2. `CustomOpNamespaceValue` will then return a `BuiltinFunction` for each field accessed on it.
This doesn't reduce performance for any calls that are not to `torch.ops` (as opposed to inspecting every function call's name at the call site, for example).
I also had to fix `BuiltinFunction` to not assume the namespace is always `aten::`.
A lot of other changes are just tidying up the Python and C++ test harness before I integrate it in CI.
zdevito dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10610
Differential Revision: D9387832
Pulled By: goldsborough
fbshipit-source-id: c00f431db56c7502a66fe1f813fe78067f428ecb
Summary:
This should resolve "error C2280: 'std::unique_ptr<caffe2::ObserverBase<caffe2::OperatorBase>,std::default_delete<_Ty>> &std::unique_ptr<_Ty,std::default_delete<_Ty>>::operator =(const std::unique_ptr<_Ty,std::default_delete<_Ty>> &)': attempting to reference a deleted function" from Visual Studio.
It should also make the error message more human-readable in case something is really messed up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10593
Reviewed By: orionr
Differential Revision: D9436397
Pulled By: mingzhe09088
fbshipit-source-id: 31711667297b4160196134a34365da734db1c61d
Summary:
Let's run CI tests to see what fails given the changes that just landed in https://github.com/pytorch/pytorch/pull/10624
cc mingzhe09088 ezyang Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10692
Reviewed By: mingzhe09088
Differential Revision: D9423617
Pulled By: orionr
fbshipit-source-id: 3bda1f118d13f8dd8e823727c93167cae747d8cf
Summary:
Set the build environment before installing sccache in order to make sure the docker images have the links set up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10640
Reviewed By: yf225
Differential Revision: D9399593
Pulled By: Jorghi12
fbshipit-source-id: a062fed8b7e83460fe9d50a7a27c0f20bcd766c4
Summary:
This is part of moving the (base) Type to ATen/core; Some Type methods have default argument of type THNN Reduction.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10703
Differential Revision: D9406060
Pulled By: gchanan
fbshipit-source-id: 789bb3387c58bd083cd526a602649105274e1ef6
Summary:
This will make the common case more natural (no need to do `_construct_empty_tensor_list()`)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10705
Differential Revision: D9411622
Pulled By: michaelsuo
fbshipit-source-id: 2d91fbc5787426748d6e1c8e7bbeee737544dc96
Summary:
The broadcast is used by default when the opset version is greater than 6.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10108
Reviewed By: bddppq
Differential Revision: D9176934
Pulled By: houseroad
fbshipit-source-id: b737bd87b0ddc241c657d35856d1273c9950eeba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10651
EnsureCPUOutputOp will copy the input from another Context to CPU, but currently there is no guarantee that the Copy will be executed.
Differential Revision: D9390046
fbshipit-source-id: af3ff19cf46560264cb77d2ab8821f0cc5be74f6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10551
Renaming from "subtree" -> "subgraph" to improve clarity of the subgraph matcher APIs, since they now support DAGs.
This is pure renaming; no functionality changes.
Reviewed By: bwasti
Differential Revision: D9348311
fbshipit-source-id: 4b9267845950f3029dfe385ce3257d3abb8bdad4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10549
Support DAG matching in nomnigraph. This is done by maintaining a map from nodes in the MatchGraph to nodes in the input graph, and additionally enforcing that the same node in the MatchGraph must match the same node in the input graph (with the exception of multiplicity, i.e. when count != 1 on the MatchGraph node).
In a follow up diff, I'll rename the API that refers to subtree as subgraph to improve clarity.
Reviewed By: bwasti
Differential Revision: D9347322
fbshipit-source-id: 171491b98c76852240a253279c2654e96dd12632
Summary:
Some more `ATEN_API` additions for hidden visibility.
Running CI tests to see what fails to link.
cc Yangqing mingzhe09088 ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10624
Reviewed By: mingzhe09088
Differential Revision: D9392728
Pulled By: orionr
fbshipit-source-id: e0f0861496b12c9a4e40c10b6e0c9e0df18e8726
Summary:
Minor fix for the cuDNN cache. Previously, when an RNN function was called on GPU 0 and then on GPU 1 in eval mode, we would skip event re-initialization, causing an incorrect resource handle error when trying to record the event.
soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10662
Reviewed By: soumith
Differential Revision: D9393629
Pulled By: apaszke
fbshipit-source-id: e64c1c1d2860e80f5a7ba727d0b01aeb5f762d90
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9888
Limiter cannot be shared or copied; just pass it to the first reader.
Reviewed By: xianjiec
Differential Revision: D9008871
fbshipit-source-id: e20cd785b26b1844e156efc3833ca77cfc3ffe82
Summary:
Trigonometry functions are newly added to ONNX in a recent PR https://github.com/onnx/onnx/pull/869
This PR makes pytorch support exporting graphs with trigonometry functions.
This PR might need to wait until it is ready to change
```python
_onnx_opset_version = 6
```
to
```python
_onnx_opset_version = 7
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/7540
Differential Revision: D9395041
Pulled By: bddppq
fbshipit-source-id: bdf3e9d212b911c8c4eacf5a0753bb092e4748d2
Summary:
There is no reason that users should need an extra import to use DistributedSampler.
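A minimal sketch, assuming DistributedSampler is now reachable directly under torch.utils.data; num_replicas/rank are passed explicitly so no process group needs to be initialized:
```
import torch
from torch.utils.data import DistributedSampler, TensorDataset

dataset = TensorDataset(torch.arange(8))
sampler = DistributedSampler(dataset, num_replicas=2, rank=0)
indices = list(sampler)  # this rank's shard of the dataset indices
```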
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10671
Differential Revision: D9395189
Pulled By: SsnL
fbshipit-source-id: 8f41d93813c8fb52fe012f76980c6a261a8db9b2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10478
- Removed Backend constructor from Device, and fixed all
use-sites to use DeviceType::CPU instead of kCPU, or
use a new function backendToDeviceType to perform
the conversion.
- New method device_type() on Type; it gives you the
underlying device type, e.g., CPU for SparseCPU.
- We add backward compatibility for kCPU/kCUDA uses,
by introducing a new special type which is implicitly
convertible to both DeviceType and Backend. As long as
you don't define a function that's overloaded on both
DeviceType and Backend (but not on BackendOrDeviceType),
the implicit conversions will ensure that uses
of at::Device(at::kCPU) keep working. We fixed use-sites in
the library, but did NOT fix sites in the test code, so that
we can exercise this BC code.
Reviewed By: Yangqing
Differential Revision: D9301861
fbshipit-source-id: 9a9d88620500715c7b37e655b4fd761f6dd72716
Summary:
... to avoid slow at::chunk (it is slow due to tensor initialization). Picking up from #10026
This is done through the following:
1) Absorb starting chunks into FusionGroup as a part of the graph fuser
pass.
2) When compiling a kernel, emit a `std::vector<ConcatDesc>` that describes if an input (of the original graph) will be chunked.
3) When launching a kernel, use the `std::vector<ConcatDesc>` to chunk an
input tensor on the CPU. This chunk directly takes in an at::Tensor and creates
four TensorInfo structs in-place in the argument list, bypassing the creation of intermediate Tensors.
- Expect test and correctness test to see if a single chunk is fused
by the graph fuser
- Correctness test for a variety of chunks (dimension = beginning,
middle, end) and tensors (contiguous, non-contiguous, edge case
(splitSize = 1) for both CPU/CUDA
- Expect test for multiple chunks fused into the same kernel and
correctness test.
cc zdevito apaszke
LSTM forward pass, 1 layer, 512 hidden size and input size, 100 seq length, requires_grad=False on all inputs and weights.
After changes:
```
thnn cudnn jit
8.8468 6.5797 9.3470
```
Before changes:
```
thnn cudnn jit
9.9221 6.6539 11.2550
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10178
Differential Revision: D9382661
Pulled By: zou3519
fbshipit-source-id: 1f8a749208fbdd45559775ce98cf4eb9558448f8
Summary:
Take 2 of #10543.
The problem was that between the commit and the merge, one more entry point, `tools/build_libtorch.py`, was added.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10659
Differential Revision: D9393540
Pulled By: soumith
fbshipit-source-id: 8ebfed600fc735fd1cb0489b161ec80e3db062e0
Summary:
Fixes#10096
If the only thing preventing a simple mappable operator from being fused
into a fusion group is that its Tensor inputs are not of the same shape as the
output, then the graph fuser inserts explicit expand nodes for those
inputs.
This helps the graph fuser not miss out on any fusion opportunities
involving simple mappable operations that have Tensor inputs. This PR
doesn't do anything for the scalar case; that can be addressed later.
Test Plan
- Simple expect test case
- Added expect tests for a raw LSTMCell. The expands help speed up the
forwards pass by allowing more operations to be fused into the LSTMCell's single
FusionGroup.
cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10325
Differential Revision: D9379308
Pulled By: zou3519
fbshipit-source-id: 86d2202eb97e9bb16e511667b7fe177aeaf88245
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10630
`onnxTensorDescriptorV1.name` points to the string buffer. We use a vector of strings to serve as the storage. This means we cannot reallocate the vector because that may invalidate the `onnxTensorDescriptorV1.name` pointers. Solution is to reserve a large enough vector so that it won't reallocate.
Reviewed By: bddppq, houseroad
Differential Revision: D9381838
fbshipit-source-id: f49c5719aafcc0829c79f95a2a39a175bcad7bfe
Summary:
This is on the way to resolving #9940.
Fixes#10501
This PR modifies graph fuser to fuse operations that have constant
scalar arguments. These constant scalar arguments are directly inlined
into the kernel body.
The context for this is that LSTM backward (in particular, sigmoid
backward) has many add(x, 1.) operations. This PR should be sufficient for
LSTM backward to get fused by the graph fuser.
cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10511
Differential Revision: D9378896
Pulled By: zou3519
fbshipit-source-id: 6a7a2987f5b6e8edaaf4b599cd200df33361650f
Summary:
This is still not the final PR, but it removes all blockers for actually using the RNN functions directly in the JIT. Next patch should be final, and will actually remove the symbolic_override code, and change it to proper symbolics for those ATen functions. Turns out the symbolic code can be also cleaned up a bit, and I'll do that too.
zdevito ezyang
colesbury (for minor DispatchStub.h) changes
There was no way to handle those in the JIT for now, and they turned
out to be completely unnecessary. It should make the Python and C++
module code much simpler too, since all the logic is now centralized
in the native functions.
The downside is that RNN modules no longer own their dropout buffers,
which are shared per-device instead (with appropriate locking and
synchronization). This might appear as a perf regression at first, but
in reality it's highly unlikely that anyone will want to run cuDNN RNNs
on the same GPU in parallel.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10581
Reviewed By: colesbury
Differential Revision: D9365541
Pulled By: apaszke
fbshipit-source-id: 3ef8677ee5481bae60c74a9117a2508665b476b5
Summary:
This PR is the first step toward integrating the torch.nn library with the JIT. It adds tests for nn functional interfaces in trace/script mode, and tries to find the differences between torch.nn.functional ops and the ATen ops, to see what work needs to be done in order to support the full set of nn functionals in script mode.
Some statistics in summary:
- 84 useful functions in torch.nn.functional in total (the number does not include helper funcs and deprecated funcs in torch.nn.functional).
- 7 functions/ops do not support higher-order gradients, so they are excluded from the whole test.
- 36 functions differ from their ATen ops for various reasons. Among those 36 functions, a bunch (roughly 10-15) are just naming differences and simple transformations using other ops inside the function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10409
Differential Revision: D9350694
Pulled By: wanchaol
fbshipit-source-id: 8fce6f30d8d25ace5a544a57b219fe61f5a092f8
Summary:
Inlining if branches which have constant inputs. If an if node gets inlined, the set of mutated variables returned by its ancestors may have changed. In the following example the block should
return a mutated set of (a) and not (a, b).
```
if cond:
if True:
a = a - 1
else:
b = b - 1
```
To calculate this we recursively update mutate variables in if branches from the leaf nodes up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10084
Reviewed By: michaelsuo
Differential Revision: D9340429
Pulled By: eellison
fbshipit-source-id: b0dd638a5cace9fdec3130460428fca655ce4b98
Summary:
https://github.com/pytorch/pytorch/pull/10100 recently made nomnigraph take external input/output. This PR makes the following adjustments:
0. Relax some of the conditions on external input.
1. Update NNModule inputs/outputs when pruning the input/output.
2. Avoid copying external input/output, as nomnigraph already takes care of it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10598
Reviewed By: bwasti
Differential Revision: D9371730
Pulled By: yinghai
fbshipit-source-id: 9273be5041dc4cc8585587f47cb6721e518a06a8
Summary:
Custom Python installations that have no aliases to `python` or `python3` can't be found by CMake's `findPythonInterp` without an extra CMake argument.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10543
Differential Revision: D9378844
Pulled By: ezyang
fbshipit-source-id: 022e20aab7e27a5a56b8eb91b6026151116193c7
Summary:
Fix "error LNK2019: unresolved external symbol" from "CAFFE_KNOWN_TYPE" in tests where we should use dllexport instead of AT_CORE_API(=dllimport).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10602
Differential Revision: D9377394
Pulled By: Yangqing
fbshipit-source-id: 993062a461ffce393f2321c5391db5afb9b4e7ba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10282
This diff removes the unused/deprecated features from the code base.
Reviewed By: manojkris
Differential Revision: D9169859
fbshipit-source-id: d6447b7916a7c687b44b20da868112e6720ba245
Summary:
This is the last step in the custom operator implementation: providing a way to build from C++ and Python. For this I:
1. Created a `FindTorch.cmake` taken largely from ebetica with a CMake function to easily create simple custom op libraries
2. Created a `torch/op.h` header for easy inclusion of necessary headers,
3. Created a test directory `pytorch/test/custom_operator` which includes the basic setup for a custom op.
1. It defines an op in `op.{h,cpp}`
2. Registers it with the JIT using `RegisterOperators`
3. Builds it into a shared library via a `CMakeLists.txt`
4. Binds it into Python using a `setup.py`. This step makes use of our C++ extension setup that we already have. No work, yey!
The pure C++ and the Python builds are separate and not coupled in any way.
zdevito soumith dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10226
Differential Revision: D9296839
Pulled By: goldsborough
fbshipit-source-id: 32f74cafb6e3d86cada8dfca8136d0dfb1f197a0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10599
Not spawning threads with spin-lock synchronization is bad because they will switch to `condvar` wait, which increases wake-up latency next time they are needed.
Reviewed By: ajtulloch
Differential Revision: D9366664
fbshipit-source-id: 3b9e4a502aeefaf0ddc4795303a855d98980b02e
Summary:
This commit adds the ``buffers()`` and ``named_buffers()`` methods as
analogues of ``parameters()`` and ``named_parameters()``.
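A small sketch of the new accessors (the module choice is arbitrary; BatchNorm registers running_mean/running_var as buffers):
```
import torch

bn = torch.nn.BatchNorm1d(4)
for name, buf in bn.named_buffers():
    print(name, tuple(buf.shape))  # mirrors named_parameters(), but for buffers
```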
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10554
Reviewed By: SsnL
Differential Revision: D9367762
Pulled By: jma127
fbshipit-source-id: f2042e46a7e833dce40cb41681dbd80d7885c74e
Summary:
A continuation of https://github.com/pytorch/pytorch/pull/10504 for GPU, torch, etc. builds.
I was testing with
```
FULL_CAFFE2=1 python setup.py build_deps | tee ~/log.txt
cat ~/log.txt | egrep 'undefined refer' | sort | less
```
I'll rebase on master when Yangqing's changes in 10504 land, but putting up for some testing.
cc mingzhe09088 anderspapitto ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10507
Reviewed By: Yangqing
Differential Revision: D9359606
Pulled By: orionr
fbshipit-source-id: c2a3683b3ea5839689f5d2661da0bc9055a54cd2
Summary:
Resubmit #10416 with fixed tests. This removes the implicit GPU-to-CPU conversion when calling numpy, so the behavior matches other methods.
It requires users to move the tensor back to CPU with cpu() before calling numpy functions on it.
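A minimal sketch of the now-required explicit copy (assumes a CUDA device is available):
```
import torch

t = torch.ones(3, device="cuda")
arr = t.cpu().numpy()  # calling t.numpy() directly now raises instead of copying implicitly
```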
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10553
Differential Revision: D9350212
Pulled By: ailzhang
fbshipit-source-id: 9317d8fea925d4b20ae3150e2c1b39ba5c9c9d0a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10494
Adding the AllreduceBcube routines as they are now available in gloo.
Reviewed By: wesolwsk
Differential Revision: D8269473
fbshipit-source-id: 6a3a32291bbf1fbb328b3ced0f2a753dc5caf4e5
Summary:
The ONNXIFI backend will absorb the constant weight in Conv, so we should not add it as an input. This is just a test artifact. Note that the Onnxifi transformer will do the right thing when cutting the graph to absorb the weights.
rdzhabarov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10575
Reviewed By: houseroad
Differential Revision: D9357339
Pulled By: yinghai
fbshipit-source-id: a613fa3acafa687295312f5211f8e9d7f77b39cd
Summary:
delete build_caffe2.sh, replace with build_libtorch.py as suggested by peter (and copy-pasted from his draft PR). This ensures that all consumers of the torch CMake file go through as unified a path as possible.
In order to change the surrounding infrastructure as little as possible, I made some tweaks to enable build_pytorch_libs.sh to generate the test binaries relative to the current directory, rather than hardcoding to pytorch/build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10508
Differential Revision: D9354398
Pulled By: anderspapitto
fbshipit-source-id: 05b03df087935f88fca7ccefc676af477ad2d1e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10546
Have you ever written an operator<< overload in the caffe2 namespace
in a core Caffe2 header, and then been stunned when some completely
unrelated code started breaking? This diff fixes this problem!
The problem looks like this:
1. You're building against a really old version of glog (think 0.3.2,
or something like that)
2. This version of glog defines operator<< overloads for std containers
in the global namespace
3. You add a new overload in your current namespace (e.g., caffe2).
Congratulations: this overload is *preferentially* chosen over
the global namespace one for all calls to << in that namespace.
And since it doesn't actually have std::vector overloads, unrelated
Caffe2 code breaks.
Newer versions of glog have a fix for this: they have the line:
namespace std { using ::operator<<; }
in their header. So let's help old versions of glog out and do this ourselves.
In our new world order, operator<< overloads defined in the global namespace
won't work (unless they're for std containers, which work because of ADL).
So this diff also moves all those overloads to the correct namespace.
Reviewed By: dzhulgakov
Differential Revision: D9344540
fbshipit-source-id: 6246ed50b86312668ebbd7b039fcd1233a3609cf
Summary:
This PR removes the `using Tensor = autograd::Variable;` alias from `torch/tensor.h`, which means `torch::Tensor` is now `at::Tensor`. This PR fixes up some last uses of `.data()` and tidies up the resulting code. For example, I was able to remove `TensorListView` such that code like
```
auto loss = torch::stack(torch::TensorListView(policy_loss)).sum() +
torch::stack(torch::TensorListView(value_loss)).sum();
```
is now
```
auto loss = torch::stack(policy_loss).sum() + torch::stack(value_loss).sum();
```
CC jgehring
ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10516
Differential Revision: D9324691
Pulled By: goldsborough
fbshipit-source-id: a7c1cb779c9c829f89cea55f07ac539b00c78449
Summary:
Fixed the NCCL test, which is not run in CI. We should enable it soon.
```
~/new_pytorch/pytorch/test$ python test_c10d.py
...............
----------------------------------------------------------------------
Ran 15 tests in 13.099s
OK
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10557
Reviewed By: ailzhang
Differential Revision: D9353286
Pulled By: teng-li
fbshipit-source-id: 5a722975beaa601203f51c723522cc881f2d2090
Summary:
Properly annotated all APIs for the CPU front end. Checked with CMake using
`cmake -DUSE_ATEN=ON -DUSE_CUDA=OFF -DBUILD_ATEN=ON`
and the resulting libcaffe2.so has about 11k symbols.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10504
Reviewed By: ezyang
Differential Revision: D9316491
Pulled By: Yangqing
fbshipit-source-id: 215659abf350af7032e9a4b0f28a856babab2454
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10439
Update Im2Col-related code in preparation for group conv in NHWC order.
Reviewed By: houseroad
Differential Revision: D9285344
fbshipit-source-id: 1377b0243acb880d2ad9cf73084529a787dcb97d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10528
Adding two features to core and model_helper:
- reroute_tensor, which supports op insertion at the net level
- model_helper complete net and cut net, used for full graph analysis
Differential Revision: D9330345
fbshipit-source-id: 56341d3f500e72069ee306e20266c8590ae7985a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10053
Tensor in Pytorch 1.0 will have
Tensor -> TensorImpl -> Storage -> StorageImpl
In this diff we split Storage from Tensor in order to align with this design.
We'll have Tensor -> Storage -> StorageImpl after this diff
Reviewed By: dzhulgakov
Differential Revision: D9076734
fbshipit-source-id: ea9e1094ecf8c6eaeaa642413c56c6a95fb3d14e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10526
Resubmitting these changes. Previously they caused issues with multifeed, which I fixed with D9280622
Reviewed By: yinghai
Differential Revision: D9327323
fbshipit-source-id: ec69428039b45c6221a5403b8fe9a83637857f04
Summary:
Implemented via a wrapper, thank you Richard for the suggestion!
Fixes: #9929
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10067
Differential Revision: D9083388
Pulled By: soumith
fbshipit-source-id: 9ab21cd35278b01962e11d3e70781829bf4a36da
Summary:
This should make ASAN tests run faster.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9902
Differential Revision: D9032986
Pulled By: yf225
fbshipit-source-id: 3d2edec2d7ce78bc995d25865aa82ba6d3f971d0
Summary:
Pull in a fix in FP16 for a compilation bug when using the Intel Compiler.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10548
Differential Revision: D9349469
Pulled By: Maratyszcza
fbshipit-source-id: 43e6dc5c3c18319d31eca23426770c73795feec5
Summary:
In my environment, it looks like setup.py hangs when running
```
FULL_CAFFE2=1 python setup.py build_deps
```
Removing this fixes things, but we might also want to look at `tests_require`, which came over from `setup_caffe2.py`.
cc pjh5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10530
Differential Revision: D9349597
Pulled By: orionr
fbshipit-source-id: 589145eca507dfaf16386884ee2fbe60299660b4
Summary:
This PR removes a couple of macros throughout TH* as part of the refactoring effort for ATen. Removing these macros should avoid confusion among developers who are trying to move things from TH* to ATen. This PR is part of the THCNumerics deprecation that I have been working on, following up on mruberry's https://github.com/pytorch/pytorch/pull/9318. I am separating these two commits to check that removing these macros doesn't upset the PyTorch public CI or internal builds.
- Commit 1248de7baf removes the code paths guarded by the `CUDA_HALF_INSTRUCTIONS` macro. Since the macro was removed in commit 2f186df52d, `ifdef CUDA_HALF_INSTRUCTIONS` would return false, and hence the code path that is kept after this change is for the false case of `ifdef CUDA_HALF_INSTRUCTIONS`.
- Commit 520c99b057 removes the code paths guarded by `CUDA_HALF_TENSOR` macro. Since Pytorch now provides support for only CUDA 8.0 and above, `CUDA_HALF_TENSOR` is always true since CUDA 8.0 satisfies `CUDA_HAS_FP16` and hence, the code path that is kept after this change is for the true case of `ifdef CUDA_HALF_TENSOR`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10147
Differential Revision: D9345940
Pulled By: soumith
fbshipit-source-id: c9392261dd432d304f1cdaf961760cbd164a59d0
Summary:
This is the first of two changes that are supposed to improve how we handle RNNs in the JIT. They still get traced as `PythonOp`s, but now it will be much easier to actually expose them to the JIT as e.g. `aten::lstm`, and ignore the Python interpreter entirely. This needs some symbolic adjustments that will be part of a second PR.
Even when we fix symbolics, there will still be a bit of a problem with statefulness of the cuDNN API (we need a mutable cache for the dropout state, but our IR has no way of representing that).
zdevito ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10481
Reviewed By: ezyang
Differential Revision: D9341113
Pulled By: apaszke
fbshipit-source-id: 0ae30ead72a1b12044b7c12369d11e5ca8ec30b5
Summary:
In the shortcut for n_sample=1, when category 0 has 0 weight,
we should not map the (uniform) sample 0 to category 0.
The conversion uniform->multinomial was apparently written to work on
a (0,1] range (like curand uses), but PyTorch uses a [0,1) range.
Fixes: #4858. Thank you, Roy Fejgin for reporting.
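A minimal sketch of the behavior this fix guarantees: a category with zero weight must never be drawn, even through the n_sample=1 shortcut.
```python
import torch

weights = torch.tensor([0.0, 1.0, 1.0])  # category 0 has zero weight
for _ in range(1000):
    idx = torch.multinomial(weights, num_samples=1, replacement=True).item()
    assert idx != 0  # before the fix, uniform sample 0 could map to category 0
```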
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9960
Reviewed By: soumith
Differential Revision: D9341793
Pulled By: ailzhang
fbshipit-source-id: 6b1a96419a7bc58cc594f761f34c6408ff6354cf
Summary:
Since we can't specify a version number with `choco install curl`, we should not assume that `7.57.0` is the curl version in the Windows AMI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10476
Differential Revision: D9303129
Pulled By: yf225
fbshipit-source-id: 198544be68330860fbcf93c99bc995f4e280bda7
Summary:
Support broadcasting in _kl_categorical_categorical.
This makes it possible to do:
```
import torch.distributions as dist
import torch
p_dist = dist.Categorical(torch.ones(1,10))
q_dist = dist.Categorical(torch.ones(100,10))
dist.kl_divergence(p_dist, q_dist)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10533
Differential Revision: D9341252
Pulled By: soumith
fbshipit-source-id: 34575b30160b43b6c9e4c3070dd7ef07c00ff5d7
Summary:
Two tests in the 'nn' test bucket may fail when the torch.half
(float16) data type is used. The assertions used in the tests
intend to allow slight floating point imprecision in the results,
but the tolerances used for the comparisons are too strict for
the half type.
Relax the tolerances so that slight float16 imprecision won't
cause test failures.
The affected tests are:
- test_variable_sequence_cuda
- test_Conv2d_groups_nobias
For more information, see issue:
https://github.com/pytorch/pytorch/issues/7420
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10519
Differential Revision: D9343751
Pulled By: soumith
fbshipit-source-id: 90aedf48f6e22dd4fed9c7bde7cd7c7b6885845a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10512
SubtreeMatchCriteria now becomes a graph of MatchNode
MatchNode consists of NodeMatchCriteria, nonTerminal and count. This is a cleaner internal representation of the data structure and will bring us much closer to DAG matching.
Note that I still keep the debugString method because convertToDotGraph doesn't currently work with Subgraph.
Reviewed By: bwasti
Differential Revision: D9321695
fbshipit-source-id: 58a76f007a9a95d18cf807d419c2b595e9bc847f
Summary:
Optimize max and min reductions for the ATen CPU path; the current code path from the TH module runs sequentially on the CPU.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10343
Differential Revision: D9330799
Pulled By: ezyang
fbshipit-source-id: 5b8271e0ca3e3e73f88a9075aa541c8756001b7c
Summary:
I've implemented affine grid generation for volumetric (5d) inputs. The implementation is based off of the spatial implementation, extended by one dimension. I have a few questions about my implementation vs. the existing one that I will add inline.
I have some extensive test cases for the forward pass here: https://gist.github.com/elistevens/6e3bfb20d8d0652b83bd16b3e911285b However, they use `pytest.fixture` extensively, so I'm not sure the best way to incorporate them into the pytorch test suite. Suggestions? I have not tested backwards at all.
Diff probably best viewed with whitespace changes ignored.
Thanks for considering!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8322
Differential Revision: D9332335
Pulled By: SsnL
fbshipit-source-id: 1b3a91d078ef41a6d0a800514e49298fd817e4df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10395
Order switch ops (NCHW2NHWC and NHWC2NCHW) were only supporting 2D images.
This diff generalizes them to 1D and 3D, and also adds a unit test we didn't have.
Reviewed By: protonu
Differential Revision: D9261177
fbshipit-source-id: 56e7ec54c9a8fb71781ac1336f3f28cf024b4bda
Summary:
We can't rely on the ATen fallback pathway here because we need to parse out the constant attributes explicitly
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10513
Reviewed By: dzhulgakov
Differential Revision: D9322133
Pulled By: jamesr66a
fbshipit-source-id: 52af947e6c44532ef220cb4b94838ca838b5df06
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10390
Fixed a bug in box_with_nms_limit where it may produce more bounding boxes than specified.
* The original code first finds the threshold at the 'detections_per_im'-th box, and filters out boxes below that threshold.
* In some cases where multiple boxes share that threshold value, the op will return more boxes than 'detections_per_im'.
Reviewed By: wat3rBro
Differential Revision: D9252726
fbshipit-source-id: 63f40829bcd275cb181692bc7547c384cee01499
Summary:
Background: we run pytorch in embedded C++ pipelines, running in C++ GUIs in https://github.com/Kitware/VIAME and without this addition, the call was failing with the below error, but only on certain windows platforms/configurations:
OSError: [WinError 6] The handle is invalid
At:
C:\Program Files\VIAME\Python36\site-packages\torch\cuda\__init__.py(162): _lazy_init
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(249): <lambda>
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(182): _apply
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(176): _apply
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(249): cuda
C:\Program Files\VIAME\lib\python3.6None\site-packages\kwiver\arrows\pytorch\pytorch_resnet_f_extractor.py(74): __init__
C:\Program Files\VIAME\lib\python3.6None\site-packages\kwiver\processes\resnet_descriptors.py(132): _configure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10379
Differential Revision: D9330772
Pulled By: ezyang
fbshipit-source-id: 657ae7590879004558158d3c4abef2ec11d9ed57
Summary:
Breaking out of #8338
This PR is a workaround for a bug with CUDA9.2 + GCC7.
Here is the error this PR fixed:
.../pytorch/caffe2/operators/elementwise_ops.h: In constructor ‘caffe2::BinaryElementwiseWithArgsOp<InputTypes, Context, Functor, OutputTypeMap>::BinaryElementwiseWithArgsOp(const caffe2::OperatorDef&, caffe2::Workspace*)’:
.../pytorch/caffe2/operators/elementwise_ops.h:106:189: error: ‘GetSingleArgument<bool>’ is not a member of ‘caffe2::BinaryElementwiseWithArgsOp<InputTypes, Context, Functor, OutputTypeMap>’
BinaryElementwiseWithArgsOp(const OperatorDef& operator_def, Workspace* ws)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10510
Reviewed By: orionr
Differential Revision: D9319742
Pulled By: mingzhe09088
fbshipit-source-id: ce59e3db14539f071f3c20301e77ca36a6fc3f81
Summary:
Previously, it was easy to write `x[0].accessor<float, 2>()`. However, `x[0]` is a temporary, so the accessor would point to invalid strides/sizes and probably segfault. With this change, such unsafe code is a compile error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10518
Reviewed By: goldsborough
Differential Revision: D9329288
Pulled By: ebetica
fbshipit-source-id: d08763bee9a19a898b9d1ea5ba648f27baa1992f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10514
Fix the bug that breaks the Windows build in fused_rowwise_random_quantization_ops.h.
Reviewed By: ezyang, jspark1105
Differential Revision: D9322291
fbshipit-source-id: a6a27e87423b6caa973414ffd7ccb12076f2e1e4
Summary:
setup.py is the official install script; setup_caffe2.py is not used any more.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10520
Reviewed By: yinghai
Differential Revision: D9325548
Pulled By: bddppq
fbshipit-source-id: 3dda87f3dff061b574fd1d5c91859044f065ee33
Summary:
After this, all combinations of {String frontend, Python AST Frontend}{Python 3-style type annotations, MyPy-style type comments}{Script method, Script function} should properly accept type annotations.
Possible TODOs:
- Clean up the functions marked HACK
- Clean up the Subscript tree-view to better match the Python AST versions
- Can we use this for Python functions? That's the only place annotations.get_signature() is still needed
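For illustration, a small sketch (not taken from the PR's tests) of a script function annotated with a MyPy-style type comment, one of the combinations described above:
```python
import torch

@torch.jit.script
def scaled_add(x, y, alpha):
    # type: (Tensor, Tensor, float) -> Tensor
    return x + alpha * y

print(scaled_add(torch.ones(2), torch.ones(2), 0.5))
```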
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10279
Differential Revision: D9319726
Pulled By: jamesr66a
fbshipit-source-id: b13f7d4f066b0283d4fc1421a1abb9305c3b28fa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10267
isSubtreeMatch now returns a SubtreeMatchResult which contains a match flag and a debugMessage string that contains the reason why a subtree is not matched (if requested).
Reviewed By: bwasti
Differential Revision: D9182429
fbshipit-source-id: 530591fad592d02fb4c31fc398960a14ec90c86a
Summary:
Provided Python bindings for these four ops. Also provided an NCCL binding test.
Based on https://github.com/pytorch/pytorch/pull/10058
Please only review init.cpp and the test file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10159
Reviewed By: yf225
Differential Revision: D9323192
Pulled By: teng-li
fbshipit-source-id: b03822009d3a785ec36fecce2fc3071d23f9994e
Summary:
Added
- Reduce (both NCCL and MPI)
- AllGather (both NCCL and MPI)
- Gather (MPI)
- Scatter (MPI)
for c10d process groups. This basically finalizes all supported ops for C10d to match THD.
All ops are tested as well.
```
mpirun -np 8 ./ProcessGroupMPITest
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
```
```
./ProcessGroupNCCLTest
Allreduce test successful
Broadcast test successful
Reduce test successful
Allgather test successful
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10058
Reviewed By: yf225
Differential Revision: D9316312
Pulled By: teng-li
fbshipit-source-id: 6a6253268d34332327406b1f87335d1402f7133f
Summary:
After talking to users of the C++ API we found that having the tensor type be `autograd::Variable` causes more complications than having it be `at::Tensor`. It used to be a problem because `at::Tensor` didn't have the "autograd API" of variable (e.g. `detach()` or `grad()` methods), but those methods are now on `at::Tensor`. As such, we want to make a last big breaking change to have the tensor type be `at::Tensor`, while factory methods like `torch::ones` will return `Variable`s disguised as `at::Tensor`. This will make many things easier, like calling functions in ATen that take vectors of tensors.
This PR makes a small step in this direction by updating the optimizer classes to not use `.data()` on `Variable` to access the underlying `at::Tensor`. Using `.data()` is effectively a hack to work around our modification rules for tensors that require grad. The proper way of doing things is to use `with torch.no_grad` or equivalently `NoGradGuard` in C++ to guard in-place operations.
The next step can then simply redefine `torch::Tensor` to be `at::Tensor`. This transition should be smooth, since all methods available on `Variable` are at this point available on `at::Tensor`.
For this PR I:
1. Modified the implementations of optimizers to not use `.data()`. This means the implementations are now different from PyTorch, which still uses the legacy method of using `.data`.
2. To properly verify (1), I added more fine-grained test cases to our optimizer tests, e.g. `SGD` with and without `weight_decay`, then with `nesterov` etc. Generally more tests = more happy!
3. Minor cleanup of the optimizer codebase
ebetica apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10490
Differential Revision: D9318229
Pulled By: goldsborough
fbshipit-source-id: fb386700f37840542bc5d323f308ea88fe5ea5c5
Summary:
Now, when running `python test/onnx/test_operators.py --no-onnx`, we won't introduce any ONNX Python dependency. (No onnx/protobuf Python packages need to be installed.)
The major changes:
- output pbtxt from the C++ exporter directly, so the floating-point format may be slightly different. (This should be fine, since it's just to guard ONNX exporting.)
- ONNX Python packages are only imported if we run the ONNX-related checks. Those checks are disabled when using the `--no-onnx` flag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10151
Reviewed By: jamesr66a
Differential Revision: D9130706
Pulled By: houseroad
fbshipit-source-id: ea28cf5db8399929179698ee535137f209e9ce6f
Summary:
There are three classes, `RNNCell`, `LSTMCell`, and `GRUCell`, inheriting from `RNNCellBase`, all defining the identical initialization function `reset_parameters`. Let's move it to the common base.
Another option is to have different initialization for RNN, LSTM and GRU. Maybe those weights whose output is processed with sigmoid (i.e. gain=1) should be initialized differently from those going to tanh (gain=5/3)?
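A minimal sketch of the refactor (not the actual torch.nn source, and the class names are made up): the shared initializer lives once in the common base, and each cell class simply inherits it.
```python
import math

import torch
import torch.nn as nn


class MyRNNCellBase(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(MyRNNCellBase, self).__init__()
        self.hidden_size = hidden_size
        self.weight_ih = nn.Parameter(torch.empty(hidden_size, input_size))
        self.weight_hh = nn.Parameter(torch.empty(hidden_size, hidden_size))
        self.reset_parameters()

    def reset_parameters(self):
        # identical logic formerly duplicated in RNNCell, LSTMCell, and GRUCell
        stdv = 1.0 / math.sqrt(self.hidden_size)
        for weight in self.parameters():
            nn.init.uniform_(weight, -stdv, stdv)


class MyRNNCell(MyRNNCellBase):
    pass  # reuses reset_parameters from the base
```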
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10399
Differential Revision: D9316978
Pulled By: SsnL
fbshipit-source-id: a2d9408f0b5c971a3e6c3d42e4673725cf03ecc1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10244
Use CAFFE_ENFORCE_EQ(x, y) instead of CAFFE_ENFORCE(x == y) in conv_op_impl.h for error messages with more information.
Reviewed By: viswanathgs
Differential Revision: D9177091
fbshipit-source-id: cf8d10afec1ce6793d3ae0b62f05648722a4130b
Summary:
It just calls into `ninja install`. For iterative work on
libtorch.so/_C.so,
`python setup.py rebuild_libtorch develop` should provide quick iteration
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10036
Differential Revision: D9317869
Pulled By: anderspapitto
fbshipit-source-id: 45ea45a1b445821add2fb9d823a724fc319ebdd2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10389
Added some unit test for box_with_nms_limit_op.
Reviewed By: wat3rBro
Differential Revision: D9237860
fbshipit-source-id: 2d65744bd387314071b68d2a0c934289fc64a731
Summary:
Test only for existence for now. I had to skip a lot of them, so there is a FIXME in the test.
Also, I'm not testing torch.* because of a namespace issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10311
Differential Revision: D9196341
Pulled By: SsnL
fbshipit-source-id: 9c2ca1ffe660bc1cc664474993f8a21198525ccc
Summary:
- Exposed get_debug_graph for ScriptModule (gets the debug graph for its
forward Method)
- Added forward/backward expect tests for lstm and milstm cells. These
are intended to prevent regressions
cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10506
Differential Revision: D9316590
Pulled By: zou3519
fbshipit-source-id: 3c2510d8363e9733ccbc5c7cc015cd1d028efecf
Summary:
This commit adds the ability to insert a node with inputs, using the schema to check the inputs are valid types, fill in any default values, and perform standard implicit conversions. Since it is schema based, it will discover and use the right overload.
Constructors to `NamedValue` enable it to be constructed using `IValue` constants so it is possible to use constant values in the input list as well:
```
g.insert(aten::add, {v, 3});
```
Keyword arguments are also supported:
```
g.insert(aten::add, {v}, {{"other", t}, {"scalar", 1}});
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10198
Differential Revision: D9307252
Pulled By: zdevito
fbshipit-source-id: 644620aa85047d1eae1288383a619d50fec44d9b
Summary:
AffineChannel is being used by public Detectron models, e.g. Mask-RCNN and Faster-RCNN. This PR folds this op into convolution the same way as BN to speed up inference.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10293
Differential Revision: D9276789
Pulled By: yinghai
fbshipit-source-id: fbf6dd2c1be05f5713f760752e7245b1320a122b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10100
nomnigraph has until this point tried to ignore external inputs and outputs, as they aren't very well defined (does order matter?). But for DCE and some of Keren's work they are becoming necessary, so I went ahead and added this to the core nomnigraph converter.
Reviewed By: yinghai
Differential Revision: D9105487
fbshipit-source-id: a2e10e3cc84515611d6ab7d4bc54cf99b77729c0
Summary:
Fixes #10456
The graph fuser was fusing groups containing prim::FusedConcat (the producer) together with other ops (the consumer) if the consumer was fusable. For example,
```
import torch

@torch.jit.script
def fn(x, y, z):
    x1 = x + y
    y1 = x - y
    w = torch.cat([x1, y1])
    return w + z

x = torch.randn(2, 2, dtype=torch.float, device='cpu')
y = torch.randn(2, 2, dtype=torch.float, device='cpu')
z = torch.randn(4, 2, dtype=torch.float, device='cpu')
fn(x, y, z)
fn.graph_for(x, y, z)
```
produced the following graph:
```
graph(%x : Float(2, 2)
%y : Float(2, 2)
%z : Float(4, 2)) {
%3 : int = prim::Constant[value=1]()
%y1 : Float(2, 2) = aten::sub(%x, %y, %3)
%8 : int = prim::Constant[value=0]()
%14 : Float(4, 2) = prim::FusionGroup_0[device=-1](%z, %y1, %x, %y)
return (%14);
}
with prim::FusionGroup_0 = graph(%1 : Float(4, 2)
%5 : Float(2, 2)
%7 : Float(2, 2)
%8 : Float(2, 2)) {
%11 : int = prim::Constant[value=1]()
%9 : int = prim::Constant[value=1]()
%x1 : Float(2, 2) = aten::add(%7, %8, %9)
%w : Float(4, 2) = prim::FusedConcat[dim=0](%x1, %5)
%2 : int = prim::Constant[value=1]()
%3 : Float(4, 2) = aten::add(%w, %1, %2)
return (%3);
}
```
This is a problem because it violates two invariants:
1) all inputs to the FusionGroup must have the same size
2) prim::FusedConcat's output must not be used inside the FusionGroup
This PR fixes this problem by checking if the output to a FusionGroup came from a prim::FusedConcat node when deciding whether to fuse the consumer and producer.
If the producer is a value that came from a prim::FusedConcat node in a FusionGroup, then consumer & producer do not get fused.
cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10466
Differential Revision: D9296686
Pulled By: zou3519
fbshipit-source-id: ed826fa9c436b42c04ca7d4d790cece804c162bd
Summary:
A bootcamper was confused by the word "locally" and thought it meant on his MacBook as opposed to his FB dev machine. Besides the confusion in the FB context, the word "locally" isn't really necessary at all.
soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10495
Reviewed By: soumith
Differential Revision: D9311480
Pulled By: goldsborough
fbshipit-source-id: 2779c7c60f903a1822a50d140ed32a346feec39e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10426
We were seeing linker errors for TanhGradientOperator in multifeed. Since we only use the float specialization, we might as well define it that way.
Reviewed By: yinghai
Differential Revision: D9280622
fbshipit-source-id: d2ffb698c73a84bb062de5e1f3bda741330e4228
Summary:
This operator implements b-bit (1/2/4/8) stochastic quantization of a floating-point
matrix in a row-wise fashion. 8/b floating values are packed into a byte
and returned in a uint8 tensor. PR: https://github.com/pytorch/pytorch/pull/8629
Reviewed By: harouwu
Differential Revision: D8493264
fbshipit-source-id: 01f64066568a1e5a2b87c6d2134bd31cdf119c02
Summary:
* some small leftovers from the last PR review
* enable more unit test sets for CI
* replace use of hcRNG w/ rocRAND (docker image was already updated w/ newer rocRAND)
* use rocBLAS instead of hipBLAS to allow convergence w/ Caffe2
* use strided_batched gemm interface also from the batched internal interface
* re-enable Dropout.cu as we now have philox w/ rocRAND
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10406
Reviewed By: Jorghi12
Differential Revision: D9277093
Pulled By: ezyang
fbshipit-source-id: 7ef2f6fe4ead77e501ed7aea5c3743afe2466ca2
Summary:
```
This removes PyObjectFinalizer. We were seeing SIGSEGV at exit in some
programs that use multiprocessing. The backtrace pointed to
StorageRef.__del__ being called from subtype_dealloc. My guess is that
the Python interpreter was shutdown before all C++ Storage objects were
deallocated. Deallocating the C++ Storage called the finalizer which
called back into Python after it was no longer safe to do so.
This avoids a callback from C++ into Python during Storage finalization.
Instead, dead Storage objects (expired weak references) are collected
periodically when shared_cache exceeds a limit. The limit is scaled with
2x the number of live references, which places an upper bound on the
amount of extra memory held by dead Storage objects. In practice, this
should be very small.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10407
Differential Revision: D9272400
Pulled By: colesbury
fbshipit-source-id: ecb14d9c6d54ffc91e134c34a4e770a4d09048a2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10278
Translation to Backend happens immediately before we go into the
Type universe; otherwise we use TensorTypeId.
I allocated TensorTypeId corresponding exactly to existing ATen
Backend. Only CPUTensorId and CUDATensorId are relevant in the
Caffe2 universe.
Reviewed By: gchanan
Differential Revision: D9184060
fbshipit-source-id: 9d3989c26f70b90f1bbf98b2a96c57e2b0a46597
Summary:
This PR provides 4 fixes / features:
1. torch::nn::Cloneable inherits virtually from torch::nn::Module. We want to pass around a module with new functions, and the best way to do this is to do a diamond inheritance pattern, i.e.
```c++
struct MySuperModuleImpl : virtual public torch::nn::Module {
  virtual void myFunction() = 0;
};

template <typename Derived>
struct MySuperModule : public torch::nn::Cloneable<Derived>, public MySuperModuleImpl {};

struct MyModule : public MySuperModule<MyModule> {
  void myFunction() override;
};
```
This way, we can simply pass around MySuperModuleImpl around instead of torch::nn::Module.
2. Optimizer options are public now, since there's no way to decay the LR or modify it during training otherwise
3. Serialization functions create autograd history and call copy_! Bad!
4. Optimizers did not create buffers after add_parameters was called.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9837
Reviewed By: goldsborough
Differential Revision: D9199746
Pulled By: ebetica
fbshipit-source-id: 76d6b22e589a42637b7cc0b5bcd3c6b6662fb299
Summary:
Explanation copied from code:
// Motivation about the gflags wrapper:
// (1) We would need to make sure that the gflags version and the non-gflags
// version of Caffe2 are going to expose the same flags abstraction. One should
// explicitly use caffe2::FLAGS_flag_name to access the flags.
// (2) For flag names, it is recommended to start with caffe2_ to distinguish it
// from regular gflags flags. For example, do
// CAFFE2_DEFINE_BOOL(caffe2_my_flag, true, "An example");
// to allow one to use caffe2::FLAGS_caffe2_my_flag.
// (3) Gflags has a design issue that does not properly expose the global flags,
// if one builds the library with -fvisibility=hidden. The current gflags (as of
// Aug 2018) only deals with the Windows case using dllexport, and not the Linux
// counterparts. As a result, we will explicitly use CAFFE2_EXPORT to export the
// flags defined in Caffe2. This is done via a global reference, so the flag
// itself is not duplicated - under the hood it is the same global gflags flag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10444
Differential Revision: D9296726
Pulled By: Yangqing
fbshipit-source-id: a867d67260255cc46bf0a928122ff71a575d3966
Summary:
On Windows, the FindRocksDB script doesn't detect rocksdb installation built by cmake.
And it doesn't include/link the RocksDB dependencies either, like:
* `Snappy`
* `Shlwapi.lib`
* `Rpcrt4.lib`
This PR tries to detect RocksDB in config mode first before using the private find module.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/7315
Differential Revision: D9287587
Pulled By: Yangqing
fbshipit-source-id: 314a36a14bfe04aa45013349c5537163fb4c5c00
Summary:
There's no need to hack.
Using `CUDA_LINK_LIBRARIES_KEYWORD` is the normal way.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10437
Differential Revision: D9287579
Pulled By: Yangqing
fbshipit-source-id: d3d575ea8c3235576ba971e4b7493ddb435f92f3
Summary:
Building caffe2 and pytorch separately ends up with duplicated symbols, as they now share some basic libs, and this is especially bad for the registry. This PR fixes our CI and builds them in one shot with shared symbols.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10427
Reviewed By: bddppq
Differential Revision: D9282372
Pulled By: yinghai
fbshipit-source-id: 0514931ea88277029a68fa5368ff4336472f132e
Summary:
Optimize the max_pooling operation for the inference path by setting the "inference" flag for the underlying MKL-DNN, saving the computation and storage of max indices, which are only needed for training. To keep the API compatible, training mode is still the default and inference mode is set in the optimizeForIdeep path.
Tests show the speed-up of a single max_pooling operation is up to 7X on BDW.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10156
Differential Revision: D9276755
Pulled By: yinghai
fbshipit-source-id: ad533d53aabb8ccb3b592da984d6269d9b794a8a
Summary:
This should just work now that sizes/strides are unified between TH and ATen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10414
Differential Revision: D9274681
Pulled By: gchanan
fbshipit-source-id: 69eb766f4e3a5b6c57b15837cffdef513b6d7817
Summary:
```
Correctly share CUDA Parameters, requires_grad and hooks.
Previously, the following was true:
- If you put a Parameter for a CUDA tensor
in multiprocessing queue (or otherwise tried to transfer it),
this failed, saying that we cannot pickle CUDA storage.
This is issue #9996.
- If you put a leaf Tensor that requires_grad=True through the
multiprocessing queue, it would come out the other end as
requires_grad=False (It should have come out the other end
as requires_grad=True). Similarly, backwards hooks were
lost.
- If you put a non-leaf Tensor that requires_grad=True through
the multiprocessing queue, it would come out the other end
as requires_grad=False.
The root cause for the first issue was that implementation of
reductions for Parameter used the superclass implementation
(tensor) in __reduce_ex__, but this always picks up the
non-ForkingPickler reduction, which doesn't work with CUDA tensors.
So, we registered a new ForkingPickler specifically for Parameter,
and adjusted the code to correctly rewrap a Tensor in a Parameter
if it was originally a parameter.
While working on this, we realized that requires_grad and backwards
hooks would not be preserved in the ForkingPickler reduction
implementation. We fixed the reducer to save these parameters.
However, Adam Paszke pointed out that we shouldn't allow sending
requires_grad=True, non-leaf Tensors over a multiprocessing
queue, since we don't actually support autograd over process
boundaries. We now throw an error in this case; this may cause
previously working code to fail, but this is easy enough to fix;
just detach() the tensor before sending it. The error message says
so.
Fixes #9996.
```
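A minimal sketch of the new restriction on non-leaf tensors (hypothetical toy code, not from the PR):
```python
import torch
import torch.multiprocessing as mp

x = torch.randn(3, requires_grad=True)
y = x * 2              # non-leaf tensor that requires grad

q = mp.Queue()
# q.put(y)             # would now error: autograd doesn't cross process boundaries
q.put(y.detach())      # detach first, as the error message suggests
print(q.get())
```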
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10220
Differential Revision: D9160746
Pulled By: ezyang
fbshipit-source-id: a39c0dbc012ba5afc7a9e646da5c7f325b3cf05c
Summary:
Closes #9702.
cc jph00
Commit structure:
1. Change the index calculation logic. I will explain using 1-D for simplicity.
Previously we have (in pseudo code):
```
// 1. get the float locations from grid
scalar_t x = from_grid()
// 2. find the integral surrounding indices
int x_left = floor(x)
int x_right = x_left + 1
// 3. calculate the linear interpolate weights
scalar_t w_left = x_right - x
scalar_t w_right = x - x_left
// 4. manipulate the integral surrounding indices if needed
// (e.g., clip for border padding_mode)
x_left = manipulate(x_left, padding_mode)
x_right = manipulate(x_right, padding_mode)
// 5. interpolate
output_val = interpolate(w_left, w_right, x_left, x_right)
```
This is actually incorrect (and also unintuitive) because it calculates the
weights before manipulating out-of-boundary indices. Fortunately, this
isn't manifested in either of the currently supported modes, `'zeros'` and
`'border'` padding:
+ `'zeros'`: doesn't clip
+ `'border'`: clips, but for out-of-bound `x` both `x_left` and `x_right` are
clipped to the same value, so weights don't matter
But this is a problem with reflection padding, since after each time we reflect,
the values of `w_left` and `w_right` should be swapped.
So in this commit I change the algorithm to (numbers corresponding to the
ordering in the above pseudo-code)
```
1. get float location
4. clip the float location
2. find the integral surrounding indices
3. calculate the linear interpolate weights
```
In the backward, because of this change, I need to add new variables to track
`d manipulate_output / d manipulate_input`, which is basically a multiplier
on the gradient calculated for `grid`. From benchmarking, this addition doesn't
cause obvious slowdowns.
2. Implement reflection padding. The indices will keep being reflected until
they become within boundary.
Added variant of `clip_coordinates` and `reflect_coordinates` to be used in
backward. E.g.,
```cpp
// clip_coordinates_set_grad works similarly to clip_coordinates except that
// it also returns the `d output / d input` via pointer argument `grad_in`.
// This is useful in the backward pass of grid_sampler.
scalar_t clip_coordinates_set_grad(scalar_t in, int64_t clip_limit, scalar_t *grad_in)
```
For example, if `in` is clipped in `'border'` mode, `grad_in` is set to `0`.
If `in` is reflected **odd** times in `'reflection'` mode, `grad_in`
is set to `-1`.
3. Implement nearest interpolation (a short usage sketch of the new options follows this list).
4. Add test cases
5. Add better input checking
Discussed with goldsborough for moving `operator<<` of `at::Device`,
`at::DeviceType` and `at::Layout` into `at` namespace. (Otherwise
`AT_CHECK` can't find them.)
6. Support empty tensors. cc gchanan
+ Make empty tensors not acceptable by cudnn.
+ Add `AT_ASSERT(kernel block size > 0)` if using `GET_BLOCKS`
+ Cache `numel` in `TensorGeometry`
I was going to use `numel` to test if cudnn descriptor should accept a
tensor, but it isn't used eventually. I can revert this if needed.
7. Add more test cases, including on input checking and empty tensors
8. Remove an obsolete comment
9. Update docs. Manually tested by generating docs.
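A short usage sketch of the two user-facing additions, reflection padding and nearest interpolation, in `grid_sample` (an identity grid is used purely for illustration):
```python
import torch
import torch.nn.functional as F

inp = torch.arange(16, dtype=torch.float).view(1, 1, 4, 4)
theta = torch.tensor([[[1., 0., 0.], [0., 1., 0.]]])   # identity affine transform
grid = F.affine_grid(theta, (1, 1, 4, 4))

out_reflect = F.grid_sample(inp, grid, padding_mode='reflection')
out_nearest = F.grid_sample(inp, grid, mode='nearest')
```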
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10051
Differential Revision: D9123950
Pulled By: SsnL
fbshipit-source-id: ac3b4a0a36b39b5d02e83666cc6730111ce216f6
Summary:
I am using this to test a CI job to upload pip packages, and so am using the Caffe2 namespace to avoid affecting the existing pytorch packages.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9544
Reviewed By: orionr
Differential Revision: D9267111
Pulled By: pjh5
fbshipit-source-id: a68162ed29d2eb9ce353d8435ccb5f16c3b0b894
Summary:
This was used as a convenient way for us to convert c1 models. Now that conversion is more or less done, we should probably require any users who need to convert c1 models to explicitly install c1. This PR removes the explicit c1 proto (which was copied from c1) in favor of explicit installation.
Note that caffe_translator would still work properly, only difference is that now users need to install c1 separately.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10380
Differential Revision: D9267981
Pulled By: Yangqing
fbshipit-source-id: a6ce5d9463e6567976da83f2d08b2c3d94d14390
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10360
It seems `lengths_host_.CopyFrom(lengthsInput, &context_);` is asynchronous w.r.t. the host while `lengths_host_.CopyFrom(lengthsInput);` is synchronous.
However, according to jerryzh168, `lengths_host_.CopyFrom(lengths, &context_); context_.FinishDeviceComputation();` is the safest way to guarantee synchronization.
Reviewed By: jerryzh168
Differential Revision: D9197923
fbshipit-source-id: 827eb63d9d15c1274851e8301a793aed39d4fa6b
Summary:
As in the title. I also did a small refactor that let us lose almost 400 LOC. This is a first step in moving the RNN code to C++.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10305
Reviewed By: ezyang
Differential Revision: D9196227
Pulled By: apaszke
fbshipit-source-id: 54da905519aade29baa63ab1774a3ee1db5663ba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10163
- Remove dependency on caffe2/core/common.h for ATen/core/typeid.h
Unfortunately, Windows seems to rely on typeid.h including this
header, so it is still included from the forwarding header
caffe2/core/typeid.h
- Deduplicate Demangle/DemangleType with their ATen equivalents
Reviewed By: smessmer
Differential Revision: D9132432
fbshipit-source-id: 21f2c89e58ca1e795f1b2caa316361b729a5231b
Summary:
Copy of #10191 because these changes didn't land with the diff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10394
Differential Revision: D9260816
Pulled By: li-roy
fbshipit-source-id: 7dc16919cfab6221fda1d44e98c5b900cfb40558
Summary:
Before we had 0-dim tensors in TH, we were flexible in what we accepted with respect to the difference between size [] and size [1] tensors in backwards functions, because they were identical in TH. So, we had backwards definitions that were technically incorrect, but happened to work. This often masks shape issues, adds greatly to code complexity, and thus IMO isn't worth keeping.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10382
Differential Revision: D9244618
Pulled By: gchanan
fbshipit-source-id: 2c29c53a8ffe8710843451202cad6b4323af10e8
Summary:
This makes clamp and relu faster (fixes #10276).
The extra copying was introduced when clamp moved to ATen and
the _th_clamp_ wrapper was used to forward to TH/THC,
we remove that and add _th_clamp(_out) instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10352
Reviewed By: ezyang
Differential Revision: D9233590
Pulled By: SsnL
fbshipit-source-id: 4f86a045498e5e577fb22656c71f171add7ed0ac
Summary:
If an `at::test` function is added, gcc can't figure out the `std::thread(test, -1)` resolution.
It is not a problem for the current code. I bumped into this when playing with native functions. But I think it is good to just prevent it from happening in the future by removing `using namespace at;`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10381
Differential Revision: D9241614
Pulled By: SsnL
fbshipit-source-id: 972ac3cecff3a50602b3fba463ae1ebd3f53d036
Summary:
When only part of the outputs of unbind are used in a backward,
the gradients for the others are undefined. This sets those
to zero in to_tensor_list.
Fixes: #9977
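A minimal sketch of the case this fixes: only one of the unbound slices feeds the loss, and the gradients for the unused slices are now zero instead of undefined.
```python
import torch

x = torch.randn(3, 4, requires_grad=True)
a, b, c = x.unbind(0)   # only `a` is used below
loss = a.sum()
loss.backward()
print(x.grad)           # rows corresponding to b and c are all zeros
```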
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9995
Differential Revision: D9239610
Pulled By: soumith
fbshipit-source-id: eb8d1b3f2b4e615449f9d856e10b946910df9147
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10334
Keep kEps in one place to make sure they are consistent
Reviewed By: xianjiec
Differential Revision: D9202280
fbshipit-source-id: 35d173ce1d1a361b5b8cdbf1eac423e906e7c801
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10218
SubtreeMatchCriteria now supports:
- nonTerminal flag: if this is set, it means we only match the root of the subtree and do not care about the children. Example use case: matching an "input" node without caring how the input is produced.
Additional tests for these new logic are added to subgraph_matcher_test.cc.
Subgraph matching APIs for NNGraph is also added.
(Further enhancement to make the SubgraphMatching API constructs a Subgraph object/more diagnostic information will go later).
Reviewed By: bwasti
Differential Revision: D9156092
fbshipit-source-id: 3f28ac15d9edd474b3e0cd51fd7e6f973299d061
Summary:
The current Dockerfile builds pytorch using the default python within miniconda, which happens to be Python 3.6.
This patch allows users to specify which python should be installed in the default miniconda environment used by the pytorch dockerfile. I have tested the build for python 2.7, 3.5, 3.6, and 3.7. Python 2.7 required typing and cython.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10317
Differential Revision: D9204401
Pulled By: ezyang
fbshipit-source-id: 11355cab3bf448bbe8369a2ed1de0d409c9a2d6e
Summary:
This PR adds tracing infrastructure for custom operators. It also simplifies the tracer overall, and changes the codegen to do more metaprogramming there instead of via C++ (which was necessary for the custom op tracing).
To give an example of the tracer/metaprogramming change, what used to look like this in `VariableType.cpp`:
```
jit::tracer::PreTraceInfo trace_info;
if (jit::tracer::isTracing()) {
  trace_info = jit::tracer::preRecordTrace(jit::aten::index_select, "self", self, "dim", dim, "index", index);
}
```
is now simply the inlined version of `preRecordTrace`, minus C++ metaprogramming:
```
torch::jit::Node* node = nullptr;
if (jit::tracer::isTracing()) {
  auto& graph = jit::tracer::getTracingState()->graph;
  node = graph->create(jit::aten::index_select_out, /*outputs=*/0);
  jit::tracer::recordSourceLocation(node);
  jit::tracer::addInputs(node, "result", result);
  jit::tracer::addInputs(node, "self", self);
  jit::tracer::addInputs(node, "dim", dim);
  jit::tracer::addInputs(node, "index", index);
  graph->appendNode(node);
}
```
zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10212
Differential Revision: D9199615
Pulled By: goldsborough
fbshipit-source-id: cd4b603c1dc01340ead407228e109c99bdba2cfc
Summary:
While waiting for dropout to be fully ported to ATen, here's a performance fix for the most common dropout case. Dropout is still a Python function; I just added an efficient path to it. I could not make inplace work, because the generator always generates `return self` for inplace functions, and I need to return both the original tensor and the mask, so inplace goes through the existing path. Even with the non-inplace version, since the mask is now a ByteTensor, the memory used is just a little larger than for inplace dropout, due to the savings on the mask.
Once dropout is moved to aten, these kernels still can be used for efficient implementation.
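For context, a minimal sketch of the call this speeds up (the standard functional dropout entry point; no new API is involved, and a CUDA build is assumed for the fused path):
```python
import torch
import torch.nn.functional as F

x = torch.randn(1024, 1024, device="cuda")
y = F.dropout(x, p=0.5, training=True)  # non-inplace path with the ByteTensor mask
```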
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9666
Reviewed By: SsnL
Differential Revision: D8948077
Pulled By: ezyang
fbshipit-source-id: 52990ef769471d957e464af635e5f9b4e519567a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10228
Sometimes, for all items in the minibatch in test mode, input length will be
equal to max time steps. This avoids having to pass in an external tensor.
Differential Revision: D9174378
fbshipit-source-id: 22f7d5c311c855d9c3ac59f2a5e773279bd69974
Summary:
This PR extends the existing type and shape metadata tracing and verification done in autograd with device information. This expansion of tracing is required for #8354, is likely useful in other scenarios, and is a healthy sanity check, just like type and shape tracing.
The precise changes are:
- TypeAndShape -> InputMetadata, now includes device()
- Creating InputMetadata is simplified to just require a tensor, and callers were updated to use this simpler invocation wherever possible
- The gradient accumulator of a variable is now reset when set_data() is called if either the type or device changes, and this reset now locks to avoid contention with acquiring the gradient accumulator
- Mismatched devices during backward() will throw a runtime error, just like mismatched type and shape
- (Bonus!) Two uninitialized pointers in THCReduce are now initialized (to nullptr) to prevent build warnings
fyi colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9796
Reviewed By: goldsborough
Differential Revision: D9119325
Pulled By: ezyang
fbshipit-source-id: 76d1861b8d4f74db0575ff1f3bd965e18f9463de
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10274
Good C++ libraries don't take up un-namespaced identifiers
like DISABLE_COPY_AND_ASSIGN. Re-prefix this.
Follow up fix: codemod Caffe2 to use the new macro, delete
the forwarding definition
Reviewed By: mingzhe09088
Differential Revision: D9181939
fbshipit-source-id: 857d099de1c2c0c4d0c1768c1ab772d59e28977c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10268
Running torch.distributed.init_process_group fails with more than ~64 processes, with various errors like connection refused or connection reset by peer. After some digging, it looks like the root cause is that all workers have to connect to master via TCP (both in Zeus init and in DataChannelTCP - look for `connect()`), and the listening socket only has a backlog of 64.
I increased the backlog to 1024, that seems like enough for reasonable purposes (the hard limit is 65535 in /proc/sys/net/core/somaxconn). There's probably a more correct way to do this that involves retries when connection is refused.
Reviewed By: soumith
Differential Revision: D9182216
fbshipit-source-id: 2f71c4995841db26c670cec344f1e3c7a80a7936
Summary:
Previously, `tensor[i:]` was transformed to `tensor[i:-1]`. This incorrectly leaves off the last element. Noticed this when implementing slicing for list types.
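A minimal sketch of the kind of script this affects; before the fix, the open-ended slice compiled as `x[i:-1]` and dropped the last element.
```python
import torch

@torch.jit.script
def tail(x, i):
    # type: (Tensor, int) -> Tensor
    return x[i:]

print(tail(torch.arange(5), 2))  # tensor([2, 3, 4])
```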
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10286
Differential Revision: D9193292
Pulled By: michaelsuo
fbshipit-source-id: df372b815f9a3b8029830dd9e8769f9985a890e7
Summary:
I changed the name of this builtin to match Python's native style, but forgot to change the compiler error to match.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10265
Differential Revision: D9192963
Pulled By: michaelsuo
fbshipit-source-id: 225ca4cd50fbbe3b31c369deeb3123a84342aab1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10264
Since we now have DISABLE_COPY_AND_ASSIGN macro in the file,
CoreAPI is no longer an accurate name.
Reviewed By: dzhulgakov
Differential Revision: D9181687
fbshipit-source-id: a9cc5556be9c43e6aaa22671f755010707caef67
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10263
Auxiliary changes that were needed:
- Add DISABLE_COPY_AND_ASSIGN to CoreAPI.h (maybe we should rename this file
now)
Reviewed By: dzhulgakov
Differential Revision: D9181321
fbshipit-source-id: 975687068285b5a94a57934817c960aeea2bbafa
Summary:
When we directly use -std=c++11, it propagates to the downstream applications.
Problems:
1. Gcc flags propagating to nvcc.
2. nvcc flags propagating to nvcc. (Which throws an error like redeclaration of std flag)
This PR will fix these propagation issues!
Similar problem:
https://github.com/FloopCZ/tensorflow_cc/pull/92
https://github.com/CGAL/cgal/issues/2775
Requires: CMake 3.12
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10098
Differential Revision: D9187110
Pulled By: ezyang
fbshipit-source-id: 0e00e6aa3119c77a5b3ea56992ef3bbfecd71d80
Summary:
This PR for the ROCm target does the following:
* enable some unit tests on ROCm
* fix a missing static_cast that breaks BatchNorm call on ROCm
* fix BatchNorm to work on ROCm w/ ROCm warp sizes etc
* improve the pyhipify script by introducing kernel scope to some transpilations and other improvements
* fix a linking issue on ROCm
* for more unit test sets: mark currently broken tests broken (to be fixed)
* enable THINLTO (phase one) to parallelize linking
* address the first failing of the elementwise kernel by removing non-working ROCm specialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10266
Differential Revision: D9184178
Pulled By: ezyang
fbshipit-source-id: 03bcd1fe4ca4dd3241f09634dbd42b6a4c350297
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10261
1. Reserve
Currently, Reserve will allocate new memory while the old data in the tensor is also preserved,
and Resize relies on this behavior in some call sites, e.g. https://github.com/pytorch/pytorch/blob/master/caffe2/operators/reservoir_sampling.cc#L103, where we should be using Extend.
We want to bring semantics of Reserve to be more aligned with std::vector, i.e. we want it to be
an optimization about memory allocation and remove the semantics about preserving the data. We'll remove the guarantee that data will be preserved after Reserve, and Extend will be the only API that preserves old data when we do in-place extension of memory. This also helps with the later refactoring on split Storage from Tensor.
Also, we'll only pass in the outer dimension to Reserve which means the later dimensions should be set before we call Reserve.
2. Extend/Shrink
Previously, Extend actually means ExtendBy and Shrink means ShrinkTo. I would like to add an ExtendTo for convenience, and change Shrink to ShrinkTo.
Old functions calling Extend are still there; although it actually means ExtendBy, I think it still makes sense to keep it.
3. Usage Patterns
The expected usage patterns right now is:
```
t->Resize({0, 32, 32, 32});
t->template mutable_data<T>(); // set meta_
t->Reserve(100);
auto* t_data = t->template mutable_data<T>();
// feed data to tensor using t_data
for (int i = 0; i < 100; ++i) {
  t->Extend(1, 50, &context_);
  // you can continue to use t_data if you have reserved enough space
  // otherwise, you should call t->template mutable_data<T> again to
  // get the new data pointer since Extend will allocate new memory even
  // though the original data is preserved.
}
```
Reviewed By: ezyang
Differential Revision: D9128147
fbshipit-source-id: e765f6566d73deafe2abeef0b2cc0ebcbfebd096
Summary:
The new entrypoint is `./tools/build_pytorch_libs.sh caffe2`.
This will also speed up CI builds a bit, since we will no longer be compiling all of libtorch twice.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9836
Differential Revision: D9182634
Pulled By: anderspapitto
fbshipit-source-id: 0b9a20ab04f5df2d5c4e7777e4dc468ab25b9ce2
Summary:
Turns out some people are using this via the C-API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10259
Differential Revision: D9180135
Pulled By: gchanan
fbshipit-source-id: 68f59beabf7f8093e67581d7e7ebfe8dff9e6b69
Summary:
Visual Studio Code and Visual Studio store their configurations in `FOLDER/.vscode` and `FOLDER/.vs`.
But "setup.py clean" deletes these folders because they are listed in the `.gitignore` file.
To prevent this, add a "BEGIN NOT-CLEAN-FILES" marker to the `.gitignore` file; "setup.py clean" then ignores lines after this marker.
Discussed in #10206
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10233
Differential Revision: D9175515
Pulled By: ezyang
fbshipit-source-id: 24074a7e6e505a3d51382dc5ade5c65c97deda37
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10197
Support generic features in DPER2.
For now, since we only have one generic type (1), we are directly adding the parsed feature record to the embedding feature.
For new feature types with specific structure, corresponding code changes are also expected.
Reviewed By: itomatik
Differential Revision: D8788177
fbshipit-source-id: 9aaa6f35ece382acb4072ec5e57061bb0727f184
Summary:
Fixes #10032
When capturing an output, GraphExecutorAutogradFunction creates
SavedVariable with is_output=False and owns it:
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/graph_executor.cpp#L87
Constructing SavedVariable with is_output=False makes it own a copy of
the shared_ptr<GraphExecutorAutogradFunction>, which causes a reference
cycle:
6456b944fd/torch/csrc/autograd/saved_variable.cpp (L27)
The solution in this PR is to construct the SavedVariable with
is_output=True if the captured value is an output.
Test Plan
Turn on cuda memory checking for JitTestCase. If the test's name
includes "cuda" or "gpu" in it, the cuda memory checking test happens.
cc zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10222
Reviewed By: ezyang
Differential Revision: D9162995
Pulled By: zou3519
fbshipit-source-id: aeace85a09160c7a7e79cf35f6ac61eac87cbf66
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10175
Previously, we had at::Device::Type and caffe2::DeviceType (from protobuf),
intended to help us distinguish between CPU, CUDA, etc. devices.
This replaces at::Device::Type entirely with at::DeviceType, which in turn
is a direct, 'enum class' version of the protobuf generated caffe2::DeviceType
'enum'. We can't eliminate the 'enum' because this would be a pretty drastic
API change (enum is interconvertible with integers, enum class is not) but
we can make the two line up exactly and share code for, e.g., printing.
Reviewed By: Yangqing
Differential Revision: D9137156
fbshipit-source-id: 566385cd6efb1ed722b25e6f7849a910b50342ab
Summary:
- New concept of a message stack; you can add messages
using AppendMessage
- New concept of a caller; it's just a way to pass along
some arbitrary extra information in the exception
Coming soon is changing Caffe2 to use at::Error instead of
EnforceNotMet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10183
Differential Revision: D9139996
Pulled By: ezyang
fbshipit-source-id: 6979c289ec59bc3566a23d6619bafba2c1920de9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10224
It doesn't work with Caffe2; use AT_CORE_API from ATen/core/CoreAPI.h
instead.
Reviewed By: smessmer
Differential Revision: D9162467
fbshipit-source-id: 3c7d83c1ccb722ebac469296bdd7c3982ff461e5
Summary:
The basic game plan is to stop accessing the type_ field directly,
and instead use the stored backend_, scalar_type_ and
is_variable_ to look up the appropriate Type from Context.
Storage of backend_ and scalar_type_ are new.
At some future point in time, I'd like to look at this code
carefully to see if I can get everything in this codepath inlining.
I didn't do it in this patch because there are circular include
problems making things difficult.
Some other details:
- Added Device::backend() which does what it says on the tin
- SparseTensorImpl is temporarily hard-coded to root in at::Context
for the appropriate context. If/when we put this in shared code,
we'll have to break this dep too, but for now it should be OK.
- There's a stupid problem with globalContext() deadlocking if
you didn't actually initialize it before loading libtorch.so
(which is bringing along the variable hooks). I fixed this by
reordering the static initializers. Fixes #9784
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10210
Differential Revision: D9150697
Pulled By: ezyang
fbshipit-source-id: 89e2006c88688bcfab0dcee82dc369127c198c35
Summary:
- fixes#9141, #9301
- use logsigmoid at multilabel_soft_margin_loss to make it more stable (NOT fixing legacy MultiLabelSoftMarginCriterion)
- return (N) instead of (N, C) to match the same behavior as MultiMarginLoss
- Note that with this PR, the following behavior is expected:
```
loss = F.multilabel_soft_margin_loss(outputs, labels, reduction='none')
loss_mean = F.multilabel_soft_margin_loss(outputs, labels, reduction='elementwise_mean')
loss_sum = F.multilabel_soft_margin_loss(outputs, labels, reduction='sum')
loss.sum() == loss_sum # True
loss.mean() == loss_mean # True
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9965
Differential Revision: D9038402
Pulled By: weiyangfb
fbshipit-source-id: 0fa94c7b3cd370ea62bd6333f1a0e9bd0b8ccbb9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10217
It's only used in debug printing and is not that reliable anyway. If we want to implement it later, we should do it with proper accounting for shared storages.
Reviewed By: jerryzh168
Differential Revision: D9155685
fbshipit-source-id: 48320d41a0c4155645f3ba622ef88730a4567895
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9360
This implements a first set of c10 operators, namely the ones needed for the multithread predictor benchmark.
All implementations are CPU-only and experimental. They're not meant to be used in production.
They can be used, however, to test calling simple c10 MLPs from Caffe2 or PyTorch when working on these integration paths.
Reviewed By: dzhulgakov
Differential Revision: D8811698
fbshipit-source-id: 826789c38b2bfdb125a5c0d03c5aebf627785482
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9369
This adds the capability for caffe2 to call c10 operators and adds a dummy c10 sigmoid op as a proof of concept.
I used this test script to make sure it works:
```
from caffe2.python import workspace, model_helper
import numpy as np

data1 = np.random.rand(16, 100).astype(np.float32)
workspace.FeedBlob("data1", data1)
m = model_helper.ModelHelper(name="my net")
sigmoid1 = m.net.C10Sigmoid_DontUseThisOpYet("data1", "sigmoid1")
sigmoid2 = m.net.Sigmoid("data1", "sigmoid2")
workspace.RunNetOnce(m.param_init_net)
workspace.CreateNet(m.net)
data1 = np.random.rand(16, 100).astype(np.float32)
workspace.FeedBlob("data1", data1)
workspace.RunNet(m.name, 1)
print(workspace.FetchBlob("data1"))
print(workspace.FetchBlob("sigmoid1"))
print(workspace.FetchBlob("sigmoid2"))
```
(and check that both sigmoid outputs are the same)
Reviewed By: ezyang
Differential Revision: D8814669
fbshipit-source-id: eeb0e7a854727f1617a3c592a662a7e5ae226f40
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10214
Seems we're passing weak pointers over C API boundaries. Need this API there too.
Reviewed By: ezyang
Differential Revision: D9154505
fbshipit-source-id: c9889689b87dad5d918f93ba231e01704b8d2479
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10130
Update some include paths to make them internally consistent
Reviewed By: ezyang
Differential Revision: D9119906
fbshipit-source-id: b44e5cab8e8e795ee18afe9ffc6caf1f2b413467
Summary:
This PR adds a way to infer the JIT/script schema of a function from its signature, and then create an operator from the schema and implementation. The implementation function is wrapped into another function, which pops values from the stack into an argument tuple, then invokes the function and pushes the return value back onto the stack, sometimes unpacking the return value if it is a tuple.
Currently the method is called `createOperator`. We may want to think of a nicer way of registering ops in tandem with `RegisterOperators`. It might be very cumbersome to add a template constructor to `Operator`, so maybe we can come up with a chaining method on `RegisterOperators` like `RegisterOperators(schema, func).op(schema.func).op(schema, func)` -- it has to work at startup time (for a static variable) though. We can solve this in another PR.
zdevito apaszke smessmer dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10048
Differential Revision: D9125975
Pulled By: goldsborough
fbshipit-source-id: de9e59888757573284a43787ae5d94384bfe8f9a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10192
- release_resources() method must be non-const because it modifies the object
- for intrusive_ptr<const MyClass>, this needs to be const_cast :(
Reviewed By: ezyang
Differential Revision: D9143808
fbshipit-source-id: 9203ff7a7ff3bec165931279371c6e75d4f0ca8c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10133
This is useful for C APIs where we want to give owning pointers to/from other languages.
Reviewed By: ezyang
Differential Revision: D9121493
fbshipit-source-id: f903f5830f587b2ba69c0636ddcf1a066bbac2e0
Summary:
The PR allows int→float and float→int casts. Currently we only allow `tensor→int` and `tensor→float` casts.
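A minimal sketch of what this enables in script (assuming the usual `@torch.jit.script` entry point; the function itself is made up, not taken from the PR's tests):
```python
import torch

@torch.jit.script
def casts(x):
    a = int(2.5)   # float -> int cast, newly allowed
    b = float(3)   # int -> float cast, newly allowed
    return x + a + b
```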
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10168
Differential Revision: D9141163
Pulled By: wanchaol
fbshipit-source-id: 5e5591a98b4985a675641dfc9a385b2a0bf8e208
Summary:
Previously, `foo = [bar, baz]` would construct a TupleType of fixed arity. This would cause code like:
```
foo = [2]
if True:
    foo = [2, 2]
```
to fail to compile, since `(int)` is not the same as `(int, int)`.
This PR changes things so that list literals construct ListTypes, which can be resized.
Potentially breaking changes introduced:
- Empty list literals are now disallowed, `_constructEmptyFooList()` builtins are required to replace them.
- Iterable variable unpacking where the rhs is a list is now disallowed. (Tuples still work)
- Lists must have a single type.
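A small sketch of the new behavior, loosely based on the example above (hypothetical script function, not from the PR's tests):
```python
import torch

@torch.jit.script
def f(x):
    foo = [2]          # now a resizable ListType[int], not a fixed-arity tuple type
    if True:
        foo = [2, 2]   # reassignment with a different length now type-checks
    return x
```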
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10193
Differential Revision: D9147166
Pulled By: michaelsuo
fbshipit-source-id: bbd1b97b0b6b7cb0e6f9d6aefa1ee9c731e63039
Summary:
* Changes `insertConstant(g, val)` to `g.insertConstant(val)`.
* Moves SourceRange to its own file to enable it.
* Cleans up dead attribute code in schema matching and graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10177
Differential Revision: D9137789
Pulled By: zdevito
fbshipit-source-id: 8a73cfb01a576f02e7e4dce019be9c0a0002989d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9897
Add an IntrusivePtr class to do intrusive refcounting with a shared_ptr-like interface.
Reviewed By: ezyang
Differential Revision: D9018619
fbshipit-source-id: 5de8706aab8eea2e30bead0f59bd6a7ca4d20011
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10173
With D9024330, the `Extend` function is no longer a template, which makes
the `template` keyword here invalid. For some reason the current version of LLVM
doesn't catch this, but the latest one does.
Reviewed By: jerryzh168
Differential Revision: D9133462
fbshipit-source-id: 54ac9aad01f81b9b4e7b6e2864b8961478d2d860
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9905
This diff improves lars operator in Caffe2 by applying clipping to the computed learning rate
Reviewed By: pjh5
Differential Revision: D9020606
fbshipit-source-id: b579f1d628113c09366feac9406002f1ef4bd54f
Summary:
This PR adds strings to the AST and implements them for print statements. Strings are lifted as attributes of the print node. They must be arguments to print itself, not arguments of an object that is passed to print. If they are encountered elsewhere, an NYI exception will be thrown.
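A minimal sketch of the kind of script this enables (assuming the standard `@torch.jit.script` decorator; the function itself is made up):
```python
import torch

@torch.jit.script
def report(x):
    # the string literal is lifted as an attribute of the print node;
    # it must be an argument to print itself
    print("running forward")
    return x
```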
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9324
Reviewed By: jramseyer
Differential Revision: D8807128
Pulled By: eellison
fbshipit-source-id: 984401ff458ed18d473c6d1bd86750e56c77d078
Summary:
This is part of the process of removing THLongStorage to represent sizes/strides.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10146
Differential Revision: D9126611
Pulled By: gchanan
fbshipit-source-id: b0d995a4c51dfd54bf76dcfee9a69f37f9d01652
Summary:
In this changeset:
* improvements to `hipify-python.py`
* marking unit tests broken for ROCm
* reducing the number of jobs for the build to avoid out-of-memory issues
* switch to Thrust/cub-hip master for the CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9653
Differential Revision: D9117791
Pulled By: ezyang
fbshipit-source-id: a6c3c7b81f2bda9825974bf9bf89a97767244352
Summary:
Enabled support for generating random numbers in the fusion compiler. Currently a Philox RNG implemented by TensorFlow is used, as NVRTC couldn't resolve the curand.h header correctly. The two implementations should have the exact same behavior according to our tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9795
Differential Revision: D8999029
Pulled By: SsnL
fbshipit-source-id: f0d2616a699a942e2f370bdb02ac77b9c463d7b8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10081
Add new utility that make it easier to write graph transformation. Callers now only need to take care of the actual transformation logic. The subgraph matching is simplified because callers only need to specify a simple construct for subtree matching criteria.
The utility is SubgraphMatcher::replaceSubtree
Some notes:
- replaceSubtree takes a subtree matching criteria and a lambda that takes a subtree root. It does not handle any transformations itself. Callers are responsible for the transformation part, including deleting all nodes in the matched subtree(s). We could enhance this to also handle the deletion part if it turns out to be useful.
- Only subtree matching is supported for now, but we can add general DAG subgraph support later if needed.
Reviewed By: bwasti
Differential Revision: D9073297
fbshipit-source-id: 465a0ad11caafde01196fbb2eda2d4d8e550c3b6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9860
For 3D group convolution, in the case of CUDNN 7 and NCHWD order, the filter dim is (M, C/group_, k_h, k_w, k_d).
According to CUDA doc (https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#grouped-convolutions), the existing implementation is incorrect, and will crash the 3d video model training with group convolution.
In the implementation, `filter.dims(1)` is already `C/group_`, so we don't need to divide it by `group_` again.
Reviewed By: BIT-silence
Differential Revision: D9008807
fbshipit-source-id: 2f0d6eb47f4e16d7417a7e3baeba709e3254154f
Summary:
Implement IR transformation for control flow
- `prim::Constant`: clone to new graph directly
- `prim::NumToTensor`: create a `BatchTensor` from output tensor with `batch_size = 1`
- `prim::TensorToNum`: clone to new graph
- `prim::ListConstruct`: clone to new graph
- `prim::If`: execute both `if_block` and `else_block` and combine results from them using `cond`
- `prim::Loop`:
- for loop
- while loop: change while `cond` to `cond_any`, use `cond` to update outputs
test case: hand-written LSTM, greedy search, beam search
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9392
Differential Revision: D8822369
Pulled By: ChunliF
fbshipit-source-id: 8f03c95757d32e8c4580eeab3974fd1bc429a1e5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10152
- Moved from namespace c10::guts to at
- I fixed the use sites, since there were only three of them
- Macro renamed from C10_ to AT_
Reviewed By: smessmer
Differential Revision: D9123652
fbshipit-source-id: bef3c0ace046ebadb82ad00ab73371f026749085
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10139
We want CaffeTypeId to be interconvertible with at::ScalarType, and
this means we should have the numbers line up exactly. Fortunately
this is not too hard to do.
Reviewed By: smessmer
Differential Revision: D9123058
fbshipit-source-id: 7e9bd59ca25a552afe9d2d0a16cedc4f6311f911
Summary:
This exposes expand_outplace to python. Fixes #8076. Fixes #10041.
I didn't name it torch.broadcast because numpy.broadcast does something
slightly different (it returns an object with the correct shape
information).
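A usage sketch, assuming the Python binding added here is `torch.broadcast_tensors` (the name is an assumption; the summary only says expand_outplace is exposed):
```python
import torch

a = torch.randn(3, 1)
b = torch.randn(1, 4)
# both outputs are expanded to the common broadcast shape (3, 4)
x, y = torch.broadcast_tensors(a, b)
print(x.shape, y.shape)
```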
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10075
Differential Revision: D9125816
Pulled By: zou3519
fbshipit-source-id: ebe17c8bb54a73ec84b8f76ce14aff3e9c56f4d1
Summary:
Previously, the parser was emitting list literals for tuples, but the IR was representing list literals internally with TupleTypes.
For implementing most list operations, I think it will be helpful to distinguish between lists (dynamic size, homogeneous types) and tuples (fixed arity, heterogeneous types)
This diff modifies the parser logic to emit tuple literals. This frees us to represent lists as ListType in the IR, while still properly mapping tuple literals to TupleTypes.
A following diff will actually switch over list literals to emit ListTypes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10128
Differential Revision: D9121305
Pulled By: michaelsuo
fbshipit-source-id: e0cad07ae8bac680f7f8113d10e5129d5a1a511d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10138
Note that `TensorCPU` and `TensorGPU` both refine to `Tensor` now. Basically they are the same thing, so a check like `blob.IsType<TensorCPU>()` is no longer safe, as `TensorGPU` passes the check too.
We need to systematically weed out such usage in our codebase... jerryzh
Reviewed By: houseroad
Differential Revision: D9115273
fbshipit-source-id: 13b293c73691002eac34e095cdcd96c27183e875
Summary:
This rewrites checked_convert to use stringstreams, eliminating the use of to_string which is not available on Android stdc++.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10137
Reviewed By: smessmer
Differential Revision: D9122340
fbshipit-source-id: b7c1bff70e36217305f2b3333c51543ef8ff3d9c
Summary:
This will be needed soon because I want to move Half.h into
ATen/core, and then I cannot have a TH dependency.
I also took the liberty of making the code more strict-aliasing
safe (this is not actually useful, since we will never built Torch
with strict aliasing) by replacing pointer casts between
float and unsigned with a memcpy instead.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10134
Differential Revision: D9121920
Pulled By: ezyang
fbshipit-source-id: 3b1f86a7c5880e8ac1a589a51f0635bb72e1fd40
Summary:
…e_/is_variable_
The basic game plan is to stop accessing the type_ field directly,
and instead use the stored backend_, scalar_type_ and
is_variable_ to look up the appropriate Type from Context.
Storage of backend_ and scalar_type_ are new.
At some future point in time, I'd like to look at this code
carefully to see if I can get everything in this codepath inlining.
I didn't do it in this patch because there are circular include
problems making things difficult.
Some other details:
- Added Device::backend() which does what it says on the tin
- SparseTensorImpl is temporarily hard-coded to root in at::Context
for the appropriate context. If/when we put this in shared code,
we'll have to break this dep too, but for now it should be OK.
- There's a stupid problem with globalContext() deadlocking if
you didn't actually initialize it before loading libtorch.so
(which is bringing along the variable hooks). I didn't fix
it in this PR; it's tracked in #9784
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9787
Reviewed By: cpuhrsch
Differential Revision: D8980971
Pulled By: ezyang
fbshipit-source-id: 2b4d867abfdc3999a836a220c638c109053145a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9740
- Remove implicit ArrayRef -> vector conversion
- Fix 4 call sites that accidentally did an implicit expensive vector conversion but wouldn't have needed to
- Remove explicit vector conversion from 4 call sites that also didn't need to do that
Reviewed By: ezyang
Differential Revision: D8961693
fbshipit-source-id: 980da9f988083c0072497f9dbcbbf6f516fa311c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9610
Mostly making some stuff in ArrayRef constexpr to give it better perf.
Reviewed By: ezyang
Differential Revision: D8926785
fbshipit-source-id: af6d4b05fbc69d20855a80f3edc2b501577a742b
Summary:
in particular, make not building tests actually work
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10091
Differential Revision: D9121366
Pulled By: anderspapitto
fbshipit-source-id: d7d38cf759aa46bff90d3b4f695c20f29039ae75
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10107
This header is needed for ATen/core stuff
This diff also fixes an issue in C++17.h when run in C++17 enabled compilers.
Reviewed By: ezyang
Differential Revision: D9095209
fbshipit-source-id: d45947956019a7095875f48746b88c414e8865bc
Summary:
zdevito explained that the attributed versions of `Operator`s are no longer necessary. This PR does two things:
1. Removes all code associated with attributed operators,
2. Adds a second kind of state to `Operator` where it is constructed with an `Operation` directly instead of an `OperationCreator`. This will be useful to test custom operators which don't require a node (you can just retrieve it directly).
Now rebased on top of https://github.com/pytorch/pytorch/pull/9801
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10080
Differential Revision: D9113668
Pulled By: goldsborough
fbshipit-source-id: 1276a191c7cf89da1c38488769f2105ce2664750
Summary:
Extracted from https://github.com/pytorch/pytorch/pull/8338
Updating Eigen submodule to fix an issue we saw with BUILD_ATEN and BUILD_CAFFE2 removal.
cc mingzhe09088 ezyang smessmer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10095
Reviewed By: mingzhe09088
Differential Revision: D9109877
Pulled By: orionr
fbshipit-source-id: 90e36c298d8a22398558d70dc5f68a95a7687d6b
Summary:
It's not a particularly pretty process right now, but it may as well
be documented. I'm not aware of an ideal location for this, so I'm
just dropping it in the docs/ folder for now as recommended by
soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10087
Differential Revision: D9119681
Pulled By: anderspapitto
fbshipit-source-id: cd4afb642f3778c888d66a501bc697d0b0c88388
Summary:
This also makes Backtrace more portable, by disabling its functionality for
mobile builds as well.
It also handles Caffe2 static Windows builds by introducing a new variable,
AT_CORE_STATIC_WINDOWS, which must be set if you're building
ATen on Windows as part of a static library.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10092
Reviewed By: gchanan, smessmer
Differential Revision: D9094393
Pulled By: ezyang
fbshipit-source-id: 93281f9302bd378605a26589ae308faf1dac7df4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10035
This is an initial diff which refactors some of the components in the Seq2SeqModelCaffe2EnsembleDecoder class.
Reviewed By: jmp84
Differential Revision: D9026372
fbshipit-source-id: 449635208f24494209ae2fb78a19fca872970ea8
Summary:
This PR depends on the tests added in #9670. It moves the first, tiny function from the c10d DDP to C++: `dist_broadcast_coalesced`. Let me know if `torch/csrc/distributed/c10d/ddp.h` will be a good place to put these rewritten functions.
pietern apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9729
Differential Revision: D8985308
Pulled By: goldsborough
fbshipit-source-id: dc459fe9040273714044152063585e746974752f
Summary:
I opened an issue explaining some of my frustrations with the current state of schedulers.
While most points that I raised in [that issue](https://github.com/pytorch/pytorch/issues/8741#issuecomment-404449697) need to be discussed more thoroughly before being implemented, there are some that are not so difficult to fix.
This PR changes the way the LambdaLR scheduler gets serialized:
> The lr_lambda functions are only saved if they are callable objects (which can be stateful).
> There is no point in saving functions/lambdas as you need their definition before unpickling and they are stateless.
This has the big advantage that the scheduler is serializable, even if you use lambda functions or locally defined functions (aka a function in a function).
Does this functionality need any unit tests?
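A sketch of the pattern this enables, assuming the scheduler exposes `state_dict()` as described (the `WarmupLambda` class is a made-up example of a stateful callable):
```python
import torch
from torch.optim.lr_scheduler import LambdaLR

class WarmupLambda:
    # a callable object with state; picklable, unlike a lambda or local function
    def __init__(self, warmup_steps):
        self.warmup_steps = warmup_steps
    def __call__(self, epoch):
        return min(1.0, float(epoch + 1) / self.warmup_steps)

model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
sched = LambdaLR(opt, lr_lambda=WarmupLambda(5))
state = sched.state_dict()  # the callable object can be serialized with the scheduler
```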
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9927
Differential Revision: D9055505
Pulled By: soumith
fbshipit-source-id: 6c1cec588beedd098ec7d2bce6a9add27f29e48f
Summary:
There is a regression in softmin in 0.4.1 that was not present in 0.4.0. The behavior of softmin(x) should match softmax(-x); however, it is instead implemented (in v0.4.1) as -softmax(x). These are not the same. The fix is trivial because the bug is due to operator precedence.
This is a major regression that broke my training. I'm not sure how a unit test did not catch this.
```
x = torch.tensor([1, 2, 3.5, 4])
print(F.softmin(x, dim=0)) # this has the wrong output in 0.4.1 but correct in 0.4.0
print(F.softmax(-x, dim=0)) # this is what softmax should be
print(F.softmax(x, dim=0))
print(-F.softmax(x, dim=0)) # this is how softmin is incorrectly implemented in 0.4.1
```
In 0.4.1 this produces
tensor([-0.0278, -0.0755, -0.3385, -0.5581])
tensor([0.6668, 0.2453, 0.0547, 0.0332])
tensor([0.0278, 0.0755, 0.3385, 0.5581])
tensor([-0.0278, -0.0755, -0.3385, -0.5581])
In 0.4.0 this produces the correct values
tensor([ 0.6668, 0.2453, 0.0547, 0.0332])
tensor([ 0.6668, 0.2453, 0.0547, 0.0332])
tensor([ 0.0278, 0.0755, 0.3385, 0.5581])
tensor([-0.0278, -0.0755, -0.3385, -0.5581])
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10066
Differential Revision: D9106995
Pulled By: soumith
fbshipit-source-id: 7332503c6077e8461ad6cd72422c749cf6ca595b
Summary:
This is a cleanup and refactoring.
In its original form (changeset 6fdf915c057a) this diff caused a 5% regression
on ads CPU. The root cause was an omission of link_whole = True, causing
symbols to be stripped in mode/opt and forcing the converter to fall back,
which left patterns unmatched in the graph transform logic. This version of
the diff tests for link_whole by including a C++ test of the transform
Reviewed By: yinghai
Differential Revision: D9040511
fbshipit-source-id: 3e19b89989aa68b021762d12af2d0b4111280b22
Summary:
The `.bat` files' EOL is LF, so the build fails on some Windows machines.
To fix this, add a `.gitattributes` file and set the batch files' EOL to CRLF.
Discussion is in #9677.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9813
Differential Revision: D9026486
Pulled By: soumith
fbshipit-source-id: 341eaa677c35f8476a7eda1bac9827385072eb29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10044
The test was subtly broken! This transform wasn't writing to the correct blob and the test did not catch that because it was looking at the old version.
thanks kerenzhou for catching this
Reviewed By: Jokeren
Differential Revision: D9075520
fbshipit-source-id: c31ff0afcd78dd2dc7ffc240e2e89eeda87f1fb4
Summary:
This should prevent slow startup times, and will not report as many
errors during static initialization time which are hard to debug
ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9801
Reviewed By: goldsborough
Differential Revision: D8986603
Pulled By: zdevito
fbshipit-source-id: 440d43ab5e8cffe0b15118cb5fda36391ed06dbc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10059
Without virtual dtor, it could induce incorrect sized deallocation, messing up the memory. And unfortunately, sized deallocation cannot be detected by ASAN, yet.
Reviewed By: jerryzh168
Differential Revision: D9080526
fbshipit-source-id: c136cf653134e75b074326be2bc03627da42446f
Summary:
The affected files are all files that are planned to be moved
to ATen/core; the includes are for headers which are NOT slated
for movement.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10085
Differential Revision: D9093746
Pulled By: ezyang
fbshipit-source-id: 2beeffdae26d03d631d2d51b40bf6303759a2f50
Summary:
This lays out initial support for taking and returning a richer set
of types than only tensors. Floats and ints are already valid, lists are
straightforward to add, tuples need some discussion.
Based on top of #9948. Review only the last commit.
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9969
Reviewed By: zdevito
Differential Revision: D9076973
Pulled By: apaszke
fbshipit-source-id: 5a1fe912ea6b79ab2bfd0dcce265eb05855b5ff0
Summary:
_pointwise loss has some Python special casing; we converted reduction to ATen enums too early.
Fixes #10009
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10018
Differential Revision: D9075489
Pulled By: li-roy
fbshipit-source-id: 4bf2f5e2911e757602c699ee1ec58223c61d0162
Summary:
This PR fixes #9418.
OpenMPI 1.10 segfaults in MPI_Bcast with a CUDA buffer, and it's a retired OpenMPI version.
I've tested on 2.1.1 and 3.0.0 and they work well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10015
Reviewed By: soumith
Differential Revision: D9088103
Pulled By: ailzhang
fbshipit-source-id: fc0a45e5cd016093ef0dbb9f371cbf67170d7045
Summary:
The CPU and CUDA variants are a direct transposition of Graves et al.'s description of the algorithm, with the
modification that it is in log space.
There is also a binding for the (much faster) CuDNN implementation.
This could eventually fix #3420
I still need to add tests (TestNN seems much more elaborate than the other testing) and fix the bugs that invariably turn up during testing. Also, I want to add some more code comments.
I could use feedback on all sorts of things, including:
- Type handling (cuda vs. cpu for the int tensors, dtype for the int tensors)
- Input convention. I use log probs because that is what the gradients are for.
- Launch parameters for the kernels
- Errors and omissions and anything else I'm not even aware of.
Thank you for looking!
In terms of performance, it looks like it is superficially comparable to WarpCTC (but I have not systematically investigated this).
I have read that CuDNN is much faster than other implementations because it does *not* use log space, but also because the gathering step is much, much faster (I avoided trying tricky things there, as they seem to contribute to WarpCTC's fragility). I might think some more about which existing torch function (scatter or index..) I could learn from for that step.
Average timings for the kernels from nvprof for some size:
```
CuDNN:
60.464us compute_alphas_and_betas
16.755us compute_grads_deterministic
Cuda:
121.06us ctc_loss_backward_collect_gpu_kernel (= grads)
109.88us ctc_loss_gpu_kernel (= alphas)
98.517us ctc_loss_backward_betas_gpu_kernel (= betas)
WarpCTC:
299.74us compute_betas_and_grad_kernel
66.977us compute_alpha_kernel
```
Of course, I still have the (silly) outer blocks loop rather than computing consecutive `s` in each thread which I might change, and there are a few other things where one could look for better implementations.
Finally, it might not be unreasonable to start with these implementations, as the performance of the loss has to be seen in the context of the entire training computation, so this would likely dilute the relative speedup considerably.
My performance measuring testing script:
```
import timeit
import sys
import torch
num_labels = 10
target_length = 30
input_length = 50
eps = 1e-5
BLANK = 0#num_labels
batch_size = 16
torch.manual_seed(5)
activations = torch.randn(input_length, batch_size, num_labels + 1)
log_probs = torch.log_softmax(activations, 2)
probs = torch.exp(log_probs)
targets = torch.randint(1, num_labels+1, (batch_size * target_length,), dtype=torch.long)
targets_2d = targets.view(batch_size, target_length)
target_lengths = torch.tensor(batch_size*[target_length])
input_lengths = torch.tensor(batch_size*[input_length])
activations = log_probs.detach()
def time_cuda_ctc_loss(grout, *args):
    torch.cuda.synchronize()
    culo, culog_alpha = torch._ctc_loss(*args)
    g, = torch.autograd.grad(culo, args[0], grout)
    torch.cuda.synchronize()

def time_cudnn_ctc_loss(grout, *args):
    torch.cuda.synchronize()
    culo, cugra = torch._cudnn_ctc_loss(*args)
    g, = torch.autograd.grad(culo, args[0], grout)
    torch.cuda.synchronize()

def time_warp_ctc_loss(grout, *args):
    torch.cuda.synchronize()
    culo = warpctc.ctc_loss(*args, blank_label=BLANK, size_average=False, length_average=False, reduce=False)
    g, = torch.autograd.grad(culo, args[0], grout)
    torch.cuda.synchronize()

if sys.argv[1] == 'cuda':
    lpcu = log_probs.float().cuda().detach().requires_grad_()
    args = [lpcu, targets_2d.cuda(), input_lengths.cuda(), target_lengths.cuda(), BLANK]
    grout = lpcu.new_ones((batch_size,))
    torch.cuda.synchronize()
    print(timeit.repeat("time_cuda_ctc_loss(grout, *args)", number=1000, globals=globals()))
elif sys.argv[1] == 'cudnn':
    lpcu = log_probs.float().cuda().detach().requires_grad_()
    args = [lpcu, targets.int(), input_lengths.int(), target_lengths.int(), BLANK, True]
    grout = lpcu.new_ones((batch_size,))
    torch.cuda.synchronize()
    print(timeit.repeat("time_cudnn_ctc_loss(grout, *args)", number=1000, globals=globals()))
elif sys.argv[1] == 'warpctc':
    import warpctc
    activations = activations.cuda().detach().requires_grad_()
    args = [activations, input_lengths.int(), targets.int(), target_lengths.int()]
    grout = activations.new_ones((batch_size,), device='cpu')
    torch.cuda.synchronize()
    print(timeit.repeat("time_warp_ctc_loss(grout, *args)", number=1000, globals=globals()))
```
I'll also link to a notebook that I used for writing up the algorithm in simple form and then test the against implementations against it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9628
Differential Revision: D8952453
Pulled By: ezyang
fbshipit-source-id: 18e073f40c2d01a7c96c1cdd41f6c70a06e35860
Summary:
I previously did some transformations, e.g. _nDimension,_dim -> nDimensionLegacyAll, nDimension -> nDimensionLegacyNoScalars.
But this didn't touch dim(), which needs to be updated to support scalars. Instead of doing an (ugly) move, I audited the call sites and updated the cases that could be size 1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10023
Differential Revision: D9068996
Pulled By: gchanan
fbshipit-source-id: c63820767dd1496e908a5a96c34968482193f2c5
Summary:
We missed the upsample symbolic when bumping up the opset to 7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10001
Reviewed By: bddppq
Differential Revision: D9067212
Pulled By: houseroad
fbshipit-source-id: 3e285d2800a32cb04fa82f8e7f261bdd010a8883
Summary:
ATenCore.h is a dummy header to just test that this is working at all.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10019
Reviewed By: smessmer
Differential Revision: D9067262
Pulled By: ezyang
fbshipit-source-id: 58bab9c0aa83b56335e36b719b9b6505400d8dee
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9890
Minor cleanups for Graph.h to make it more consistent with our style guide
Also fix opt/device.cc and binary_match_test.cc to not access subgraph.nodes_ which is now private
Reviewed By: bwasti
Differential Revision: D9017108
fbshipit-source-id: 9f5cba4a2cd2a452a955005f4704f6c120bbc1d5
Summary:
Adding a constant propagation pass to the JIT. I have added examples to the expect files.
There are a couple of special cases which have not been implemented here. IF nodes with constant conditions can be inlined with the correct block. WHILE nodes can be removed if the condition is false. I have added a test for each case in test_jit.py file as expected failures.
To be consistent with DCE, python ops & CPP ops are treated as not having side-effects.
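A minimal sketch of what the pass can fold (hypothetical example; whether the pass runs by default at this point depends on the compilation pipeline):
```python
import torch

@torch.jit.script
def f(x):
    y = 2 * 3 + 1   # only constants involved, so this can fold into a single prim::Constant
    return x + y

print(f.graph)  # after constant propagation the constant arithmetic is gone
```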
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8808
Reviewed By: wanchaol
Differential Revision: D8906770
Pulled By: eellison
fbshipit-source-id: 10ad796d89f80b843566c9ddad6a0abd1f3dc74c
Summary:
This causes numpy to yield to the torch functions,
e.g. instead of numpy array/scalar __mul__ converting the tensor to
an array, it will now arrange for the Tensor __rmul__ to be called.
Fixes case 2 of #9468
It also makes cases 3 and 4 equivalent but does not fix them.
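A small sketch of the behavior described above (the result type follows the summary's claim rather than anything verified here):
```python
import numpy as np
import torch

t = torch.ones(3)
a = np.arange(3.0)
out = a * t          # previously ndarray.__mul__ converted t into an array
print(type(out))     # now numpy yields and Tensor.__rmul__ returns a torch.Tensor
```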
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9651
Differential Revision: D8948079
Pulled By: ezyang
fbshipit-source-id: bd42c04e96783da0bd340f37f4ac3559e9bbf8db
Summary:
More clang tidy cleanups in `torch/csrc`. This time:
1. `hicpp-use-equals-default` recommends `= default` instead of `{}` for constructors/destructors. This is better practice because it expresses the intent better (https://stackoverflow.com/questions/6502828/what-does-default-mean-after-a-class-function-declaration)
2. `readability-inconsistent-declaration-parameter-name` enforces that parameter names in the declaration match parameter names in the definition. This is just generally useful and can prevent confusion and bugs.
Also updated my script a little bit.
apaszke ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9737
Differential Revision: D9069069
Pulled By: goldsborough
fbshipit-source-id: f7b3f3a4eb4c9fadc30425a153566d3b613a41ae
Summary:
These could use some autograd tests, which are coming in a later PR, but using them in autograd is probably pretty rare.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9947
Reviewed By: ezyang
Differential Revision: D9032778
Pulled By: gchanan
fbshipit-source-id: fa5a6509d3bac31ea4fae25143e82de62daabfbd
Summary:
Not sure if anybody is interested, but I managed to infer a `GRU` fine in `wasm` using ATen compiled with emscripten. It was quite trivial to fix the configuration.
It also passes most of the tests, especially all scalar tensor tests.
The command line to configure was as follows, but could be simplified:
```
emconfigure cmake -DAT_LINK_STYLE=STATIC -DCAFFE2_CMAKE_BUILDING_WITH_MAIN_REPO=OFF -DCMAKE_C_FLAGS="-Wno-implicit-function-declaration -DEMSCRIPTEN -s DISABLE_EXCEPTION_CATCHING=0" -DCMAKE_CXX_FLAGS="-Wno-implicit-function-declaration -DEMSCRIPTEN -s DISABLE_EXCEPTION_CATCHING=0" -DCMAKE_INSTALL_PREFIX=/home/sugar/aten-wasm ../
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9803
Differential Revision: D9004610
Pulled By: ezyang
fbshipit-source-id: db26c59f27162ed80f6aee2973c4cb9252d3d1e4
Summary:
Fixes #9818.
It seems the original Python doesn't add `[PYTHONPATH]\Library\bin` to `PATH`. We try to add it before the DLL loading process.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9920
Differential Revision: D9040825
Pulled By: soumith
fbshipit-source-id: c07fff71b2aea254a396042ab677696f6829aac7
Summary:
Minor addition to the docstring of `torch.optim.Adam`, adding the default argument description for the `amsgrad` argument to the docstring for consistency.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9971
Differential Revision: D9040820
Pulled By: soumith
fbshipit-source-id: 168744a6bb0d1422331beffd7e694b9d6f61900c
Summary:
This was introduced in #9826, following the corresponding CUDA file context_gpu.cu. Tests passed in the PR, at which point master was 94439d7df. However, during the long landing process a new master commit aebf3b4 came in that removed the `CAFFE_KNOWN_TYPE(Tensor<HIPContext>)` in the context_hip.cc file, which broke the HIP BlobStatGetter. We did NOT run the tests again during the merge, so when #9826 later landed on master the rocm tests started breaking.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9973
Differential Revision: D9040671
Pulled By: bddppq
fbshipit-source-id: f3b16cabaf681fc0535ca733db0b48430868f922
Summary:
Supersedes #8925
This PR fixes #8502. It fixes the gradients problem for clamp when passing None to the function, and adds support for NoneLiteral and NoneType in script to enable clamp tests. Now we can have corner cases like:
```python
@torch.jit.script
def func():
    x = torch.randn(3, 3, requires_grad=True)
    y = torch.clamp(x, None, 0) # max = 0
    y = torch.clamp(x, min=None, max=0)
```
In both JIT and ATen, we use Scalar(NAN) as a sentinel value when passing a None type to clamp; this is the current way we support the None type in JIT and solve the gradient problem when the user explicitly passes None into clamp.
On the JIT side, we create a tensor(NAN) and an undefined Tensor if we encounter None when matching the function schema; later, in the interpreter, it is translated to Scalar(NAN) if needed.
Ideally we wouldn't need clamp_min and clamp_max in ATen native/autograd and could support only clamp after this change, but since a bunch of other operators (e.g. Activation.cpp, Loss.cpp) use clamp_min in several places, we keep those functions available; all Python invocations, however, will only call clamp instead of clamp_min/max (which calls the underlying th_max/th_min).
zdevito jamesr66a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9596
Reviewed By: zdevito
Differential Revision: D8940839
Pulled By: wanchaol
fbshipit-source-id: c543a867b82e0ab8c99384773b173fdde2605d28
Summary:
This is a follow-up to https://github.com/pytorch/pytorch/pull/9794 that contains only the serialization library and exposes a cleaner API. This should later be incorporated into the module export code
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9900
Reviewed By: zdevito
Differential Revision: D9021057
Pulled By: jamesr66a
fbshipit-source-id: 01af74a7fdd1b90b2f5484644c3121d8ba9eb3b3
Summary:
If we have this "spatial" attribute and its value equals to 1, we could just remove this attribute and convert this op to caffe2 SpatialBN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9492
Differential Revision: D8988165
Pulled By: houseroad
fbshipit-source-id: a9218dc9cd5fab43deb371f290f81285f5283231
Summary:
We only support a special case. The original dim is not supported by ONNX.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9660
Reviewed By: bddppq
Differential Revision: D8965507
Pulled By: houseroad
fbshipit-source-id: 021dffdf0489c2d3a50bfd1e0c4cfd00d4a3d776
Summary:
The goal of this PR is to update the hip files to reflect relevant changes in cuda source files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9826
Differential Revision: D9032840
Pulled By: bddppq
fbshipit-source-id: 504e55c46308eebfee3c9a7beea1f294fe03470f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9747
Currently the ctc_greedy_decoder op initializes the `merge_repeated` argument only if it has been provided by the user. Change it to initialize in all cases.
Reviewed By: houseroad
Differential Revision: D8963635
fbshipit-source-id: 18955c7c26a77d9d7f5137e4dec085252ffabfeb
Summary:
```
This adds TensorIterator, a helper class for computing element-wise
operations that's intended to replace the CPU and CUDA apply utils
functions.
CPU kernels are implemented as functions that operate on strided 1-d
tensors compared to CPUApplyUtils which operated individual elements. This
allows the kernels to handle vectorization, while TensorIterator handles
parallelization and non-coalesced dimensions.
GPU kernels continue to operate on elements, but the number of
specializations is reduced. The contiguous case remains the same. The
non-contiguous case uses a single (reduced) shape for all operands and
the fast integer division from THCIntegerDivider. To avoid extra
specializations for indexing with 64-bits, large operations are split
into smaller operations that can be indexed with 32-bits.
Major semantic changes:
- No more s_add, s_mul, s_div, or s_sub. Broadcasting is handled by
TensorIterator. The autograd engine performs the reduction assuming
standard broadcasting if the gradient shape does not match the
expected shape. Functions that do not use standard broadcasting rules
should either continue to trace the expand calls or handle the
reduction in their derivative formula.
- Use ONNX v7, which supports broadcasting ops.
Performance impact:
- Small increased fixed overhead (~0.5 us)
- Larger overhead for wrapped numbers (~2.5 us)
- No significant change for ops on contiguous tensors
- Much faster worst-case performance for non-contiguous GPU tensors
- Faster CPU bias addition (~2x)
- Faster GPU bias addition (~30% faster)
Future work:
- Decrease overhead, especially for wrapping numbers in Tensors
- Handle general inter-type operations
- Extend to unary ops and reductions
- Use buffering for compute-bound operations on non-contiguous tensors
(pull in from CPUApplyUtils)
```
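A small sketch of the broadcasting semantics mentioned above (standard broadcasting plus the autograd reduction; not a TensorIterator API example, since TensorIterator itself is internal):
```python
import torch

a = torch.randn(4, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
out = a + b            # broadcasting handled directly, no s_add variant needed
out.sum().backward()
print(b.grad.shape)    # autograd reduces the broadcast gradient back to torch.Size([3])
```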
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8919
Differential Revision: D8677600
Pulled By: colesbury
fbshipit-source-id: 61bc9cc2a36931dfd00eb7153501003fe0584afd
Summary: Minor fix for a bug introduced by D9004285
Reviewed By: anderspapitto
Differential Revision: D9028762
fbshipit-source-id: 9b9c5eef30e61d7ae19784e0418fa29bad2b5564
Summary:
I hope this helps me for the windows build failure in #9628 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9904
Differential Revision: D9026715
Pulled By: soumith
fbshipit-source-id: bb97d41d060823f5a37bfc9a1659815b8b9f4eab
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9939
Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13
Pull Request resolved: https://github.com/pytorch/translate/pull/166
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125
Closes https://github.com/pytorch/pytorch/pull/9125
Use inheritance for polymorphism, and remove template parameter
This is to change the templating in call sites, the core implementations will change later
Before, the Caffe2 Tensor class was compile-time fixed to bind to a particular device/context. With this change, we're making it a runtime property (stored inside the tensor), but preserving the same semantics. For example, one has to specify a device type in order to create a Tensor - there are no uninitialized tensors. More specifically, the changes are:
1. We added an extra argument *DeviceType* to most of the constructors of the tensor, e.g. (Tensor(DeviceType type)),
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context) are changed: the second context is passed in to enable us to call the templated Copy function. Previously it could be in a different context than source and target; now we enforce that the context has the same device type as src, if it is provided.
3. To preserve 'get-or-construct' semantics of Blob, we added specialized getter Blob::GetMutableTensor that verifies both that Blob contains a Tensor and that it's of a correct type
4. Specifically, Tensor type is not default-constructible any more (as we don't have unknown device tensors) and thus some of the code handling STL containers needs to change
Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.
Reviewed By: ezyang, houseroad
Differential Revision: D9024330
fbshipit-source-id: e0b8295d2dc6ebe2963383ded5af799ad17164ba
Summary:
This was showing up in the n-dimensional empty tests as flaky because it's reading uninitialized cuda memory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9907
Differential Revision: D9021413
Pulled By: gchanan
fbshipit-source-id: 31542b7597919df9afd6e528bb108a4a3e8eaf60
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9895
The primary goal here was to remove THTensor::_dim, which isn't part of the API moving forward.
Instead, we provide 3 options for getting the dimensionality (this is temporary although non-trivial to remove!):
```
nDimension corresponds to the "true" ATen dimension. TODO: implement.
nDimensionLegacyNoScalars corresponds to the ATen dimension, except scalars are viewed as 1-dimensional tensors.
nDimensionLegacyAll corresponds to the ATen dimension, except scalars are viewed as 1-dimensional tensors
and tensors with a dimension of size zero are collapsed to 0-dimensional tensors.
```
So in this patch, nDimension -> nDimensionLegacyNoScalars and _dim/_nDimension goes to nDimensionLegacyAll.
These are just codemods.
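For intuition, a Python-level sketch of the distinction (the legacy accessors are C++/TH-only; the comments restate the definitions above as an assumption):
```python
import torch

t_scalar = torch.tensor(3.0)   # 0-dimensional scalar
t_empty = torch.zeros(0)       # 1-dimensional tensor with a size-0 dimension
print(t_scalar.dim(), t_empty.dim())   # 0 1 -- the "true" ATen dimensions
# nDimensionLegacyNoScalars would report the scalar as 1-dimensional;
# nDimensionLegacyAll would additionally collapse the size-0 tensor to 0 dimensions.
```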
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9835
Reviewed By: ezyang
Differential Revision: D8999338
Pulled By: gchanan
fbshipit-source-id: a4d676ac728f6f36ca09604a41e888d545ae9311
Summary:
Hello! I just found a small spelling mistake while reading this source code. Just PRing it, thanks!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9868
Reviewed By: gchanan, ezyang
Differential Revision: D9016030
Pulled By: soumith
fbshipit-source-id: fc3877177be080adbdbda99a169e401691292ebb
Summary:
Based on top of #9763 (first 3 commits belong to that PR). The first commits from this PR are "Stop using attributes ..."
I tried to separate the changes into fairly meaningful commits. I can't split them up into smaller PRs, because everything starts working and all tests pass only after the whole sequence, but hopefully this will make reviewing somewhat easier.
Known issues/regressions/future tasks:
- `aten::lerp` and `aten::clamp` are no longer fusable
- `CreateAutodiffSubgraphs` needs a rewrite
- It is much more strict now, and will miss a lot of opportunities, especially when viewing ops are involved. Our previous approach was "ignore the assumption on shape availability in gradient formulas to determine differentiability, and hope that shape prop will be robust enough to actually deliver them before we differentiate", which obviously doesn't scale well to more complex cases. We should either work on reducing the size dependency of grad formulas (feasible e.g. for `view`/`reshape`, unfeasible for `squeeze`/`unsqueeze`), or make `CreateAutodiffSubgraphs` integrate some kind of "I could integrate this node into an AD subgraph, but will I be able to infer the shape of its input" reasoning (kind of like a limited shape prop, that doesn't infer anything, and only tells if it *could* infer something).
- It sometimes creates constant-only (or constants + one node) graphs, which is useless
- Broken `aten::add` in auto-batching, because it gained a non-tensor input. I changed the test for pointwise operations to use `aten::mul` instead, but I needed to disable the LSTM cell test. I'm not sure how scalar constants should be implemented in this case, because I don't fully understand our format. cc: ChunliF
- Graph import does some hacks to recover type of constants. This code should be removed once we'll gain the ability to export the IR along with value types.
- There's still a fair amount of dead code that can be removed. I didn't want to make this diff any bigger, and removing it is an easy task.
- Graph fuser could be improved to use signature matching (possibly using `OperatorSet`) instead of basing on node kinds.
- Manual constant propagation for the `ListConstruct` node in `torch/onnx/utils.py` should be replaced with a proper constant propagation pass (or we should ensure that the one we have handles at least this case before we remove this code).
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9807
Reviewed By: ezyang
Differential Revision: D9004285
Pulled By: apaszke
fbshipit-source-id: fe88026a765f6b687354add034c86402362508b7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9901
Added support for UINT8 datatype for additional data (prefetching and
output) by ImageInputOp
Reviewed By: ashwinb
Differential Revision: D9018964
fbshipit-source-id: f938a8a072c15c0ee521b2f16788c024b08cd37f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9855
Support production models with predictor benchmark
Two new flags are added:
`--update_prod`: pull production data (netdef, input types, input dims) from Hive and store locally
`--use_prod`: run benchmark with local production data with the same workload as in production.
By default, 300 models will be loaded.
production vs benchmark
avg net run time:
(collected by prod: https://fburl.com/scuba/6lb91zfx and bench: https://fburl.com/ngjj1dc8)
**prod: `408us` vs bench: `543us`**
(With prod data distribution, this should be even closer)
framework overhead (as of 2018-07-22):
prod:
```
9.111% BlackBoxPredictor::Run
4.602% SimpleNet::Run
2.377% Operator::Run
1.786% BlackBoxPredictor::AllocateMemory
1.372% Observable::StartAllObservers
1.358% Observable::StartObserver
1.206% Blob::GetMutable
```
bench:
```
8.577% BlackBoxPredictor::operator()
3.276% SimpleNet::Run
1.954% Operator::Run
1.697% BlackBoxPredictor::AllocateMemory
1.477% Tensor::ShareData
1.230% Blob::GetMutable
1.034% Observable::StartObserver
```
Reviewed By: yinghai
Differential Revision: D8942996
fbshipit-source-id: 27355d7bb5a9fd8d0a40195261d13a97fa24ce17
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9581
Mostly to simplify code. Should also improve performance but order switch ops
don't take much time anyway.
Reviewed By: viswanathgs
Differential Revision: D8909766
fbshipit-source-id: 17a302d5bf4aba2755d88223fc01a41fd72c5919
Summary:
Follow up task of #9584.
Commit 1:
- change expect/cast to return shared pointers instead of raw pointer
- isSubtypeOf accept TypePtr instead. Use `x->isSubtypeOf(NumberType::get())` rather than `x->isSubtypeOf(*NumberType::get())`
Commit 2:
- to address enable_shared_from_this pitfalls, we make the constructor private and expose the factory method to make sure user can only create it using our factory method.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9786
Reviewed By: zdevito
Differential Revision: D8980441
Pulled By: wanchaol
fbshipit-source-id: e5c923fc57a701014310e77cf29985b43bb25364
Summary:
This PR fixes #9743.
Adds backward-compatibility support for loading a checkpoint from 0.3.* containing 1-dim tensors; they are now 0-dim tensors in 0.4+.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9781
Differential Revision: D8988196
Pulled By: ailzhang
fbshipit-source-id: a7a1bc771d597394208430575d5a4d23b9653fef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9891
Add an argument to the benchmark binary to specify the number of seconds to sleep before the run and after the warmup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9880
Reviewed By: llyfacebook
Differential Revision: D9014254
Pulled By: sf-wind
fbshipit-source-id: d5566186c8ed768f1e170e9266c5f2d6077391e0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9847
CTCBeamSearchDecoder and CTCGreedyDecoder do not currently support IDEEP
execution. Add fallback operators to allow IDEEP execution of models that use
these operators.
Reviewed By: yinghai
Differential Revision: D9006234
fbshipit-source-id: fc539ba67b07d1f960d28564d8adde0be8690649
Summary:
And let Gemm conversion to inspect the input `C` to try converting to FC.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9870
Reviewed By: houseroad
Differential Revision: D9013198
Pulled By: bddppq
fbshipit-source-id: b4c509cfccca238262e1c406b004e66cef256321
Summary:
This is blocking the IR operator unification, because I need to be able to pass scalars to backward functions.
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9763
Reviewed By: zou3519
Differential Revision: D8978457
Pulled By: apaszke
fbshipit-source-id: 570b4c3409322459cb0f2592069730a7d586ab20
Summary:
I don't think this file is used anywhere, I guess we'll find out!
(Weirdly this failed lint on one of my PRs even though it shouldn't).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9843
Differential Revision: D9003949
Pulled By: gchanan
fbshipit-source-id: 26d580d1e7cdd30e82e5f4176244e51fd7cd616d
Summary:
Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13
Pull Request resolved: https://github.com/pytorch/translate/pull/166
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125
Closes https://github.com/pytorch/pytorch/pull/9125
Use inheritance for polymorphism, and remove template parameter
This is to change the templating in call sites, the core implementations will change later
Before, the Caffe2 Tensor class was compile-time fixed to bind to a particular device/context. With this change, we're making it a runtime property (stored inside the tensor), but preserving the same semantics. For example, one has to specify a device type in order to create a Tensor - there are no uninitialized tensors. More specifically, the changes are:
1. We added an extra argument *DeviceType* to most of the constructors of the tensor, e.g. (Tensor(DeviceType type)),
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context) are changed: the second context is passed in to enable us to call the templated Copy function. Previously it could be in a different context than source and target; now we enforce that the context has the same device type as src, if it is provided.
3. To preserve 'get-or-construct' semantics of Blob, we added specialized getter Blob::GetMutableTensor that verifies both that Blob contains a Tensor and that it's of a correct type
4. Specifically, Tensor type is not default-constructible any more (as we don't have unknown device tensors) and thus some of the code handling STL containers needs to change
Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.
Reviewed By: xw285cornell
Differential Revision: D8121878
fbshipit-source-id: 4a5e9a677ba4ac82095df959851a054c81eccf81
Summary:
The PR contains:
Fixes for running MIOpen conv operator in a multi worker scenario, along with a performance fix
Fixing a typo in MIOpen pool op and adding some extra checks for MIOpen spatial BN op
bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9842
Differential Revision: D9012512
Pulled By: bddppq
fbshipit-source-id: 270e1323c20fbfbc4b725f9a4ff34cd073ddaaa8
Summary:
I split it into two parts, _local_scalar and _local_scalar_dense (unchecked)
so I could reuse the sparse logic in both paths.
_local_scalar became a method on Tensor to work around a circular
include problem.
This is resurrected copy of #9652
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9762
Differential Revision: D8972348
Pulled By: ezyang
fbshipit-source-id: 2232dbfc8e1286b8a4a1c67d285c13a7771aad4c
Summary:
We think this will band-aid some of the new Caffe2 test failures.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9830
Differential Revision: D9008052
Pulled By: ezyang
fbshipit-source-id: 84f1c0faea429d758d760965d6cbfe9e4c72eb19
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9831
Follow up to D8980903 - replace dataIterator with nodeIterator where the data isn't used.
Reviewed By: pjh5
Differential Revision: D8998351
fbshipit-source-id: c333847ecd8b6d8075352322845839b94a63aecc
Summary:
https://github.com/pytorch/pytorch/pull/9755 broke this, but it was only tested if size zero dims were turned on (it can still happen even if that isn't turned on, because we support size [0] tensors).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9825
Differential Revision: D8997303
Pulled By: gchanan
fbshipit-source-id: 911dce112f73fad0f3980a7f4f9423df0f2d923d
Summary:
This was used to build Caffe2 Docker version 170.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9828
Differential Revision: D8997808
Pulled By: ezyang
fbshipit-source-id: f48938b2b71bc86578c9d9b46c281ed05478724e
Summary:
…o dim.
Manifest:
1) The scalar boolean is now in THTensor, although it isn't hooked up at the TH level yet.
2) setScalar is gone, everything now goes through the maybeScalar equivalent (which is renamed)
3) all "scalars" in this context now refer to "zero_dim" in order to differentiate this concept from the "Scalar" class.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9783
Differential Revision: D8978911
Pulled By: gchanan
fbshipit-source-id: f09254be4bebad0e4c510fefe4158b4f7e92efe1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9790
- Add way to check if a NodeRef is in a graph
- Make a nodeIterator (similar to dataIterator) but only iterate through nodes.
Reviewed By: bwasti
Differential Revision: D8980903
fbshipit-source-id: b20504a46715858752e25242303125a15a709b88
Summary:
Temporarily need this to prevent sccache from breaking when I move sccache install to the DockerFile.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9810
Differential Revision: D8991684
Pulled By: Jorghi12
fbshipit-source-id: 14cd0278f53a72372f9bbe27b228980f8d3c1d4a
Summary:
The tutorials URL with http is not valid; replace it with https.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9812
Differential Revision: D8991344
Pulled By: ezyang
fbshipit-source-id: c12faa57905b50eadc320f9938c39c4139bd093b
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/6890. (backward pass for non-symmetric eigen-decomposition is not implemented in other packages, e.g. autograd, mxnet, tensorflow, presumably because the eigenvalues can be imaginary for the general case, and AFAIK we cannot support complex numbers).
This patch adds a backward function for the symmetric eigen-decomposition function `torch.symeig`. The formula used is taken from [here](http://eprints.maths.ox.ac.uk/1079/1/NA-08-01.pdf). Unit tests are added to verify correctness.
There is still one outstanding issue, which is how to handle the case where the `symeig` is called with `eigenvectors=False`. In this case, the eigenvectors are returned as a zero tensor, but the backward computation for the eigenvalues depends on the eigenvectors. There was a previous attempt to implement this in https://github.com/pytorch/pytorch/pull/2026, where apaszke mentioned that the `eigenvectors` argument should be overridden so that they are saved for the backwards pass. The forward code is autogenerated, though, and it isn't clear to me how that would be done. I'd appreciate any guidance. For now, there is a unit test that will fail until that issue is resolved.
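As a quick sanity check, a minimal sketch of how the new backward can be exercised with the standard `gradcheck` workflow (the symmetrization wrapper here is my own addition, since `symeig` assumes a symmetric input):
```python
import torch
from torch.autograd import gradcheck

# Symmetrize a leaf tensor inside the function under test and check
# gradients of the eigenvalues against numerical differentiation.
a = torch.randn(4, 4, dtype=torch.float64, requires_grad=True)

def eigvals_of_symmetric(x):
    e, v = torch.symeig(x + x.t(), eigenvectors=True)
    return e

print(gradcheck(eigvals_of_symmetric, (a,)))  # True if the backward formula is correct
```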
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8586
Reviewed By: ezyang
Differential Revision: D8872760
Pulled By: SsnL
fbshipit-source-id: 76614495d0f9c118fec163a428f32e5480b4d115
Summary:
The primary use-site of typeString was checked_cast_tensor.
I did a little more than I needed in this patch, to set
the stage for actually deleting the tensor type.
Specifically, I modified checked_cast_tensor to explicitly
take Backend and ScalarType, the idea being that once we
remove the tensor subclasses, we will delete the T template
parameter.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9764
Differential Revision: D8969196
Pulled By: ezyang
fbshipit-source-id: 9de92b974b2c28f12ddad13429917515810f24c6
Summary:
This implements the two-parameter Weibull distribution, with scale $\lambda$ and shape $k$ parameters as described on [Wikipedia](https://en.wikipedia.org/wiki/Weibull_distribution).
**Details**
- We implement as a transformed exponential distribution, as described [here](https://en.wikipedia.org/wiki/Weibull_distribution#Related_distributions).
- The `weibull_min` variance function in scipy does not yet support a vector of distributions, so our unit test uses a scalar distribution instead of a vector.
Example of the bug:
```
>>> sp.stats.expon(np.array([0.5, 1, 2])).var() # fine
array([1., 1., 1.])
>>> sp.stats.weibull_min(c=np.array([0.5, 1, 2])).var() # buggy
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.7/site-packages/scipy/stats/_distn_infrastructure.py", line 490, in var
return self.dist.var(*self.args, **self.kwds)
File "/usr/local/lib/python3.7/site-packages/scipy/stats/_distn_infrastructure.py", line 1242, in var
res = self.stats(*args, **kwds)
File "/usr/local/lib/python3.7/site-packages/scipy/stats/_distn_infrastructure.py", line 1038, in stats
if np.isinf(mu):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```
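For reference, a minimal usage sketch of the new distribution, assuming it is exposed as `torch.distributions.Weibull(scale, concentration)` with `concentration` playing the role of the shape parameter k from the summary above:
```python
import torch
from torch.distributions import Weibull

# Assumed parameter names: scale (lambda) and concentration (k).
w = Weibull(scale=torch.tensor([1.0, 2.0]), concentration=torch.tensor([0.5, 1.5]))
x = w.sample((3,))        # shape (3, 2)
print(w.log_prob(x))
print(w.mean, w.variance)
```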
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9454
Differential Revision: D8863574
Pulled By: SsnL
fbshipit-source-id: 1ad3e175b469eee2b6af98e7b379ea170d3d9787
Summary:
I got some tensor->variable conversion exceptions from `torch/csrc/autograd/variable.h`, which used the `TORCH_ASSERTM` macros instead of `AT_CHECK`, so they didn't have backtraces. This was such a substantial loss for debuggability that I decided to update the whole codebase to use the backtrace-enabled ATen macros instead of `TORCH_ASSERT` and `JIT_ASSERT`, the latter having been an alias of the former.
ezyang apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9575
Differential Revision: D8924566
Pulled By: goldsborough
fbshipit-source-id: 7a4013b13eec9dbf024cef94cf49fca72f61d441
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9770
Add zero ops to operators that do not have a valid schema
Reviewed By: hlu1
Differential Revision: D8957472
fbshipit-source-id: d8d0a351183e88ace2e050a87c1e1c363af67e33
Summary:
Before I can rewrite portions of the c10d DDP in C++ I need proper tests in place to make sure I am not breaking anything as I port code. There were no tests for the c10d DDP in place so I wrote some.
I refactored the c10d tests to derive some test cases from a general `MultiGPUTestCase` and followed lots of patterns from `test_distributed.py` w.r.t. how tests are skipped (such that the main process doesn't initialize CUDA, which I found is a super important detail!!!).
I am largely unfamiliar with this code so feel free to scrutinize. The DDP test code itself is also largely taken from `test_distributed.py` but more inlined which I find easier to read.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9670
Differential Revision: D8977724
Pulled By: goldsborough
fbshipit-source-id: 186eab38a72384d7992a2ec5c89f304ad42d5944
Summary:
Fixes: #9754
Maybe this could also make its way into 0.4.1; it is a severe debugging headache if you hit this...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9755
Reviewed By: ezyang
Differential Revision: D8967178
Pulled By: zou3519
fbshipit-source-id: 151ed24e3a15a0c67014e411ac808fb893929a42
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9643
Current map interface assumes float data type, which is not always correct.
Reviewed By: kennyhorror
Differential Revision: D8455784
fbshipit-source-id: b94a31267760f7f97c15aa4b03008affc347fd10
Summary:
When building iOS apps with a caffe2 dependency, we were seeing the `caffe2/caffe2/mobile/contrib/ios/mpscnn/mpscnn.mm:33:17: error: method 'copyWithZone:' in protocol 'NSCopying' not implemented [-Werror,-Wprotocol]`. This fixes it by implementing a shallow copy with that method.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9748
Reviewed By: jerryzh168
Differential Revision: D8954332
Pulled By: williamtwilson
fbshipit-source-id: 0cd44408257c0bd3f4ffb80312ea9d13d13e5ff3
Summary:
This can hardly be called an improvement (we now print
CPUFloatType instead of CPUFloatTensor) but it was the
simplest way I could think of to devirtualize this function in
the short term. We probably need some sort of native function
that gives string information about a tensor.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Approved in #9710
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9758
Differential Revision: D8966935
Pulled By: ezyang
fbshipit-source-id: a4641affe0a6153f90cdd9f4f2a1100e46d1a2db
Summary:
Not in the same format. Skip at the moment.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9751
Reviewed By: yinghai
Differential Revision: D8965636
Pulled By: houseroad
fbshipit-source-id: 81d39c2f5625c14c0e1ee11408b5f7267b53798f
Summary:
ebetica made me aware that `nn::Module::clone()` always clones to the current device (usually CPU) instead of preserving the device of each parameter. This PR changes the signature of `clone` from
`shared_ptr<Module> clone()`
to
`shared_ptr<Module> clone(optional<Device> device = nullopt)`
with semantics of:
1. If a `device` is given, all parameters/buffers are moved to that device,
2. If no `device` is supplied (default), parameters/buffers retain their device.
ezyang apaszke ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9609
Differential Revision: D8957367
Pulled By: goldsborough
fbshipit-source-id: 0d409ae645ed2b8d97d6fc060240de2f3d4bc6c8
Summary:
I renamed the variable in the `Embedding` module from `weight` to `table` a few months ago, because it seemed like a more meaningful name. Turns out it's not such a good idea because it deviates from PyTorch, which unnecessarily breaks C++->Python translated code.
ebetica ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9720
Differential Revision: D8955647
Pulled By: goldsborough
fbshipit-source-id: 77228b07d2b733866e8cdecaa6d0686eef4cc3ea
Summary:
The underlying reason why this is even an issue is that the conversion
into and out of the 'fictional' onnx operators is done in an unhygienic
order. This doesn't address that, but it does fix the one observable
case where this produces an incorrect result, and unblocks some other
work being done.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9657
Differential Revision: D8940824
Pulled By: anderspapitto
fbshipit-source-id: ea827a24c85447fe4ae470336a746329598eee84
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9718
This patch switches the interpreter to use IValue's primitive numbers rather than tensors for computing on integers and floats. In addition to preparing the interpreter for first-class support of other types, this cleans up the handling of primitive numbers, making it possible to just use the normal operator overloading dispatch to find the right implementation for numbers. As a result of this change, a lot of other functionality needed to be updated since it was the first time we use non-tensors in a lot of places in the code base.
Notes:
* Fixes code_template.py so that multi-line strings are indented correctly when used on a standalone line
* Cast operators (`int(x)`) are now functional. Some tests have additional conversions to integers because
we no longer allow implicit tensor -> integer conversions, following the same convention as in Python
* prim::ListConstruct/createList has been added to the interpreter for creating lists and this has
replaced aten::stack for integers lists
* gen_jit_dispatch.py has been refactored so that non-tensor types use operators on IValues to extract
the primitives
* IValue gains a .to<T> method that is the equivalent of tensor_as but for IValue instead of at::Tensor
* `constant_as<T>` is switched over to using IValues's `.to<T>` method, to make conversion from constant->IValue->C++ type
more consistent. This functionality combined with `toIValue(Value*)` replaces the `tensor_as` and `as_tensor` family of functions.
* conditional expressions (if, loop) and operators related to them are now computed on integers rather than tensors
* IValue gains constructors for constructing from at::Scalar and converting to it. However, IValue itself will always store
the scalars as a double or int64.
* To align with python 3 syntax, TK_INT, TK_FLOAT, and TK_BOOL have been removed from the parser, and int/float/bool are just treated as special identifiers in the compiler,
along with print. These are represented as special sugared values with a `call` method implemented. For int/float/bool this implements casting behavior.
* Dropped shared_from_this from Type/Module. They were not needed, and they made debugging harder because they internally throw/catch exceptions.
* Shape propagation has been updated to support running nodes that include floating point primitive types, this required some refactoring of internal functions.
* TensorToNum and NumToTensor have actual implementations as operators now
* register_prim_ops now contains implementations of math operators for float/int primitive types, and for mixed (prim <+> tensor) versions. This removes the need for special handling in compiler.cpp
* Primitive math is now entirely handled by letting the compiler choose the right overloads. This removes tons of special casing in the compiler.
* incorporates eellison's change to allow casting from return values. Due to the addition of primitive support, the code needed slight modifications, so I just pre-merged it here.
* stack.h gains generic vararg versions of push/pop that know how to convert to/from C++ types:
```
at::Tensor a;
at::Scalar b;
pop(stack, a, b);
at::Tensor c = a + b;
push(stack, c);
```
apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9584
Reviewed By: apaszke
Differential Revision: D8910546
Pulled By: zdevito
fbshipit-source-id: 0f3e60d4d22217f196a8f606549430e43b7e7e30
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9667
MKL-DNN doesn't support 64-bit integers (cfee61bf81/include/mkldnn_types.h (L62-L75)). So force-converting from `TensorCPU<long>` to an `s32` Ideep tensor will cause memory issues. This diff gives an alternative solution, where we just fall through to TensorCPU. The reasoning is that since MKL-DNN doesn't support 64-bit integer tensors, downstream ops have to be in CPUContext, so there is no reason to force-convert to an ideep tensor and back.
Reviewed By: pjh5
Differential Revision: D8943544
fbshipit-source-id: f514903cda27e34b8887271c9df56c8220895116
Summary:
This is a modification of the strategy from https://github.com/pytorch/pytorch/pull/8919 and https://github.com/pytorch/pytorch/pull/9579.
```
Previously, the CPU architecture-specific kernels self-registered with
the DispatchStub. When linking as part of a static library, this requires
the flag --whole-archive to be passed to the linker to ensure that the
object files for the kernels are included. Caffe2 and TensorFlow use that
strategy.
We ran into some issues with --whole-archive blowing up the binary size
of some downstream projects in Facebook. This PR avoids --whole-archive
for CPU kernels. The downside is that the generic code needs to be aware
of whether kernels are compiled with AVX and with AVX2 (via
HAVE_AVX_CPU_DEFINITION and HAVE_AVX2_CPU_DEFINITION).
The CUDA kernels still self-register with DispatchStub because the CPU
library is not aware of whether the CUDA library will be available at
runtime.
There are a few major changes to DispatchStub
- The environment variable ATEN_CPU_CAPABILITY overrides the CPU
capability detection code (Previous ATEN_DISABLE_AVX/AVX2)
- DispatchStub is defined in the generic native code instead of the
CPU_CAPABILITY_DEFAULT kernel.
```
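A small sketch of the environment-variable override described above; I'm assuming the accepted values are strings like `"default"`, `"avx"`, and `"avx2"`, and that the variable must be set before any CPU kernel is dispatched:
```python
import os

# Set the override before importing torch so the capability check sees it.
os.environ["ATEN_CPU_CAPABILITY"] = "default"

import torch
x = torch.randn(1024)
print(x.sum())  # runs the generic (non-AVX) CPU kernel under this setting
```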
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9664
Differential Revision: D8943350
Pulled By: colesbury
fbshipit-source-id: 329229b0ee9ff94fc001b960287814bd734096ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9717
D8722560 was landed with some build errors; unfortunately the c10 code isn't part of contbuild yet.
Fixing them.
Differential Revision: D8954141
fbshipit-source-id: 2a082fb8041626e45ccd609f37a8ef807f6dad8a
Summary:
This is to simplify the data format during benchmarking. After this change, we can use the same benchmarking harness data conversion method to parse data from multiple binaries.
This change should be coordinated with the PR: https://github.com/facebook/FAI-PEP/pull/63
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9555
Reviewed By: pjh5
Differential Revision: D8903024
Pulled By: sf-wind
fbshipit-source-id: 61cabcff99f0873729142ec6cb6dc230c685d13a
Summary:
This pull request implements the low-rank multivariate normal distribution, where the covariance matrix has the form `W @ W.T + D`. Here D is a diagonal matrix and W has shape n x m with m << n. It uses the matrix determinant lemma and the Woodbury matrix identity to save computational cost (a usage sketch follows below).
Along the way, I also revised the MultivariateNormal distribution a bit. Here are the other changes:
+ `torch.trtrs` works with cuda tensor. So I tried to use it instead of `torch.inverse`.
+ Use `torch.matmul` instead of `torch.bmm` in `_batch_mv`. The former is faster and simpler.
+ Use `torch.diagonal` for `_batch_diag`
+ Reimplement `_batch_mahalanobis` based on `_batch_trtrs_lower`.
+ Use trtrs to compute term2 of KL.
+ `variance` relies on `scale_tril` instead of `covariance_matrix`
TODO:
- [x] Resolve the fail at `_gradcheck_log_prob`
- [x] Add test for KL
cc fritzo stepelu apaszke
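A minimal usage sketch, assuming the new class is exposed as `LowRankMultivariateNormal(loc, cov_factor, cov_diag)` with covariance `cov_factor @ cov_factor.T + diag(cov_diag)`:
```python
import torch
from torch.distributions import LowRankMultivariateNormal

n, m = 5, 2
loc = torch.zeros(n)
W = torch.randn(n, m)   # low-rank factor, m << n
D = torch.ones(n)       # diagonal term
dist = LowRankMultivariateNormal(loc, cov_factor=W, cov_diag=D)
x = dist.sample((4,))
print(dist.log_prob(x))  # evaluated cheaply via the Woodbury identity
```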
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8635
Differential Revision: D8951893
Pulled By: ezyang
fbshipit-source-id: 488ee3db6071150c33a1fb6624f3cfd9b52760c3
Summary:
…unctions.
This also unifies the error checking between scatter/scatterAdd on CUDA.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9658
Differential Revision: D8941527
Pulled By: gchanan
fbshipit-source-id: 750bbac568f607985088211887c4167b67be11ea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9683
This pops off `refcount_`, `storage_`, `storage_offset_`; there are now no more direct accesses to these fields and we can make them private (with appropriate friending).
Stacked on #9561
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9591
Reviewed By: SsnL
Differential Revision: D8922246
Pulled By: ezyang
fbshipit-source-id: dfae023d790e29ce652e2eab9a1628bbe97b318d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9665
In data_parallel_model, we isolate the synchronizing barrier init net into its own net, separate from the param_init_net, so that we have finer-grained control over the barrier net.
Reviewed By: andrewwdye
Differential Revision: D8375389
fbshipit-source-id: ce0c8c1c8e4bd82b7078a1b07abaced3f149d578
Summary:
**REVIEW LAST COMMIT ONLY**
As discussed in our yesterday's meeting. Nodes can be now matched to particular overloads using the `matches(...)` function:
```cpp
n->matches("aten::type_as(Tensor self, Tensor other) -> Tensor")
```
This also changes the shape prop and peephole passes to use those functions for matching. This fixes a few bugs, makes them much more robust, and prepares us for removal of attributes.
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9567
Reviewed By: zdevito
Differential Revision: D8938482
Pulled By: apaszke
fbshipit-source-id: eb2382eeeae99692aada2d78d5d0c87c8ef1545e
Summary:
This PR contains the change for explicit conversion between ushort and __half required for ROCm 1.8.2 support
bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9663
Differential Revision: D8943937
Pulled By: bddppq
fbshipit-source-id: 16102f9dbc68ed4ece2e8fc244825c3992c24901
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9637
Adding a method to run a plan in the background. The intended use is to run BlueWhale's data reading & preprocessing net in the background while the GPU is training.
Reviewed By: MisterTea
Differential Revision: D8906439
fbshipit-source-id: b1c73ca7327e2d87a8f873924e05ab3d161a3f1e
Summary:
ezyang noticed that the CUDAStream files lived under ATen/ despite being CUDA-specific, and suggested porting them to ATen/cuda and exposing them with a new CUDAContext. This PR does that. It also:
- Moves ATen's CUDA-specific exceptions for ATen/cudnn to ATen/cuda for consistency
- Moves getDeviceProperties() and getCurrentCUDASparseHandle() to CUDAContext from CUDAHooks
The separation between CUDAContext and CUDAHooks is straightforward. Files that are in CUDA-only builds should rely on CUDAContext, while CUDAHooks is for runtime dispatch in files that can be included in CPU-only builds. A comment in CUDAContext.h explains this pattern. Acquiring device properties and CUDA-specific handles is something only done in builds with CUDA, for example, so I moved them from CUDAHooks to CUDAContext.
This PR will conflict with #9277 and I will merge with master after #9277 goes in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9435
Reviewed By: soumith
Differential Revision: D8917236
Pulled By: ezyang
fbshipit-source-id: 219718864234fdd21a2baff1dd3932ff289b5751
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9636
Make sure that the blobs are registered to the net
Reviewed By: pjh5
Differential Revision: D8924883
fbshipit-source-id: f09422a2d4d5ba8bf6cfbfd00172097b5ab1fcd6
Summary:
In the repr function of the LPPoolNd(...) class, there was a missing '='. (`kernel_size{kernel_size}`)
Link to line in the code: https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/pooling.py#L694
Original:
return 'norm_type={norm_type}, kernel_size{kernel_size}, stride={stride}, ' \
'ceil_mode={ceil_mode}'.format(**self.__dict__)
Fixed:
return 'norm_type={norm_type}, kernel_size={kernel_size}, stride={stride}, ' \
'ceil_mode={ceil_mode}'.format(**self.__dict__)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9629
Differential Revision: D8932913
Pulled By: soumith
fbshipit-source-id: 9030dff6b14659b5c7b6992d87ef53ec8891f674
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9598
The "max_length" should be passed to UnPackSegmentsOp if "max_length" is given when calling PackSegmentsOp.
Reviewed By: jerryzh168
Differential Revision: D8919799
fbshipit-source-id: 8c97aa717b69177b8a5d5d56892817d488853840
Summary:
This PR adds machinery to cache the schema in an IR node, and allows lookups of (possibly) constant inputs by their names (instead of position). The new methods are:
- `at::optional<T> get<T>(Symbol name)` - if the argument called name is a constant, then casts it to type `T` and returns it. If it's not constant returns `nullopt`. Raises an error if there's no argument with that name.
- `at::optional<IValue> get<T>(Symbol name)` - like above, but packs the result in an IValue
- `Value* getValue(Symbol name)` - retrieves a `Value*` for an argument (no need to know its position).
All above functions currently inspect the attributes as well, but that's only so that I could start using them in other places in the JIT without disrupting our current functionality. I wanted this diff to be a preparation that doesn't change the semantics too much, and so both the tracer and script create nodes with attributes. The next PR will put that to a stop, and hopefully the changes we need to make to other components will be simpler thanks to what I did here.
One more thing I'd like to do before actually stopping creating the non-attributed nodes is to have a convenient way of creating a schema programmatically, matching nodes against it, and creating them without having to pack inputs into flat argument lists (which is quite error prone).
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9505
Reviewed By: ezyang
Differential Revision: D8915496
Pulled By: apaszke
fbshipit-source-id: 39d14fc9a9d73d8494f128367bf70357dbba83f5
Summary:
This fix will prevent errors like (found in `bincount`)
```
RuntimeError: %s not implemented for '%s'bincounttorch.FloatTensor
```
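For illustration (a sketch only), the failure mode can be triggered by calling `bincount` on a floating-point tensor; with this fix the raised message is formatted properly instead of the concatenated string shown above:
```python
import torch

# bincount expects a 1-D integral tensor, so a float input raises a RuntimeError.
try:
    torch.bincount(torch.tensor([0.5, 1.0, 2.0]))
except RuntimeError as e:
    print(e)
```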
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9625
Differential Revision: D8932945
Pulled By: soumith
fbshipit-source-id: 794e3b58d662779402ab318e274661826a5db8b2
Summary:
fixes #4176 cc vishwakftw
I didn't do `:math:` and `\neg` because I am using double ticks so they render more similarly with `:attr:`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9630
Differential Revision: D8933022
Pulled By: SsnL
fbshipit-source-id: 31d8551f415b624c2ff66b25d886f20789846508
Summary:
As in the title. Lets us simplify a lot of code.
Depends on #9363, so please review only the last commit.
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9414
Reviewed By: zdevito
Differential Revision: D8836496
Pulled By: apaszke
fbshipit-source-id: 9b3c3d1f001a9dc522f8478abc005b6b86cfa3e3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9501
Added a new stat value to log static states like CPU and memory usage.
Reviewed By: pjh5
Differential Revision: D8872254
fbshipit-source-id: 469e94cab99029a3da55f8986dddeadac076e2a8
Summary:
This PR adds the functional version of `DataParallel` (i.e. `data_parallel`) to the C++ frontend.
For this, I had to:
1. Add "differentiable" versions of scatter and gather, which perform their inverse operation in the backward pass, to C++. I've added them under `torch/csrc/autograd/functions/comm.{h,cpp}`. I had to move some utilities from `VariableType.cpp` into `torch/csrc/autograd/functions/utils.h`, and changed them a bit to fix the `const_cast`s for which there were `TODO`s,
2. Implement the `replicate`, `parallel_apply` and the combining `data_parallel` functions in C++.
`replicate` is implemented based on our existing `clone()` interface, along with the ability to set the current device via `at::OptionsGuard` (so nice).
`parallel_apply` is implemented using `at::parallel_for` (CC cpuhrsch) and [follows the code from PyTorch](https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/parallel_apply.py).
Added lots of tests for these things.
apaszke ezyang ebetica colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9234
Differential Revision: D8865182
Pulled By: goldsborough
fbshipit-source-id: 4f1fecf2b3f3bc1540c071dfb2d23dd45de433e4
Summary:
In our pimpl system, default constructing a module holder default constructs the contained module. This means `Linear linear;` is ill-formed, since `Linear` doesn't have a default constructor. Instead we require `Linear linear = nullptr;` to get the empty state of the `Linear`. This PR makes the error message for the ill-formed case nicer.
I had to change the forwarding constructors of most of our modules for this, but that's a minor adjustment.
E.g.
```
Linear linear;
In file included from /home/psag/pytorch/pytorch/torch/csrc/api/include/torch/nn/module.h:5:0,
from /home/psag/pytorch/pytorch/test/cpp/api/module.cpp:3:
/home/psag/pytorch/pytorch/torch/csrc/api/include/torch/nn/pimpl.h: In instantiation of ‘torch::nn::ModuleHolder<Contained>::ModuleHolder() [with Contained = torch::nn::LinearImpl]’:
/home/psag/pytorch/pytorch/torch/csrc/api/include/torch/nn/modules/dropout.h:45:1: required from here
/home/psag/pytorch/pytorch/torch/csrc/api/include/torch/nn/pimpl.h:46:5: error: static assertion failed: You are trying to default construct a module which has no default constructor. Use = nullptr to give it the empty state (like an empt
y std::shared_ptr).
static_assert(
```
ebetica ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9565
Differential Revision: D8903666
Pulled By: goldsborough
fbshipit-source-id: 5e6b788921a27a44359db89afdc2b057facc5cec
Summary:
This is a few files taken from https://github.com/pytorch/pytorch/pull/8919. They're unchanged from the latest versions of that PR.
```
This is part of https://github.com/pytorch/pytorch/pull/8919. It's
separated to make it easier to merge the PR in pieces.
There are a few major changes to DispatchStub
- The environment variable ATEN_CPU_CAPABILITY overrides the CPU
capability detection code (Previous ATEN_DISABLE_AVX/AVX2)
- DispatchStub is defined in the generic native code instead of the
CPU_CAPABILITY_DEFAULT kernel.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9579
Differential Revision: D8909000
Pulled By: colesbury
fbshipit-source-id: fdeb606270b06acdab3c01dba97ec9d81584ecc0
Summary:
* THTensor now stores `sizes_` and `strides_` which is a `std::vector<int64_t>`
* Anywhere a "public" API function made use of a int64_t* of sizes, I opted to just finagle it out of the tensor using THTensor_getSizePtr rather than try to rewrite all of these sites to use ArrayRef. They should use ArrayRef eventually, but not yet.
* There are new utility functions for resizing sizes/strides in one go (THTensor_resizeDim), or replacing sizes and strides with completely new values (THTensor_setSizesAndStrides)
* Anywhere you said `t->size[n] = 0`, we now say `THTensor_setSizeAt(t, n, 0)`, ditto for strides
* Anywhere you said `t->size[n]`, we now say `t->size(n)` (coming soon: ditto for strides)
Previous review of just the `std::vector` change in #9518, but I'm planning to merge this all in one go.
Note for gchanan: review from commit "ci" and after
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9561
Reviewed By: cpuhrsch
Differential Revision: D8901926
Pulled By: ezyang
fbshipit-source-id: 483cf275060ab0a13845cba1ece39dd127142510
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9594
When the input vector is a zero vector, the previous GPU code gives NaN in the backward pass. We fix this.
Reviewed By: pjh5
Differential Revision: D8849732
fbshipit-source-id: 87b1fb1ee05dfdb0d43bcbe67e36f15896fe1706
Summary:
The underlying reason why this is even an issue is that the conversion
into and out of the 'fictional' onnx operators is done in an unhygienic
order. This doesn't address that, but it does fix the one observable
case where this produces an incorrect result, and unblocks some other
work being done.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9593
Differential Revision: D8919125
Pulled By: anderspapitto
fbshipit-source-id: a88ca979c3b9d439863e223717d3697180c26121
Summary:
This is mainly straightforward, with two exceptions:
1) cublasSgemv and cublasDgemv appear to have a bug where (x,0).mv(0) does not handle beta, whereas cublasSgemm and cublasDgemm do for the case (x,0).mm(0,y). This is handled by manually calling zero / mul.
2) I fixed a bug in btrifact that was broken even when dealing with non-empty tensors. Basically, if out.stride(0) was 1, because the underlying BLAS call expects column-major matrices, to get a column-major tensor, out.transpose_(0, 1) would be called. But this is just wrong, as if the batch dimension (0) doesn't match the size of the columns (1), you don't even have a tensor of the correct shape.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9573
Reviewed By: ezyang
Differential Revision: D8906144
Pulled By: gchanan
fbshipit-source-id: de44d239a58afdd74d874db02f2022850dea9a56
Summary:
0. Fixes #9479
1. rewrites `as_strided` as a native function (a small illustration follows this list). This is fine because `set_` does the scalar check.
2. allow using `self` in `python_default_init`. Previously `python_variable_methods.cpp` has `self` as an input `PyObject *`, and use `self_` as the unpacked tensor. But `python_torch_functions.cpp` just use `self` as the unpacked tensor, making it impossible to use `self` in `python_default_init`.
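A small illustration of `as_strided` used as a regular tensor method (sketch only; the sizes and strides here are arbitrary):
```python
import torch

# as_strided builds a view with the given sizes and strides over the same storage.
x = torch.arange(9.)
y = x.as_strided((3, 3), (3, 1))  # view x as a 3x3 row-major matrix
print(y)
```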
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9538
Differential Revision: D8894556
Pulled By: SsnL
fbshipit-source-id: ca7877b488e12557b7fb94e781346dcb55d3b299
Summary:
The goal of this PR is to add infrastructure to convert (hipify) CUDA ops into [HIP](https://github.com/ROCm-Developer-Tools/HIP) ops at **compile** time.
Note that HIP ops, which are portable c++ code, can run on AMD and NVIDIA platform.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9322
Differential Revision: D8884707
Pulled By: bddppq
fbshipit-source-id: dabc6319546002c308c10528238e6684f7aef0f8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9509
generate_proposals_op_util_nms.h conditionally requires OpenCV in some cases,
and earlier this was checking just CV_MAJOR_VERSION macro, but that is
undefined unless opencv.hpp is included. Adding `-DCAFFE2_USE_OPENCV` to
TARGETS when opencv is included in external_deps to check for this correctly.
Thanks jinghuang for flagging this issue!
Differential Revision: D8880401
fbshipit-source-id: 65abbcf4ffe3feffc0ee2560882cb8eb0b7476f9
Summary:
This is the first step of refactoring the Predictor. In this diff the config struct
is introduced and the internal data structure of Predictor has been updated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9434
Differential Revision: D8843262
Pulled By: fishbone
fbshipit-source-id: 23f5e4751614e3fedc9a04060d69331bfdecf864
Summary:
Prior to this diff, there have been two ways of compiling the bulk of the torch codebase. There was no interaction between them - you had to pick one or the other.
1) with setup.py. This method
- used the setuptools C extension functionality
- worked on all platforms
- did not build test_jit/test_api binaries
- did not include the C++ api
- always included python functionality
- produced _C.so
2) with cpp_build. This method
- used CMake
- did not support Windows or ROCM
- was capable of building the test binaries
- included the C++ api
- did not build the python functionality
- produced libtorch.so
This diff combines the two.
1) cpp_build/CMakeLists.txt has become torch/CMakeLists.txt. This build
- is CMake-based
- works on all platforms
- builds the test binaries
- includes the C++ api
- does not include the python functionality
- produces libtorch.so
2) the setup.py build
- compiles the python functionality
- calls into the CMake build to build libtorch.so
- produces _C.so, which has a dependency on libtorch.so
In terms of code changes, this mostly means extending the cmake build to support the full variety of environments and platforms. There are also a small number of changes related to the fact that there are now two shared objects - in particular, windows requires annotating some symbols with dllimport/dllexport, and doesn't allow exposing thread_local globals directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8792
Reviewed By: ezyang
Differential Revision: D8764181
Pulled By: anderspapitto
fbshipit-source-id: abec43834f739049da25f4583a0794b38eb0a94f
Summary:
THCStream was recently moved to ATen by mruberry: https://github.com/pytorch/pytorch/pull/8997. This PR now introduces a guard class that replaces `AutoStream` from `torch/csrc/` and also uses this new stream interface.
I had to extend the `CUDAStream` interface with unchecked calls, so that we can reset the stream without throwing an exception in the guard's destructor.
colesbury apaszke ezyang
Fixes https://github.com/pytorch/pytorch/issues/7800
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9277
Differential Revision: D8865183
Pulled By: goldsborough
fbshipit-source-id: 67c9bc09629d92fa5660286b5eec08fde9108cd7
Summary:
….txt setting
In the ROCm branches we will experiment with turning this on.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9543
Differential Revision: D8897990
Pulled By: ezyang
fbshipit-source-id: ae9d25d1b79ee421d49436593edf8c7e49b3a4e5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9438
Current implementation of create_from_proto doesn't work as expected: it
duplicates networks and execution steps by copying original PlanDef first and
adding each step one-by-one later.
Reviewed By: pjh5
Differential Revision: D8850316
fbshipit-source-id: 9b02836d6e6ee1c91cfdd3b4c4804f14137dc22b
Summary:
The purpose of this config is to make sure that CircleCI builds
don't fail when I turn them on for pytorch/pytorch.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9537
Differential Revision: D8894497
Pulled By: ezyang
fbshipit-source-id: 22f43c84a9b8a54cd47a6572ba068f70a73f043a
Summary:
Fix RoIAlignOp GPU implementation for RoIs without batch index
According to https://caffe2.ai/docs/operators-catalogue.html#roialign, RoIs is "2D input of shape (R, 4 or 5)"
Pass RoIs 2nd dimension as kernel parameter and adjust kernel accordingly
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9230
Reviewed By: houseroad
Differential Revision: D8886798
Pulled By: malfet
fbshipit-source-id: 52a8b4df85f7e350e36c842ee4428f3a1cba2588
Summary:
Fix gatherTopK template
This change makes it possible to instantiate gatherTopK() with an IndecesType other than caffe2::TIndex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9231
Reviewed By: houseroad
Differential Revision: D8886778
Pulled By: malfet
fbshipit-source-id: d5fb1f8814710cd81bc0cf65e0f96fd9fd8317da
Summary:
…CPU LAPACK routines.
Note that the LAPACK functions in general require a different approach, because direct calls with size zero dims do not work.
Here I just selected a reasonable subset of LAPACK routines to support.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9522
Reviewed By: ezyang
Differential Revision: D8888180
Pulled By: gchanan
fbshipit-source-id: 16b9013937806d375d83d1c406815765fda00602
Summary:
A 0-dimensional tensor is now returned when squeezing a tensor with a single element.
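A minimal illustration of the new behavior:
```python
import torch

t = torch.ones(1, 1)      # a tensor with a single element
s = t.squeeze()
print(s.dim(), s.shape)   # 0 torch.Size([]) -- now a 0-dimensional tensor
```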
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9529
Differential Revision: D8893103
Pulled By: soumith
fbshipit-source-id: 658189ecfff283b2b7281feb16a397692d6dbd8f
Summary:
This PR contains the ROCm contributions of last week:
* documentation of the pyHIPIFY data format, originating from #8812 review comments by ezyang
* removal of most patch files from the `amd_build` directory and integration into the code base
* enabling of previously disabled_features that do compile now
* improvement to the static_cast feature in pyHIPIFY (it will only apply static_cast to kernel arguments, not launch arguments)
* addition of two workarounds to pyHIPIFY for ROCm/HIP shortcomings: a) `__forceinline__` does not imply `static`, hence change to `__inline__`, b) `std::[exp,log,pow]` math functions cannot be selected in device code, use `::[exp,log,pow]` instead. Both of these workarounds will be removed once the issues are fixed upstream. Neither of these issues have surfaced on the CI but were reproduced internally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9432
Differential Revision: D8887441
Pulled By: ezyang
fbshipit-source-id: 71cf5c6b13772a66d10be369a45ebf06e4e268e1
Summary:
This command (suggested by albanD when I raised a related question in the pytorch slack) is super useful to me. I have used it several times and it worked like a charm (without it, I have to delete the entire pytorch folder and clone everything again). So I guess it is nice to have in the CONTRIBUTING doc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9524
Differential Revision: D8890126
Pulled By: soumith
fbshipit-source-id: c1798ff1ab2423627fcd8e0662a66c4e85cb2413
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9520
Add random data filler to predictor bench to support production nets
Reviewed By: salexspb
Differential Revision: D8712757
fbshipit-source-id: 2c732b2ba71ab210f9222adf94d08442ca71dc03
Summary:
- I ran into this a couple of days ago, and thought it might be useful to take note of it
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9504
Reviewed By: soumith
Differential Revision: D8887396
Pulled By: weiyangfb
fbshipit-source-id: d2061cf379ce140d6e43ef6c18241f7ce00dbab6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9458
The goal is to support count_include_pad in Caffe2 ONNX backend. This commit contains the first step - support 4-D tensor cases.
AveragePool with count_include_pad can be expressed as PadImage + AveragePool.
Reviewed By: houseroad
Differential Revision: D8852180
fbshipit-source-id: 4db00e9771be7a000a2d92850dfd066d9c9c38bf
Summary:
If this is good, I could write some tests to ensure collision doesn't occur within a given range.
Closes #7228
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9246
Differential Revision: D8872608
Pulled By: ezyang
fbshipit-source-id: 0ed29a73188f4167b42756f59a5c9a3d5cb37326
Summary:
It implements per-channel alpha_dropout. It also creates corresponding function classes and unifies the process of dropout and alpha_dropout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9073
Differential Revision: D8727008
Pulled By: ezyang
fbshipit-source-id: 9d509f9c5db4e98f7b698cdfc4443505a4d2b331
Summary:
This is enabled by the allocator patch; previously we could not
deduplicate THStorage_free/THCStorage_free; now we can.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9495
Reviewed By: SsnL
Differential Revision: D8875497
Pulled By: ezyang
fbshipit-source-id: 387198dff446eb9f84d2d6187066fae1d595dea7
Summary:
ebetica asked for a way to add parameters to `Optimizer`s after they are created.
ebetica ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9472
Differential Revision: D8872176
Pulled By: goldsborough
fbshipit-source-id: 39a4032c519a6d3b458dd3596361b04afea10365
Summary:
…ors (CPU).
This includes (mainly) CPU fixes; CUDA fixes are a little more involved because you can't use an empty grid.
This also includes a fix for index_copy, which checked that self.size(dim) == src.size(0), which isn't correct (the same dimension should be compared).
Finally, also includes a fix for CUDA flip (although it's not tested yet), to get the stride using multiplication rather than division to avoid divide-by-0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9482
Reviewed By: ezyang
Differential Revision: D8873047
Pulled By: gchanan
fbshipit-source-id: 86523afd3d50277834f654cd559dfbc7875cdffe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9480
Ops like Reshape sometimes take a second input tensor of long with the new
shape (can also be specified in arg). If this input tensor is passed in via
external input (which ONNX does sometimes), LoadOp fails with an exception.
Such ops anyway are executed by IDEEPFallbackOp, so this should be fine.
Reviewed By: yinghai
Differential Revision: D8872671
fbshipit-source-id: 659a02416c374e373ce041a7d65a174be828702d
Summary:
It was only used to toggle refcounting, but we ALWAYS
refcount tensors.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9494
Differential Revision: D8875169
Pulled By: ezyang
fbshipit-source-id: 3a8618fb288334e62942bbaf388f3c9e473e7524
Summary:
This issue was fixed in 976f9253a5425918eda7cf865b097cf42b5da8d7
Fixes #5311.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9498
Differential Revision: D8875605
Pulled By: ezyang
fbshipit-source-id: 449ffe975d35c959f92874437ba9be37d4d3a1f2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9497
Fixes #7883 by using `rfft`.
It's worth noting that this is BC breaking, and it's impossible to detect the change because the two signatures before and after this change support a common subset of calling patterns, e.g., `stft(Tensor, int, int)` (some other calling patterns will raise an error).
soumith and I plan to change the current `stft` interface because it is a bit messy and non-standard. rafaelvalle suggested to us that `librosa` is a good reference API to align with. After discussing with soumith and ezyang, and given that `stft` is only out for 1 release, I decided to go with directly changing the signature. Also, my understanding is that most researchers in this field will welcome this change, as `librosa` seems to be the golden standard here. (it doesn't yet support all `pad_mode` options, but those will become available if added to `F.pad`.)
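A minimal sketch of the librosa-style call; the keyword names `n_fft`, `hop_length`, and `window` are illustrative of the new signature this change introduces:
```python
import torch

signal = torch.randn(16000)
window = torch.hann_window(400)
spec = torch.stft(signal, n_fft=400, hop_length=160, window=window)
print(spec.shape)  # (freq_bins, frames, 2): real/imag parts in the last dim
```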
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9308
Reviewed By: ezyang
Differential Revision: D8806148
Pulled By: SsnL
fbshipit-source-id: f6e8777d0c34d4a4d7024e638dc9c63242e8bb58
Summary:
test_cuda.py uses routine 'number' to prepare many testscases.
number should return a floating point value for float-type tensor
types, or integer otherwise. But number's test to classify the type
is incorrect, so it always returns the integer value.
(type(t).__name__ is always 'torch.tensortype' so never matches
'Double', 'Float', or 'Half'.)
Update number to use the existing is_floating() helper to make the
check.
The change to number causes a few tests to fail for HalfTensor. Relax
the tolerance for those in line with other HalfTensor testcases. The
failing tests--for addcdiv and fill--were not previously relaxed for
HalfTensor so are held to the over-strict 1e-5 default tolerance.
Finally, update a couple other tests for HalfTensor type to use the
existing is_half() helper.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9475
Reviewed By: yf225
Differential Revision: D8872112
Pulled By: ezyang
fbshipit-source-id: 016e3e15adb23f6606bd4c08218954c1396699db
Summary:
This change makes README.md compatible with both Github and VSTS markdown engines. Images can be reduced if necessary
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9296
Differential Revision: D8874931
Pulled By: soumith
fbshipit-source-id: 0c530c1e00b06fc891301644c92c33007060bf27
Summary:
I noticed that `Sequential::clone()` does not work. This is because `Sequential` does not use `reset()` which is normally where modules have to initialize and register its submodules. Further, this is because of the way `Sequential` allows its modules to be passed in the constructor, which doesn't work with `reset()` (since it does "late" initialization).
I've added some better error messages inside `Cloneable::clone()` which makes this kind of mistake clearer for other users, and tests for `Sequential::clone()`.
I also had to give `AnyModule` a deep `clone()` method.
ebetica ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9372
Differential Revision: D8865189
Pulled By: goldsborough
fbshipit-source-id: b81586e0d3157cd3c4265b19ac8dd87c5d8dcf94
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9403
In BBoxTransform and GenerateProposal ops, clip_boxes makes sure the bbox fits
within the images. For rotated boxes, this doesn't always make sense as there
could be multiple ways to clip a rotated box within an image boundary.
Moreover, clipping to a horizontal box means we leave out pixels of interest
potentially. Therefore, we clip only boxes with angle almost equal to 0 (with a
specified `angle_thresh` tolerance).
Reviewed By: pjh5
Differential Revision: D8828588
fbshipit-source-id: 39c1eafdb5d39d383780faa0a47e76149145e50c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9153
Closes https://github.com/pytorch/pytorch/pull/9153
Modified the values reported by the benchmarking platform to include tensor_shape and op_args. These values have a different naming scheme to values like flops and latency.
Reviewed By: sf-wind
Differential Revision: D8729791
fbshipit-source-id: f050200be01c6d0794bf5faaa6e8cef12a00affe
Summary:
Storage views were previously used to implement CUDA IPC sharing,
but they weren't necessary. The new strategy is described in
Note [CUDA IPC and the caching allocator].
This also fixes an unrelated bug, where we weren't actually using
the Tensor forking pickler, because we didn't register a pickler
for torch.Tensor.
Fixes #9447. Fixes #46.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
CC apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9466
Reviewed By: apaszke
Differential Revision: D8859698
Pulled By: ezyang
fbshipit-source-id: 3362cb92f6ae4aa37084c57d79b31004bd0b4a97
Summary:
IValue is short for interpreter value. It is used frequently so a short name is important.
This will allow us to implement more non-tensor types in an efficient way and remove
many hacks from the compiler.
This PR is limited. It only introduces IValue and changes interpreter to use it.
Follow up PRs will:
* Change the way aten_ops consume non-tensor types so that integer lists
are no longer represented as Tensors.
* Introduce TensorList as a fundamental type and remove all vararg handling in gen_jit_dispatch
* Change the compiler to implement math on primitive numbers rather than converting to tensors.
jamesr66a apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9368
Reviewed By: ezyang
Differential Revision: D8817598
Pulled By: zdevito
fbshipit-source-id: 29dce80611ce5f6384234de9d12a67861d2b112f
Summary:
Add `WeakTensor` - a `Tensor` counterpart which doesn't keep the data (or any other expensive resources) alive. They can be `.lock()`ed and return `at::optional<Tensor>` if they're still alive.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9363
Reviewed By: ezyang
Differential Revision: D8815434
Pulled By: apaszke
fbshipit-source-id: 1b3e96503c1285d78ef124c585e65c7630f3253e
Summary:
The tests were too flaky, and the procedure for legitimately
updating versions of software too onerous, to warrant continually
testing these.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9459
Reviewed By: zou3519
Differential Revision: D8852357
Pulled By: ezyang
fbshipit-source-id: 24e99cd00b4252cdeec2a1d9af92456b4a54912a
Summary:
If the type_as operator takes in two values with the same type, remove that operator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9316
Reviewed By: zdevito
Differential Revision: D8808355
fbshipit-source-id: 2d5710a6380b22f4568fc38a439061b5340c4eb1
Summary:
`test_neg` sometimes fails internally because `random_()` can generate an out-of-range value for CharTensor. This PR fixes it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9431
Reviewed By: SsnL
Differential Revision: D8843284
Pulled By: yf225
fbshipit-source-id: bf516cceb8f780e133fa54f7364c77821eb7c013
Summary:
This PR removes `distributions.utils._log_sum_exp` in favor of `torch.logsumexp`. Also fixes some warnings with `reduce` arg. in `binary_cross_entropy_with_logits`
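For illustration, the replacement computes log(sum(exp(x))) along a dimension in a numerically stable way:
```python
import torch

x = torch.randn(3, 5)
print(torch.logsumexp(x, dim=1))           # numerically stable
print(torch.log(torch.exp(x).sum(dim=1)))  # same values, prone to overflow
```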
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9173
Reviewed By: SsnL
Differential Revision: D8764174
Pulled By: ezyang
fbshipit-source-id: b9c4136dbf0182e8ae77082e6448d23a430d5cb6
Summary:
See Note [Supervisor deleter] for how SupervisedPtr works.
This design is not the obvious one, but there were a lot of
constraints feeding into it:
- It must support the reallocation usage-pattern, where, given
an existing Storage, we allocate a new region of memory,
copy the existing data to it, and then deallocate the old
region of memory.
- Creation of a deleter for memory MUST avoid dynamic allocations
in the common case. We've done some benchmarking in Caffe2
where dynamic allocation for deleters is ruinously expensive,
and it's really hard to avoid these performance tarpits in
very general function wrappers like std::function or
folly::Function (while benchmarking this, we discovered that
folly::Function's move constructor was way more expensive
than it should be).
- We need to be able to deallocate data that comes from external
sources, e.g., dlpack and numpy tensors. Most notably,
you often cannot deallocate these with merely the void*
data pointer; you need some extra, out-of-band information
(e.g., the managing struct) to deallocate it. Sometimes,
you may even want to resize data living in an external source!
- The "core" allocators need to support being wrapped in a Thrust
allocator, so you need to be implement the following two functions:
char* allocate(size_t);
void deallocate(char*, size_t);
- We need to support tensors which contain non-POD, non-trivially
copyable data; specifically tensors of std::string. This is
an upcoming requirement from Caffe2. It's dirty AF, but
it's really useful.
- It should use C++ standard library types like std::unique_ptr
(which is hugely problematic because std::unique_ptr doesn't
call the deleter when the pointer is null.)
Here is the billing of changes:
- Built-in support for realloc() has been DROPPED ENTIRELY.
Instead, you're expected to allocate and then copy from
the old memory to the new memory if you want to do a
reallocation. This is what you'd generally have expected
to occur; and axing realloc() from the design lets us avoid
some tricky correctness issues with std::realloc(), namely
the fact that we must refuse the realloc if the type of the
elements is not trivially copyable. If it really matters,
we can add this back, but there really needs to be a good
explanation WHY you need fast resizing reallocations (by and
large, people don't resize their storages, and it should
be acceptable to have a performance degradation when they
do).
- TH_STORAGE_FREEMEM is no more; instead, if you want a
storage which doesn't free its result, you just give it
an empty deleter.
- What we used to call an "allocator" (really, a combined
object for allocating/deleting) has been split into two
concepts, an allocator, and a smart pointer (SupervisedPtr)
which knows how to delete data.
- Unlike previously, where THAllocator/THCDeviceAllocator
could have a per-tensor context storing extra information
(e.g., a pointer to the metadata you need to actually
free the tensor), there is no context in the allocator or
the deleter of the smart pointer; instead, the smart
pointer directly holds an owning reference to the
metadata necessary to free the data. This metadata
is *freshly manufactured* upon every allocation, which
permits us to resize tensors even in the absence of
built-in support for realloc().
- By default, allocators don't support "raw" allocations
and deallocations with raw pointers. This is because
some allocations may return a different context every
time, in which case you need to reconstruct the context
at delete time (because all you got was a void*, not
a unique_ptr that carries the deleter).
- The diff between at::Allocator and THCDeviceAllocator is a
bit larger:
- It used to return a cudaError_t. Now, allocators
are expected to check the error status immediately and throw
an exception if there was an error. It turns out that this
is what was immediately done after all occurrences of
allocate/release, so it wasn't a big deal (although some
subsidiary interfaces had to themselves be converted to
not return cudaError_t).
There is one notable exception to this, and it is how
we handle CUDA OOM: if this occurs, we attempt to return
unused memory to the system and try again. This is now
handled by a catch-all try-catch block. The cost of
catching the exception is probably the least of your worries
if you're about to OOM.
- It used to take the CUDA stream to perform the allocation
on as an argument. However, it turned out that all call
sites, this stream was the stream for the current device.
So we can push this into the allocator (and the choice,
in the future, could be made explicitly by twiddling
thread local state.)
- It held two extra methods, emptyCache and cacheInfo, specifically
for interacting with some state in THCCachingAllocator.
But this "generality" was a lie, since THCCachingAllocator
was the only allocator that actually implemented these
methods, and there is actually a bunch of code in THC
which assumes that it is the caching allocator that is
the underlying allocator for CUDA allocations. So I
folded these two methods into this interface as
THCCachingAllocator_emptyCache and THCCachingAllocator_cacheInfo.
- It held its context directly inside the THCDeviceAllocator
struct. This context has been moved out into whatever
is holding the at::Allocator*.
- The APIs for getting at allocators/deleters is now a little different.
- Previously there were a bunch of static variables you could get
the address of (e.g., &THDefaultAllocator); now there is a
function getTHDefaultAllocator().
- Some "allocators" didn't actually know how to allocate (e.g.,
the IPC "allocator"). These have been deleted; instead, you
can wrap the produced pointers into SupervisedPtr using
an appropriate makeSupervisedPtr() static method.
- Storage sharing was a lot of work to wrangle, but I think I've
tamed the beast.
- THMapAllocator and its "subclasses" have been refactored to
be proper, honest to goodness C++ classes. I used the enum
argument trick to get "named" constructors. We use inheritance
to add refcounting and management (in libshm). What we previously
called the "Context" class (Context has been dropped from the name)
is now the supervisor for the data.
- Sometimes, we need to pull out the file descriptor from a
tensor. Previously, it was pulled out of the allocator context.
Now, we pull it out of the supervisor of the SupervisedPtr,
using the static method fromSupervisedPtr(), which uses the
deleter as the typeid, and refines the type if it matches.
- I renamed the std::function deleter into
InefficientStdFunctionSupervisor, to emphasize the fact that it does
a dynamic allocation to save the std::function deleter.
TODO:
- Windows libshm is in shambles and needs to be fixed.
Perhaps for the future:
- newFromFd is now unconditionally calling cudaPointerGetAttributes
even though this is unnecessary, because we know what the device
is from higher up in the callstack. We can fix this by making
newWithDataAndAllocator also take an explicit device argument.
- Consider statically distinguishing between allocators that
support raw_allocate/raw_deallocate, and those which don't.
The Thrust constraint applies only to the CUDA device allocator;
you never need to allocate CPU memory this way
- Really want to get rid of storage views. Ugh.
Nontrivial bugs I noticed when preparing this patch:
- I forgot to placement-new unique pointers and attempted to
assign them directly on uninitialized memory; very bad! Sam
Gross has encouraged me to replace this with a proper constructor
but I keep putting it off, because once everything goes in
StorageImpl there really will be a proper constructor.
- I rewrote a number of APIs to use newWithDataAndAllocator
instead of newWithAllocator, calling the allocator at the
call site (because they required "allocation context" which
we no longer give to "allocators"). When I did this, I forgot
to insert the multiplication with sizeof(real) to scale from
numels to number of bytes.
- The implementation of swap on storages was missing it for
scalarType and backend. It was benign (because the only case
we call swap is when these are the same), but I fixed it anyway.
- I accidentally returned a nullptr unique_ptr with no deleter,
even though there was a legitimate one. This matters, because
some code still shoves its hands in the deleter context to
get extra metadata about the function.
- I used std::move() on a unique_ptr, and then did a boolean
test on the pointer afterwards (always false!)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9358
Reviewed By: SsnL
Differential Revision: D8811822
Pulled By: ezyang
fbshipit-source-id: 4befe2d12c3e7fd62bad819ff52b054a9bf47c75
Summary:
This PR adds a device_ member to CUDAEvent. This is necessary because if we create a CUDA event on one device but destroy it from another, it also creates an additional context on that device. So this device information is needed to guard the cudaEventDestroy. (cc: ngimel is this expected behavior? I can provide a simple .cu script to repro this.)
c10d tests are probably not in CI yet; please let me know how the tests are run and I can double-check.
Thanks pietern apaszke for help debugging!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9415
Reviewed By: apaszke
Differential Revision: D8839688
Pulled By: ailzhang
fbshipit-source-id: b950ba37d57b9e3c5fe71726ec92f6a9601c4d0e
Summary:
Fixes: #9419
This assumes that anyone who knows localScalar can also grep for the
error message or get a traceback.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9443
Reviewed By: soumith
Differential Revision: D8850718
Pulled By: ezyang
fbshipit-source-id: a106fee718fef97064e861810a49ca05f536f27e
Summary:
Fixes: #9421
I don't think it is easy to deal with non-contiguous arrays in CUDA topk, so I'm adding a check.
The argument number is a bit confusing when it shows up in PyTorch, but it is consistent with the other checks. (Not sure whether it would make sense to eliminate argument numbers from the TH/THC error messages given that they're probably off more than once...)
Do we need a test that it indeed refuses non-contiguous?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9441
Reviewed By: soumith
Differential Revision: D8850719
Pulled By: ezyang
fbshipit-source-id: d50561bb37ed50ab97aeaf54d8e3fc6c765bdc7c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9299
Onnx has ReduceL1 and ReduceL2 operators that would facilitate this, so allow pytorch to export those and allow caffe2 to run them.
I only implemented this on CPU so far.
Reviewed By: pjh5
Differential Revision: D8757381
fbshipit-source-id: 68afc9e2f90042a70929b73ace05a499b5c670c7
Summary:
During tracing (and export) we are now introducing an unnecessary hard-coded view on the RHS of indexed assignments such as `tensor[idxs] = rhs`. This caused a regression in the PyTorch translate models because these expressions appear with variable sizes in the RHS. This change makes it so we only call view if we indeed need to strip leading 1-dimensions.
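As a rough illustration of the kind of indexed assignment involved (the shapes below are made up for this example and are not from the translate models):
```python
import torch

x = torch.zeros(4, 5)
idxs = torch.tensor([0, 2])
rhs = torch.randn(1, 2, 5)  # a leading 1-dimension that may need to be stripped

# tensor[idxs] = rhs: the traced graph now only inserts a view when rhs really
# carries extra leading 1-dimensions like this.
x[idxs] = rhs
print(x[0], x[2])
```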
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9424
Reviewed By: colesbury
Differential Revision: D8838881
Pulled By: jamesr66a
fbshipit-source-id: 399e5daa7d021f4f59f6f92b9fae581f92bfc538
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9385
The operator transforms dense features to sparse features by bucketizing. Only the features in the indices tensor will be transformed and output.
Reviewed By: bddppq
Differential Revision: D8820351
fbshipit-source-id: a66cae546b870c6b2982ac20641f198334f2e853
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8999
Closes https://github.com/pytorch/pytorch/pull/8999
Implemented the WRgrad optimizer operator for the dense case (the base case as well as the case with additional outputs for the effective learning rate and update value) and the sparse case.
Reviewed By: pjh5
Differential Revision: D8627933
fbshipit-source-id: a63cde46c04bcc6b428ab5f77a4b3b2beb66c046
Summary:
I'm cramming through clang-tidy warnings. This PR addresses the `hi-cpp-override` check which warns that `virtual` + `override` is redundant, since `override` already signifies that a function is overriding and thus virtual.
Where there was `virtual` + `override` I removed the `virtual`, where there was `virtual` and no `override` I removed `virtual` and added `override`.
ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9335
Differential Revision: D8807082
Pulled By: goldsborough
fbshipit-source-id: e0a261053f6540a22cc56ec160a24aa285af6319
Summary:
This PR improves performance of (formerly) latency-bound non-contig-dim reduction kernels by up to 20X, while maintaining determinism.
Currently, reducing across a non-contiguous dimension uses the parallelism exposed across the number of output elements. This means that performance suffers if the number of output elements is small. Example:
```
a = torch.cuda.FloatTensor(32768, 32)
a.sum(dim=0)
```
Before this PR, `a.sum`'s kernel (kernelReduceNoncontigDim_shared) took 138 microseconds on my machine. The speed-of-light estimate (based on a bandwidth of 700 GB/s) should be around 6 microseconds. After this PR's changes, `a.sum(dim=0)`'s kernel takes 6.9 microseconds on my machine.
Christian implemented some nice logic to squeeze out better performance for cases like `a.sum` using intra-block and instruction-level parallelism across the dimension being reduced, but his kernel still only launched one block for every 32 output elements. This was insufficient to saturate the device in many cases, like `a.sum` here (where only one block is launched).
My PR adds block cooperation across the dimension being reduced. Many blocks, instead of one block, help to reduce into each 32 output elements. Internally, each block leverages all of Christian's nice logic to compute a partial reduction into a per-block staging buffer, then the last block to finish combines the results to compute the final output.
Block cooperation does require THCudaMalloc-ing staging and semaphore buffers, so it's not always worthwhile. I included a set of rough heuristics to decide when the kernel should choose to use block cooperation. These heuristics are based on Python-side timings of calling sum() many times in a loop, and comparing to the old implementation.
I tested a wide range of sizes (to determine heuristics) and as long as the number of output elements is greater than 16ish, I don't think there are any remaining pathological sizes where users will encounter unexpectedly poor performance.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9214
Reviewed By: gchanan
Differential Revision: D8808127
Pulled By: colesbury
fbshipit-source-id: 139f310fc6ea6d187a7c983128f8eb8e1c9b4be3
Summary:
While talking to mruberry, I noticed a few places that use
special cast wrappers that are no longer necessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9401
Differential Revision: D8828874
Pulled By: colesbury
fbshipit-source-id: 2b7fe7ac3af3b71be26b43a9ad3949f8065a7bc9
Summary:
This is to unify the handling of empty tensors in std/var between the dimension reduce and all reduce cases.
Also to avoid triggering ubsan errors around divide by 0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9400
Reviewed By: ezyang
Differential Revision: D8828879
Pulled By: gchanan
fbshipit-source-id: 6b9306805c94251eec28bd12e234618338bff4e3
Summary:
This includes either bug fixes or NumPy semantics changes for the following methods:
chunk, diagonal, unfold, repeat, flatten, reshape, split, unsqueeze.
The n-dimensional empty tensor feature is still hidden behind a feature flag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9362
Reviewed By: ezyang
Differential Revision: D8817002
Pulled By: gchanan
fbshipit-source-id: 6ff704ec96375f00b4dd39ebcd976efac0607fb4
Summary:
Pure experimental addition to guide us on delivering this
into real production systems and their threadpools. Biggest limitation
now is that we need to turn off BlackBoxPredictor activation
deallocation logic to get to sane performance
Reviewed By: highker
Differential Revision: D8798029
fbshipit-source-id: ec7962689d605fba62b2c9e0904309df567a25a4
Summary:
This was previously meant to be used for c10 code but that plan has since changed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9367
Reviewed By: orionr
Differential Revision: D8814361
Pulled By: smessmer
fbshipit-source-id: 8e35fa74e160343a2bb8432013847677aa73695a
Summary:
To allow our C++ customers to use our initialization methods as well, this PR moves some of the code from `torch.nn.init` to ATen, calls it from Python, and adds equivalent code to the C++ frontend.
Notes:
1. Happy to hear thoughts on whether it's ok to have e.g. `torch.nn.init.dirac_` *and* `torch.dirac_` (the former has a `no_grad` guard). We have this for `ones_` and stuff too, so I don't mind it.
2. I left the exception checking in Python because they throw `ValueError`s while ATen errors show as `RuntimeError`s. I imagine this would break users' error handling if someone had a `try`-`except` handler for `ValueError` (or maybe that's far-fetched).
EDIT: After discussions with zdevito, the PR now simply duplicates the code in C++ exclusively for the C++ API, and we leave the Python code as-is (to make it easier for people to read/modify).
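For reference, a minimal sketch of the Python-side usage that stays unchanged (the layer shape is arbitrary):
```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 16, kernel_size=3)
# The init functions mutate the parameters in place under a no_grad guard.
nn.init.dirac_(conv.weight)
nn.init.constant_(conv.bias, 0.0)
print(conv.weight.sum())
```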
ebetica ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9295
Differential Revision: D8813793
Pulled By: goldsborough
fbshipit-source-id: 4b969f3f75952c1be4e837e19e23b8098e5fbd4b
Summary:
Migrated PriorCorrectionCalibration from Dper2 layer to Dper3 module.
A few notes:
1. Calibration operators need dynamic linking;
2. All calibration implementation and tests are located in /modules/calibration/
3. Added a type inference function in operator_schema.h/operator_schema.cc
Reviewed By: idning
Differential Revision: D8756832
fbshipit-source-id: 7e6300a3bb3d3feaaf3b82340ece2f35d71493fc
Summary:
This PR changes the ATen `CMakeLists.txt` slightly, to enable standalone build of ATen inside PyTorch. Currently, the tests in ATen gets linked to `libcaffe.so libcaffe2.so`. As a result, ATen can't be built standalone without building from the root pytorch directory. I know that there is a big merge happening between caffe2 and pytorch and hence, the purpose of this PR is to really start a conversation on what would be the proper way of migrating the CMakeLists to enable clean builds. We should also follow up on this PR: https://github.com/pytorch/pytorch/pull/7275. For your reference, that PR has the explanation for why `-Wl --no-as-need` is needed. Moreover, without `set(ATen_CUDA_SRCS ${all_cuda_cpp})`, the standalone build will throw unresolved references.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9377
Reviewed By: smessmer
Differential Revision: D8825921
Pulled By: orionr
fbshipit-source-id: c521159b4885639fc7990a9819202051455d07db
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9396
The custom and local CUDADevice RAII wrapper has been superseded by at::DeviceGuard so it doesn't make sense to keep it around.
Reviewed By: ailzhang
Differential Revision: D8824200
fbshipit-source-id: 39fa00ffab4f495606c8001446e976bbf603e866
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9309
This is faster when you're dealing with a small number of processes.
Around the 16 processes mark the halving/doubling algorithm is faster.
Reviewed By: apaszke
Differential Revision: D8785364
fbshipit-source-id: 4a03326266e473026d943787186e149d0cc489f0
Summary:
Use the decorator `torch.jit.batch` to implement auto-batching (it calls the `to_batch` pass to do the IR transformation).
- `to_batch` pass: "to_batch.h/cpp" in csrc/jit/passes to transform a graph to a new batched graph.
- Write several basic operators for BatchTensor (add, mul, sigmoid, tanh, mm, matmul, select).
- Register the operators in a lookup table `<std::string, std::shared_ptr<Graph>>`. (use the Graph to replace the original node in IR graph)
Move BatchTensor in python from torch.BatchTensor to torch.jit.BatchTensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9198
Reviewed By: zdevito
Differential Revision: D8744466
Pulled By: ChunliF
fbshipit-source-id: 9ea56a30f55cb870f13a2069a47cc635419763ff
Summary:
In the C++ API, `Sequential` was not refcounted itself, but stored `shared_ptr<AnyModule>` to get reference semantics. This is unfortunate because most modules in the API are accessed via `->`, e.g. `Linear l(1, 2); l->forward(...);`. `Sequential` was different in that it had value semantics itself, and was thus accessed via `.`.
This PR makes `Sequential` store `AnyModule` (without extra indirection), and uses the same pImpl mechanism we use for all other modules to make `Sequential` have reference semantics itself. This makes it consistent with the rest of the library. It also removes one level of indirection inside of `Sequential`, which is cool.
One thing I had to change was that the `ModuleHolder` with which the whole pImpl thing is implemented previously did some tricks to make `Linear(3, 4)` actually construct `Linear(LinearOptions(3, 4))`. This doesn't work well with `Sequential` since it takes a variadic parameter pack. Instead, I made `ModuleHolder` forward all arguments to the underlying module, and then further pushed the trick to forward parameters to modules' options types into the actual Modules. This adds one constructor per Module in the library. This is not something user modules have to do (unless they want this nice forwarding themselves). It makes the code simpler overall.
ezyang ebetica apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9151
Reviewed By: ezyang
Differential Revision: D8809298
Pulled By: goldsborough
fbshipit-source-id: da68452c3de912fbc67af330ba93b5220de6909f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9352
I am debugging a failed workflow f61490672, and found the original error message to be uninformative.
Differential Revision: D8808181
fbshipit-source-id: 3f524ca092881186a492c5c0456124ce31d54751
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9350
Re-apply #9270
Breaking this out of #8338
This takes care of the Eigen failure we saw on Mac CUDA builds when BUILD_CAFFE2 and BUILD_ATEN were removed. Fix is to isolate Eigen from headers included by cu files and processed by nvcc. This was worked on with smessmer.
Reviewed By: mingzhe09088
Differential Revision: D8794431
fbshipit-source-id: de656334af46c697802073f8e8d9a6aeb9ca65a7
Summary:
Breaking this out of #8338
This fixes some CUDA related build and runtime issues after BUILD_CAFFE2 and BUILD_ATEN are removed.
cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9347
Reviewed By: orionr
Differential Revision: D8806954
Pulled By: mingzhe09088
fbshipit-source-id: 9f8e3feee06478d1ac2deb30796939453352d388
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9056
Closes https://github.com/pytorch/pytorch/pull/9056
Updates bbox_transform for rotated boxes with angle info to normalize the
predicted angle to be within [angle_bound_lo, angle_bound_hi] range.
Reviewed By: pjh5
Differential Revision: D8706240
fbshipit-source-id: f3ee834cf362736136e285f0f8f0c063af94a879
Summary:
THNN was accumulating the result of reduction loss functions
into real instead of accreal. This was causing precision issues with
MSELoss.
This patch only fixes MSELoss. Some of the other losses exhibit bad precision as well (because they accumulate into real instead of accreal) and require more investigation. I will open an issue for those (#9286)
Fixes #8710
cc li-roy SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9287
Reviewed By: SsnL
Differential Revision: D8775708
Pulled By: zou3519
fbshipit-source-id: d1a1f159deee0cb90fd8e81e63b246115eea8e9e
Summary:
operator.cpp is not generated; removing the line prevents generate_code.py from always thinking it is out of date and re-running.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9339
Reviewed By: ezyang
Differential Revision: D8798689
Pulled By: zdevito
fbshipit-source-id: f25a2e215fec29aa51571e6a31771f0f91e7a213
Summary:
dlpacks deserve documentation. :)
I wonder whether it might make sense to merge the various small torch.utils pages (and include a link for the larger ones, e.g. data) to enhance the structure in the docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9343
Differential Revision: D8801227
Pulled By: soumith
fbshipit-source-id: 2980d271971743b86f052bec5a2cb4d146a90d9b
Summary:
Helps prevent calling functions of the base case on float/double/int subclasses that aren't supported.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9321
Reviewed By: colesbury
Differential Revision: D8793627
Pulled By: cpuhrsch
fbshipit-source-id: 7fde779ecd4b890dda406f3d1306b58bab40efe2
Summary:
As discussed on the call, this will allow us to keep this integral part of the effort to run PyTorch on ROCm in sync with the main code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8812
Reviewed By: ezyang
Differential Revision: D8796245
Pulled By: bddppq
fbshipit-source-id: 8e12c2acf6a7e0740f31b21e50be74e10ed8b12c
Summary:
This is a series of two commits that should probably be read separately. They are stacked on top of #9018 since the second commit requires it for correctness.
Commit 1
=======
This commit is the first in a series that will clean up how we handle declaring operators and intrinsics in the JIT to make it more modular and readable. This introduces readable declarations that can be used to register operators and switches gen_jit_dispatch to generate this schema. A follow up PR will remove the dispatch keys like "add-3" and resolve ops directly based on the registered schema, further simplifying the generation process.
* Switches schema over to parsed declarations, in the future this will allow something like:
```
registry.register_intrinsic("foo(Tensor a, Tensor b) -> Tensor", [](Stack& stack) {
...
})
```
This will allow the scalable registration of intrinsics for lists, tuples, and other ops, as well as metadata for these ops (e.g. derivatives and size propagation routines).
The declarations resemble those used by PythonArgParser but have been significantly cleaned up to minimize the number of types that can appear in the declaration. We should strive to get the other parts of PyTorch switched over to this restricted declaration set when possible, but it is too much to do in a single PR. My hope is that eventually we will use a very similar language to describe declarations in C10, and this can serve as a guide for that.
Parsing is done using the script lexer, so it is very robust to whitespace and extensible for future types.
This removes the other way we encoded schema, and makes it easier to see what schema are registered.
Current generated declarations: https://gist.github.com/zdevito/a96a17766fb3a098d69a91ee00abaaf6
* Switches how we handle attempting to use an integer in the place of a fixed-sized int list, such as in conv (e.g. 'int[3] stride=1'). Now that we can statically distinguish between int and Tensor, we handle the expansion as an implicit conversion in the compiler. This allows us to simplify the interpreter since it no longer needs to handle the conversion itself.
* Schema declarations have been changed so that they match the type system in the IR exactly. In particular, attribute_info which was used by liftConstantAttributes has been dropped and constant attributes are lifted purely based on the type of the input. Type conversions in compiler have been simplified due to this change.
* Error highlighting in ErrorReport now only reports at most 20 lines of code, to make reading where an error occurred easier.
Commit 2
=======
This commit unifies aten_dispatch and aten_schema into a single Operator object that both contains schema and implementation information. In the future we can use this object to also contain functionality like shape prop and autodiff needed by all operators. Operators are registered globally, and dispatch logic uses the schema information to figure out which variant to use. Descriptor keys, a frequent source of inscrutable debug errors, have been removed.
* Introduce Operator, to replace TensorOp. Unlike TensorOp, we use Operator for all op implementations, including primitives that may occur in the graphs. The only exceptions are ops that are only known to the interpreter like jumps, and GraphExecutors where we need to record additional debug info.
* Adds a global registry for Operator implementations. aten_dispatch.cpp turns into register_aten_ops.cpp, which registers all the Operators for aten with the operator registry. register_prim_ops.cpp now contains the implementations for primitive operators that used to be in the interpreter. This means that it is now safe to use `getOperation(node)` to lookup the true interpreter function for the node, which will simplify const-propagation passes.
* Remove addInterpreterOpHandler in favor of global operator registry.
* Instead of descriptors, we match Node arguments directly against the FunctionSchema describing expected inputs in `matchSchema`. `matchSchema` knows how to parse both attributes and positional inputs from a node and match them to the appropriate registered operator. Debug error messages when we try to run an invalid operator are significantly improved: they now automatically display the schemas registered for ops with the same name.
* Merge aten_schema into register_aten_ops. Each Operator takes a string schema which is parsed to determine when to dispatch to that op.
* Cleans up gen_jit_dispatch.py now that we do not need to write out descriptors. In particular, skip_scalar_overloads can be removed since Richard's code sorts declarations to put Tensor, Tensor declarations first.
* remove matchSchemaAndLiftConstantAttributes and use emitBuiltinCall instead to remove code duplication
* refactor stack manipulation functions into a separate header file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8885
Reviewed By: jamesr66a
Differential Revision: D8751048
Pulled By: zdevito
fbshipit-source-id: 312aabfbf88307c5f6ab947b6caf691468b94557
Summary:
Breaking this out of #8338
This takes care of the Eigen failure we saw on Mac CUDA builds when BUILD_CAFFE2 and BUILD_ATEN were removed. Fix is to isolate Eigen from headers included by cu files and processed by nvcc. This was worked on with smessmer.
cc mingzhe09088 smessmer BIT-silence Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9270
Reviewed By: mingzhe09088
Differential Revision: D8768025
Pulled By: orionr
fbshipit-source-id: 5b34017aeb67e35a1b5938d962181ccd4cd37591
Summary:
Usually the DLPack consumer is expected to call the DLManagedTensor's
deleter to signal that it doesn't need the contents.
This patch calls the deleter when freeing unconsumed
DLPack capsules created by PyTorch.
Test script:
```
import torch
import torch.utils.dlpack
import gc
for i in range(10000):
    a = torch.randn(1000, 1000, dtype=torch.float32, device='cuda')
    b = torch.utils.dlpack.to_dlpack(a)
    gc.collect()
```
Before patch: consume all GPU ram.
After patch: constant GPU ram consumption.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9297
Differential Revision: D8781571
Pulled By: soumith
fbshipit-source-id: 2ebadec6c857646220d632ca64110af430dbd52f
Summary:
I'm trying to write a multi-GPU network by pipelining some layers onto different GPUs. However, the current gradient clipping requires all the parameters to be on the same device.
The CUDA launch overhead is reduced since the scalar calculation is performed on the CPU, but this introduces extra data transfers.
No performance regression is observed when running the following snippet:
```python
import time
import torch
module = torch.nn.Sequential(
    torch.nn.LSTM(1024, 1024),
    torch.nn.LSTM(256, 256),
    torch.nn.Linear(100, 10000),
).cuda()
torch.nn.utils.clip_grad_norm_(module.parameters(), 1)
torch.cuda.synchronize()
start = time.time()
for _ in range(1000):
    torch.nn.utils.clip_grad_norm_(module.parameters(), 1)
torch.cuda.synchronize()
time_elapse = time.time() - start
print('{} ms per clip'.format(time_elapse))
```
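As a hedged sketch of the motivating use case (layer sizes and devices are illustrative and assume two visible GPUs), clipping gradients of parameters that live on different devices now looks like this:
```python
import torch
import torch.nn as nn

part1 = nn.Linear(128, 128).to('cuda:0')
part2 = nn.Linear(128, 10).to('cuda:1')

x = torch.randn(16, 128, device='cuda:0')
loss = part2(part1(x).to('cuda:1')).sum()
loss.backward()

# Gradients live on different devices; the total norm is accumulated on the CPU.
total_norm = nn.utils.clip_grad_norm_(
    list(part1.parameters()) + list(part2.parameters()), max_norm=1.0)
print(total_norm)
```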
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9302
Differential Revision: D8781551
Pulled By: soumith
fbshipit-source-id: 9d76d01fe0531927f770a16b9523872a7e08e927
Summary:
Fixes #9264.
There can be so many elements in the output of `vol2col` that the count overflows the `int` range! This PR changes 3d conv to use `int64_t` mostly.
Also fixes some unused variable warnings (cc goldsborough).
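To see how easily the element count overflows, here is some illustrative arithmetic (the shapes are assumptions, not taken from the original issue):
```python
# vol2col materializes roughly C * kT*kH*kW * outT*outH*outW elements, which can
# exceed the int32 range for a modest-looking 3d convolution.
C = 128
kT, kH, kW = 3, 3, 3
outT, outH, outW = 64, 128, 128
elements = C * kT * kH * kW * outT * outH * outW
print(elements, elements > 2**31 - 1)  # 3623878656 True
```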
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9274
Differential Revision: D8770682
Pulled By: SsnL
fbshipit-source-id: f6e37f1aa56fe1009dd4c9bcbc042244e47252db
Summary:
The underlying use-case is the file descriptor to storage cache in
torch.multiprocessing.reductions. Previously, this was implemented by wrapping
an existing allocator with a "weak ref" allocator which also knew to null out
the weak reference when the storage died. This is terribly oblique, and
prevents us from refactoring the allocators to get rid of per-storage allocator
state.
So instead of going through this fiasco, we instead directly implement weak
pointers and finalizers in THStorage. Weak pointers to THStorage retain the
THStorage struct, but not the data_ptr. When all strong references die,
data_ptr dies and the finalizers get invoked.
There is one major hazard in this patch, which is what happens if you
repeatedly call _weak_ref on a storage. For cleanliness, we no longer
shove our grubby fingers into the finalizer struct to see if there is already
a Python object for the weak reference and return it; we just create a new one
(no one is checking these Python objects for identity). This means if you
keep calling it, we'll keep piling on finalizers. That's bad! But I am
not going to fix it until it is actually a problem for someone, because
then we need to add another caching layer.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9148
Differential Revision: D8729106
Pulled By: ezyang
fbshipit-source-id: 69710ca3b7c7e05069090e1b263f8b6b9f1cf72f
Summary:
Fix the problem if caffe2 works with old version of onnx
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9284
Reviewed By: yinghai
Differential Revision: D8773894
Pulled By: houseroad
fbshipit-source-id: 99b5a962099f854edc85a2ea815cb88c82a6e175
Summary:
ONNX-TensorRT still uses an old opset (<7). Patch it for now.
A future fix would be to expose versioning in the ONNX exporter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9285
Reviewed By: houseroad
Differential Revision: D8775268
Pulled By: yinghai
fbshipit-source-id: c272073f80cce35ebd971e44ec9472e3c8fd4b9e
Summary:
This PR implements and tests N-dimensional empty tensors for indexing, factories, and reductions if compiled with -DUSE_TH_SIZE_ZERO_DIM.
Still remaining to add:
1) TensorShape functions
2) Simple linear algebra functions (matrix multiply variants)
3) Other functions that operate over a dimension (but don't reduce).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9209
Reviewed By: ezyang
Differential Revision: D8751257
Pulled By: gchanan
fbshipit-source-id: 2113374dc7af6caf31a99bf67b3893f130a29e23
Summary:
Tested on my mac on a pretty clean anaconda3
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8509
Reviewed By: orionr
Differential Revision: D8702257
Pulled By: pjh5
fbshipit-source-id: eda03ef9732da9fc56b31d909af5c0e39520d689
Summary:
Breaking this out of #8338
This fixed Mac build issues after BUILD_CAFFE2 and BUILD_ATEN are removed.
cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9283
Reviewed By: orionr
Differential Revision: D8773459
Pulled By: mingzhe09088
fbshipit-source-id: 71942e8e6891a625e6b1a7dc0160e87444c64209
Summary:
Breaking this out of #8338
When BUILD_CAFFE2 and BUILD_ATEN are removed, we need to install typing on Mac.
cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9271
Reviewed By: orionr
Differential Revision: D8768701
Pulled By: mingzhe09088
fbshipit-source-id: 052b96e90e64b01e6b5dd48b91c0fb12fb96b54a
Summary:
Breaking out of #8338
This fixes the build issues with pytorch on linux machines after BUILD_CAFFE2 and BUILD_ATEN are removed.
cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9273
Reviewed By: orionr
Differential Revision: D8768869
Pulled By: mingzhe09088
fbshipit-source-id: 2730426ed1bed398eb5dc804c7348aeeb27c93d3
Summary:
Breaking this out of #8338
This takes care of failures we saw on Mac CUDA builds when BUILD_CAFFE2 and BUILD_ATEN were removed. Specifically, smessmer fixed `std::hash` being handled in a weird way by nvcc and I fixed an nvcc template issue by moving `SparseNormalizeOp::RunOnDevice` implementation into the cc file.
cc mingzhe09088 smessmer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9269
Reviewed By: mingzhe09088
Differential Revision: D8767984
Pulled By: orionr
fbshipit-source-id: 550686bfcef6d331f16d593859c99169216c5c2e
Summary:
Breaking this out of #8338
This fixed an Android build issue after BUILD_CAFFE2 and BUILD_ATEN are removed.
cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9275
Reviewed By: orionr
Differential Revision: D8769913
Pulled By: mingzhe09088
fbshipit-source-id: afce52a12697757a0b2103c7c343e19ab158a9f7
Summary:
Breaking this out of https://github.com/pytorch/pytorch/pull/8338
Use a local version of `np.rot90` with an `axes` argument, since we don't have NumPy 1.12.0 in all of the test environments. Caffe2 conda2-ubuntu16.04, for example, fails. Generally, it seems better to not require a NumPy bump just for this test.
cc mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9267
Reviewed By: mingzhe09088
Differential Revision: D8767819
Pulled By: orionr
fbshipit-source-id: c51a6295d58366eba06e4e55e3f1ffaa8af96975
Summary:
Breaking this out of #8338
More changes required to support USE_CUDNN=OFF. We should be able to land some of our fixes before the big BUILD_CAFFE2 and BUILD_ATEN removal lands.
cc mingzhe09088 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9268
Reviewed By: mingzhe09088
Differential Revision: D8767981
Pulled By: orionr
fbshipit-source-id: 0607ca2773253b685209c274a3adf70180d8ce58
Summary:
Commits:
1. In the extension doc, get rid of all references to `Variable`s (Closes #6947)
+ also add minor improvements
+ also added a section with links to cpp extension :) goldsborough
+ removed mentions of `autograd.Function.requires_grad` as it's not used anywhere and hardcoded to return `Py_True`.
2. Fix several sphinx warnings
3. Change `*` in equations in `module/conv.py` to `\times`
4. Fix docs for `Fold` and `Unfold`.
+ Added a better shape check for `Fold` (it previously could give bogus results when there are not enough blocks). Added tests for the checks.
5. Fix the doc saying `trtrs` is not available for CUDA (#9247)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9239
Reviewed By: soumith
Differential Revision: D8762492
Pulled By: SsnL
fbshipit-source-id: 13cd91128981a94493d5efdf250c40465f84346a
Summary:
When we moved the libaten build into libcaffe2, we changed the location where it generated compile_commands.json such that it was no longer being picked up by the build script. This fixes it so it is still found.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9227
Reviewed By: goldsborough
Differential Revision: D8757984
Pulled By: zdevito
fbshipit-source-id: 73df26bf08d98f18ac841d6c0db7e332fd328ab6
Summary:
Here's an improved error message. Let me know if this change makes the errors a little clearer.
Closes https://github.com/pytorch/pytorch/pull/9212
Reviewed By: soumith
Differential Revision: D8752896
Pulled By: jramseyer
fbshipit-source-id: d2bd8462c3ddf14acd3de56a4c1aeb75a9bc4067
Summary:
This PR moves the THCStream logic (from both the THCStream and THCState APIs) to ATen. In particular, it:
+ Creates a new (THC free) at::CUDAStream class and API
+ Extends the at::Context API to expose it
+ Stubs the current THCStream and THCState APIs to use it
+ Updates THC to no longer violate stream encapsulation (stream.hpp is dead)
+ Adds an ATen cpp test of the API
+ Bonus: Removes some debug spew in test_nn.py
The new API has several advantages over the old one:
(1) It comes with an easy to use RAII, the CUDAStream. CUDAStreams have the expected copy and move semantics and are implicitly convertible to cudaStream_t.
(2) It does not depend on THCState, THCThreadLocal, or CUDA (thanks to goldsborough for suggesting the dynamic registration technique)
(3) It provides one consistent API/place for all stream operations, instead of having them split between THCStream and THCState
(4) The internals are completely encapsulated, unlike the historic THCStream
(5) It has getAndRetain semantics, which are safer than the historic gets (which allowed a gap between acquisition and retention)
There are a couple things this PR does not do, however, which are left for future work:
- It leaves the c10d:CUDAStream class as a THCStream wrapper (which now really wraps an at::CUDAStream).
- It leaves historic users of THCStream mostly untouched, except where they violated encapsulation (by using stream.hpp). A couple forward declarations were also changed.
I hope this PR allows easy usage of streams from ATen and is a useful pattern for porting more of the THCState API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8997
Differential Revision: D8683375
Pulled By: soumith
fbshipit-source-id: 2e48ad85f1f9c8817684fe63a267938e80eafdcf
Summary:
This looks like a totally cosmetic change, but for some reason it reduces the runtime by ~50% running in a single CPU thread.
```
import os
os.environ['OMP_NUM_THREADS'] = '1'  # Use one CPU thread
import torch, torch.nn as nn, time

def test_net(net, offset):
    net.eval()
    total = 0
    with torch.no_grad():
        for _ in range(100):
            x = torch.randn(100, 100, 100) + offset
            start_time = time.time()
            y = net(x)
            total += time.time() - start_time
    print(net, total * 10, 'ms')

for offset in [-1, 0, +1]:
    test_net(nn.LeakyReLU(), offset)
    test_net(nn.PReLU(), offset)
```
Closes https://github.com/pytorch/pytorch/pull/9206
Reviewed By: yf225
Differential Revision: D8749491
Pulled By: btgraham
fbshipit-source-id: 3db8049dd151c0ba9ae1dd5c05bcc58bcab97e9a
Summary:
This PR addresses #5823.
* fix docstring: upsample doesn't support LongTensor
* Enable float scale up & down sampling for linear/bilinear/trilinear modes. (following SsnL 's commit)
* Enable float scale up & down sampling for nearest mode. Note that our implementation is slightly different from TF in that there's actually no "align_corners" concept in this mode.
* Add a new `interpolate` function API to replace `upsample`, and add a deprecation warning for `upsample` (a usage sketch follows this list).
* Add an area mode which is essentially Adaptive_average_pooling into resize_image.
* Add test cases for interpolate in test_nn.py
* Add a few comments to help understand *linear interpolation code.
* There is only "*cubic" mode missing in resize_images API which is pretty useful in practice. And it's labeled as hackamonth here #1552. I discussed with SsnL that we probably want to implement all new ops in ATen instead of THNN/THCUNN. Depending on the priority, I could either put it in my queue or leave it for a HAMer.
* After the change, the files named *Upsampling*.c work for both up- and down-sampling. I could rename the files if needed.
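A small usage sketch of the new `interpolate` API mentioned above (the values are illustrative):
```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 64, 64)

# Float scale factors now work for both up- and down-sampling.
up = F.interpolate(x, scale_factor=2.5, mode='bilinear', align_corners=False)
down = F.interpolate(x, scale_factor=0.5, mode='nearest')
print(up.shape, down.shape)  # torch.Size([1, 3, 160, 160]) torch.Size([1, 3, 32, 32])
```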
Differential Revision: D8729635
Pulled By: ailzhang
fbshipit-source-id: a98dc5e1f587fce17606b5764db695366a6bb56b
Summary:
Closes https://github.com/pytorch/pytorch/pull/9199
The input shapes are not logged correctly in production because `PerfNetObserver::Stop()` only gets called after the inference is done for the net, and in the mobile models it's common practice to reuse the blobs as much as possible to save memory. The shapes of the blobs keep changing during inference. By the time you query `InputTensorShapes()` in `PerfNetObserver::Stop()`, you only get the final shape of the blobs.
To fix this bug, I moved the 'InputTensorShapes()' query from `PerfNetObserver::Stop()` to `PerfOperatorObserver::Stop()`. The latter gets called at the end of operator->run() whereas `PerfNetObserver::Stop()` gets called at the end of net->run().
Also remove `PerfOperatorObserver::getAnalyticalCost()` since it's now done on the server side and no longer needed on mobile
Reviewed By: Maratyszcza
Differential Revision: D8743346
fbshipit-source-id: 5d2d0132e3f5e084be7d0173863e695e62a6b4a0
Summary:
Closes https://github.com/pytorch/pytorch/pull/9048
The max_length argument fixes the shape of the output to be N * max_length * D, where N is the batch size and D is the feature dimension.
Reviewed By: bddppq
Differential Revision: D8702782
fbshipit-source-id: e30555608fee1c4a61cc95922f4a71c7f54903af
Summary:
[x] get registry working
[x] move all current ops to registry
Reviewed By: yinghai
Differential Revision: D8706115
fbshipit-source-id: 8dfce79039b57dea1c15e8e291cdd74f39766ade
Summary:
As I try to replicate DP in C++, I need to move some functions into C++ from Python. This PR ports the scatter and gather primitives from Python in torch/cuda/comm.py to C++ in torch/csrc/cuda/comm.cpp. The basic infrastructure was already there, since apaszke had rewritten broadcast in C++ already.
I'm not very familiar with this code, so let me know if I'm doing something wrong. I largely just literally translated the code.
I don't know how "public" `torch.cuda.comm` is, but I feel like the `destination_index` parameter for `gather` should be changed from -1 indicating CPU to `None` indicating CPU, and `-1` indicating the default CUDA device. That would make the code clearer IMO.
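For reference, a minimal sketch of the Python-level primitives being ported (assumes at least two visible GPUs; sizes are arbitrary):
```python
import torch
import torch.cuda.comm as comm

x = torch.randn(8, 4, device='cuda:0')
chunks = comm.scatter(x, devices=[0, 1])       # split along dim 0 across GPUs
gathered = comm.gather(chunks, destination=0)  # concatenate back on GPU 0
print([c.device for c in chunks], gathered.shape)
```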
apaszke colesbury teng-li pietern
Closes https://github.com/pytorch/pytorch/pull/9117
Differential Revision: D8721729
Pulled By: goldsborough
fbshipit-source-id: 1844a488079d21fa209b32e2c73e48632cbe9e68
Summary:
Added a way to `dynamic_cast` an `nn::Module` and get a pointer to it. `nn::Module::is<T>` just checked if the return value of the `dynamic_cast` was nullptr, so I got rid of `is<T>` since it's equivalent to `as<T> != nullptr` (or just `as<T>` due to boolean conversion).
We're now at
```
if (auto* conv = module.as<nn::Conv2d>()) {
  conv->weight.data().normal_(0.0, 0.02);
} else if (auto* bn = module.as<nn::BatchNorm>()) {
  bn->weight.data().normal_(1.0, 0.02);
  bn->bias.data().fill_(0);
}
```
ezyang apaszke ebetica
Closes https://github.com/pytorch/pytorch/pull/9149
Differential Revision: D8735954
Pulled By: goldsborough
fbshipit-source-id: e2b8f6f0cea16a621f8bc0807a33cc7651d25154
Summary:
Context: I am updating jit::FunctionSchema to use `Symbol name;` rather than `std::string name`. Sometimes the name refers to a builtin thing like `prim::UnpackTuple`, sometimes to an aten operator like `aten::add`, and sometimes just to a raw string, like `my_method_foo` that really doesn't belong in any namespace and should be printed to the user in that form. For this last case, I want the ability to create a raw Symbol again, like was previously possible, that just represents an interned string. This PR enables that use, keeps the other functionality still possible, and simplifies interned_string's implementation a bit.
This changes how Symbol is implemented. Now the namespace of a symbol
is optional and the namespaces themselves are Symbols.
This allows Symbol to be used with arbitrary namespaces, and allows
you to use Symbol as a simple interned string via fromQualString
and toQualString, without :: in the string. This also simplifies the
implementation. Like with string conversion, builtin primitives go
through a fast path for namespace lookup while registered symbols require
holding a lock and reading an array entry to lookup the namespace.
Note: alexnet expect file update is from a previous commit. It doesn't run in CI because pytorch vision is not installed.
Closes https://github.com/pytorch/pytorch/pull/9018
Reviewed By: SsnL
Differential Revision: D8690449
Pulled By: zdevito
fbshipit-source-id: b65ee57704641d7294fe115c5470cf55d406458f
Summary:
Similar to https://github.com/pytorch/pytorch/pull/9187, This PR makes setting the `PYTORCH_TEST_WITH_ASAN` and `PYTORCH_TEST_WITH_UBSAN` flags easier internally, by allowing the flags to be set to `0`.
Closes https://github.com/pytorch/pytorch/pull/9202
Differential Revision: D8745533
Pulled By: yf225
fbshipit-source-id: 6293f52f2e8b1c3ef150becfdc2dd7ded56d5d80
Summary:
This is necessary for n-dimensional empty tensors, which have special native handling.
Closes https://github.com/pytorch/pytorch/pull/9197
Differential Revision: D8744083
Pulled By: gchanan
fbshipit-source-id: 3cc692a1d62cbeb169681b7c40e3df50e12953b7
Summary:
I've been cleaning up my email notifications, and noticed that this PR used a stack-allocated `random_device`. This is generally a bad idea due to this sentence from the C++ reference (emphasis mine):
> `std::random_device` may be implemented in terms of an implementation-defined pseudo-random number engine if a non-deterministic source (e.g. a hardware device) is not available to the implementation. **In this case each `std::random_device` object may generate the same number sequence.**
If this is how this object is implemented, then this `rd()` call will give the same result at every call.
cc yf225
Closes https://github.com/pytorch/pytorch/pull/9080
Differential Revision: D8748342
Pulled By: soumith
fbshipit-source-id: 22987befee61ff7faacda5ecc10138c2ac5d26ff
Summary:
Previously this used the ``.tolist`` method, which converted the
storage object into a list of Python objects, and then sent those to
pickle. For storage objects of non-trivial size, this was very slow.
Now we reuse the logic of the ``torch.save`` function to efficiently
turn the Storage object into bytes, and send those instead. This
reduces the semantic information (it's harder to interpret the bytes)
but should be orders of magnitude more efficient when serializing data
with the pickle protocol or with copy.
For future work it would be nice to develop a mechanism to get a buffer
of bytes out of a Storage object, and use that alongside the current
``from_buffer`` method.
See #9168 for context
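A rough illustration of the behavior (the size is arbitrary):
```python
import pickle
import torch

s = torch.randn(1_000_000).storage()

# Pickling (and copy, which goes through the same protocol) now serializes the
# raw bytes of the storage instead of a Python list of its elements.
blob = pickle.dumps(s)
s2 = pickle.loads(blob)
print(len(s2), float(s2[0]) == float(s[0]))  # 1000000 True
```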
Closes https://github.com/pytorch/pytorch/pull/9184
Differential Revision: D8747794
Pulled By: soumith
fbshipit-source-id: ac598e660c043788ed1ffab3d0303812886edf79
Summary:
1. Let `ModuleTest`s raise when they fail on non-contiguous inputs. Fix legacy modules.
2. Fix BN (both THNN and cuDNN) not working on non-contiguous inputs.
3. Fix CUDA EmbeddingBag not working on non-contiguous inputs. To prevent calling `.contiguous()` on the input in both `forward` and `backward`,
a. prefix all current `embedding_bag*` functions with `_`, indicating that they require input to be contiguous (there is a check in each function).
b. create `embedding_bag`, which makes input arguments `.contiguous()`, and calls `_embedding_bag`
4. Make many ATen `embedding*` functions work on non-contiguous inputs so we don't need to call `input = input.contiguous()` in Python `nn.functional.embedding` (see the sketch after this list).
5. Fix dense-sparse addition when the sparse input is not coalesced and the indices or values tensor is not contiguous. This came up in the test cases of Embedding modules with `sparse=True`. Added tests.
6. Update `TensorUtils.cpp` to use `AT_*` macros.
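A small sketch of the non-contiguous `embedding` case mentioned in item 4 (sizes are arbitrary):
```python
import torch
import torch.nn.functional as F

weight = torch.randn(10, 6)
idx = torch.arange(8).reshape(4, 2).t()  # shape (2, 4), non-contiguous
out = F.embedding(idx, weight)           # no manual .contiguous() needed any more
print(idx.is_contiguous(), out.shape)    # False torch.Size([2, 4, 6])
```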
Request:
review from cpuhrsch on the `Embedding*` changes.
review from ezyang on ATen sparse & BN changes.
Closes https://github.com/pytorch/pytorch/pull/9114
Differential Revision: D8717299
Pulled By: SsnL
fbshipit-source-id: 0acc6f1c9522b5b605361e75112c16bbe1e98527
Summary:
cc vishwakftw
Also added a check for the case where none of the input tensors in `gradcheck` have `requires_grad=True`.
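A small illustration of how `gradcheck` is typically called (the function under test here is arbitrary):
```python
import torch
from torch.autograd import gradcheck

# gradcheck wants double-precision inputs, and at least one input must have
# requires_grad=True, which the new check enforces.
x = torch.randn(3, 4, dtype=torch.double, requires_grad=True)
print(gradcheck(torch.sigmoid, (x,)))  # True
```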
Closes https://github.com/pytorch/pytorch/pull/9192
Differential Revision: D8739401
Pulled By: SsnL
fbshipit-source-id: 81bb3aa0b5c04eb209b137a4bd978e040e76cbcd
Summary:
This PR makes setting the `NO_MULTIPROCESSING_SPAWN` easier internally, by allowing the flag to be set to `0`.
Closes https://github.com/pytorch/pytorch/pull/9187
Differential Revision: D8736206
Pulled By: yf225
fbshipit-source-id: b8a34cb9a747b13bc9428777a3ed766ce441cfe1
Summary:
With the C++-ification of a few files in `TH`/`THC`, the C++ extensions got broken whenever the user uses features from `THC` in their files, when PyTorch is installed via `python setup.py install`.
This addresses issues such as
```
/home/me/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/include/THC/THCDeviceTensorUtils.cuh:5:25: fatal error: THCTensor.hpp: No such file or directory
```
Closes https://github.com/pytorch/pytorch/pull/9182
Reviewed By: soumith
Differential Revision: D8734581
Pulled By: fmassa
fbshipit-source-id: 2a1138f208592eaccb01fcdb805a6b369d7a497a
Summary:
Closes #9147
Added a test to prevent regression in test_torch
Added entries in docs
cc ezyang weiyangfb
Closes https://github.com/pytorch/pytorch/pull/9156
Differential Revision: D8732095
Pulled By: soumith
fbshipit-source-id: 7a6892853cfc0ccb0142b4fd25015818849adf61
Summary:
This file was added in #9107 but wasn't installed. The libraries in
./torch/lib use the headers from Caffe2/ATen from their temporary
install path at torch/lib/tmp_install, and c10d was not able to find
THC/THCGeneral.hpp before this fix.
Closes https://github.com/pytorch/pytorch/pull/9159
Reviewed By: Yangqing
Differential Revision: D8731107
Pulled By: pietern
fbshipit-source-id: d6009f6f6e8e6e0f37dea24cc4c3570736943ab1
Summary:
This resolves the mismatch between the code and the comments.
Closes https://github.com/pytorch/pytorch/pull/9070
Differential Revision: D8712261
Pulled By: ezyang
fbshipit-source-id: a8a7d8af890a41ec246e11c2a62b0bde297be9c1
Summary:
The loss plugin was using the old-style loss[0] access, which in PyTorch 0.4 and
later is an attempt to index into a scalar, generating a warning.
Replaced that with loss.item().
This fixes
https://github.com/pytorch/pytorch/issues/9142
Closes https://github.com/pytorch/pytorch/pull/9143
Differential Revision: D8726403
Pulled By: ezyang
fbshipit-source-id: 6c496b140a74d22c8423f511db901b18615fd6fa
Summary:
- There were missing error messages for AT_CHECK in SparseTensorImpl::set_indices_and_values
- We have to check that the backends of all our inputs line up,
since native does not do it for us.
- Some math operations were missing shape tests.
Fixes #9110
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Closes https://github.com/pytorch/pytorch/pull/9140
Differential Revision: D8724349
Pulled By: ezyang
fbshipit-source-id: 3c75104187aca97cbe92bb0ec24f6ded07b2c3d6
Summary:
Boolean indexing was special-cased to handle a single boolean value, but didn't generally work given multiple booleans.
This PR unifies the behavior with slicing. Note that only 'True' and torch.tensor(True) behave like NumPy due to the lack of n-dimensional empty tensors.
The corresponding tests for false values have been added, but are guarded behind a flag until we add n-dimensional empty tensors.
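A short illustration of the NumPy-style behavior for true values (shapes are arbitrary):
```python
import torch

x = torch.randn(2, 3)

# A single True index inserts a new leading dimension of size 1, just like x[None];
# False would produce a leading dimension of size 0 once n-dimensional empty
# tensors are available.
print(x[True].shape)                # torch.Size([1, 2, 3])
print(x[torch.tensor(True)].shape)  # torch.Size([1, 2, 3])
```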
Closes https://github.com/pytorch/pytorch/pull/8920
Reviewed By: ezyang
Differential Revision: D8661876
Pulled By: gchanan
fbshipit-source-id: 0dc8a45a303aa41f729d04ab8908cfaf2e3ce3d7
Summary:
Closes https://github.com/pytorch/pytorch/pull/9121
This main function causes 'buck test caffe2_test_cpu' to run 0 tests
Reviewed By: orionr
Differential Revision: D8719343
fbshipit-source-id: dc1cf76b0355637eaae193be2159f5746873b9f9
Summary:
Some functions are implemented exactly in THStorage_; in those cases,
we call those functions directly.
Stacked on #9135
Closes https://github.com/pytorch/pytorch/pull/9136
Reviewed By: Yangqing
Differential Revision: D8723998
Pulled By: ezyang
fbshipit-source-id: 653d23a5e1db4b9bdda50641fa97730894cc8ed5
Summary:
There is no way to concatenate two `Sequential`s in Python either, but there it is easy to do in an immutable fashion by just writing `Sequential(first.modules() + second.modules())`. Concatenating vectors isn't as easy in C++, so I think it's fair to save users some for loops by giving them `Sequential::extend()`.
apaszke ebetica ezyang
CC jamespinkerton
Closes https://github.com/pytorch/pytorch/pull/9116
Reviewed By: ezyang
Differential Revision: D8719630
Pulled By: goldsborough
fbshipit-source-id: 840d7ac70755350e6202b493c531e30ecbb6546f
Summary:
The tests were using the old args, which caused them to emit a lot of deprecation warnings.
Closes #9103.
Reviewed By: ezyang
Differential Revision: D8720581
Pulled By: li-roy
fbshipit-source-id: 3b79527f6fe862fb48b99a6394e8d7b89fc7a8c8
Summary:
Closes https://github.com/pytorch/pytorch/pull/9107
Some details about how this was done:
- For now, the allocators for CPU and CUDA are different (unifying
the allocators is a bigger change to make, I'll contribute this in
a later patch). To smooth this over, the allocator field now
stores a void* instead of THAllocator* or THCDeviceAllocator*; to
make this clear the field is renamed to allocatorVoidPtr.
- Some THStorage functions which were generated per-scalar are now
generalized, and thus moved out of the generic/ library. This way
they can be called directly from a non-code-generated at::Storage
- THCState is moved into a C++ header. This is actually not really
related to this particular diff, but I'll need it soon to replace
THAllocator/THCDeviceAllocator with at::Allocator (C++, so I can't
mention it in a C header file.)
- THPPointer needs to be adjusted, since there is no more type refinement
between THStorage/THCStorage for it to template match over. This
is a little tricky, because I can't refer to THCStorage_free unless
we actually compile with CUDA. So there's two copies of the function
now: one for the CPU build, one for the CUDA build. If we ever split
CUDA/non-CUDA Python builds, you will have to indirect this through some
dynamic dispatch.
I want to soon replace the THCDeviceAllocator pointers in
THCState with at::Allocator, but I can't reference a C++ namespaced type
from C code, so THCState needs to move.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Closes https://github.com/pytorch/pytorch/pull/9087
Reviewed By: orionr
Differential Revision: D8712072
Pulled By: ezyang
fbshipit-source-id: c6e1ea236cd1df017b42a7fffb2dbff20d50a284
Summary:
Having circulated the C++ API a bit, I found that it would be easier for folks to access module parameters directly than through the `parameters()` map. So here I make all variables/submodules and also the configuration options for every module public.
For RNNs, I also updated the names of parameters to match PyTorch, e.g. `hhw` -> `w_hh`. This should make it easier to transition from Python.
apaszke ebetica
Closes https://github.com/pytorch/pytorch/pull/9111
Differential Revision: D8717112
Pulled By: goldsborough
fbshipit-source-id: 3d36d5e161f7a86f44db7136c9c2fa53067abe1c
Summary:
Closes https://github.com/pytorch/pytorch/pull/9108
OperatorDef ownership was given to the net in the past; we no longer
want to do that.
Reviewed By: pjh5
Differential Revision: D8705347
fbshipit-source-id: 34976de202a7a7a71b935dd13c1bc8e9c73552e0
Summary:
Since we leave `weight` as the last calculated weight in eval mode, we need to detach it from the computation graph so that backward can still be used.
The typical use case is in GANs when the discriminator has spectral norm, is in eval mode and we want to backprop through the discriminator to get weight gradients for the generator.
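A hedged sketch of that GAN use case (the architectures are placeholders, not taken from the PR):
```python
import torch
import torch.nn as nn

G = nn.Linear(16, 32)                         # generator stand-in
D = nn.utils.spectral_norm(nn.Linear(32, 1))  # discriminator stand-in
D.eval()                                      # uses the last calculated weight

z = torch.randn(4, 16)
loss = D(G(z)).mean()
loss.backward()                    # backprop through D to get gradients for G
print(G.weight.grad is not None)   # True
```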
Closes https://github.com/pytorch/pytorch/pull/9020
Reviewed By: ezyang
Differential Revision: D8694054
Pulled By: SsnL
fbshipit-source-id: 09ee5843687cac3ed4c40759ac577a14c5371730
Summary:
Closes https://github.com/pytorch/pytorch/pull/9035
This diff builds on the structure in the stacked diff to add serialization/deserialization. It supports the old format and a new suggested format.
Reviewed By: ilia-cher
Differential Revision: D8415115
fbshipit-source-id: acaacce2b015f4c6ac0ae22625455290a3f30262
Summary:
add two small bindings to recently added attributes.
Also want to leave a reference gist here: https://gist.github.com/soumith/8102ef39530bac09070912b1a5401d0f
It showcases:
- traced a module
- symbolically differentiated the forward graph, to get a forward, backward graph
- executed the subsequent forward + backward graphs correctly
- compared the jit vs non-jit results
Closes https://github.com/pytorch/pytorch/pull/8890
Reviewed By: ezyang
Differential Revision: D8677663
Pulled By: soumith
fbshipit-source-id: a29919c05baad997cd7fb7df718f933a83035118
Summary:
Closes https://github.com/pytorch/pytorch/pull/9072
Use FixedDivisor in Reduce and Broadcast CUDA kernels
Reviewed By: houseroad
Differential Revision: D8710243
fbshipit-source-id: 6f1da12234898594a1be8c979d942aa515832aeb
Summary:
This will resolve some of the timeout issues in CPU and GPU tests internally.
Closes https://github.com/pytorch/pytorch/pull/9061
Reviewed By: ezyang
Differential Revision: D8707471
Pulled By: yf225
fbshipit-source-id: 9dc82a2c9da0c540ae015442f74b9b2b1a67a246
Summary:
Fixes #9049.
When provided with a domain string that lacks proper prefix, i.e. `org.pytorch.`, an exception is thrown.
Closes https://github.com/pytorch/pytorch/pull/9053
Differential Revision: D8708264
Pulled By: ezyang
fbshipit-source-id: e2593d8d36a17d3bb26fc0b239a61b84f1c38ecb
Summary:
Closes https://github.com/pytorch/pytorch/pull/9057
Make the `_C` target depend on the `csrc-no-python` target. Also removes the `csrc` target and the with-python version of autogradpp (which is not used). Let me know if we should pick better names here.
I also ran into a nasty linker issue with only one symbol being undefined. It turns out it had been given inline linkage in the `.cpp` file, which I believe is an error.
Reviewed By: orionr
Differential Revision: D8705750
fbshipit-source-id: 8de083e371dbf5e9f12c15572d88e1c595dfa087
Summary:
Closes https://github.com/pytorch/pytorch/pull/8933
The SpatialBN implementation cannot deal with an empty batch; this diff enables the zero-batch setting:
during training, when batch_size = 0:
in forward, the output's saved_mean and saved_var are zeros.
in backward, the gradients for SCALE_GRAD and BIAS_GRAD are zeros.
Reviewed By: pjh5
Differential Revision: D8644699
fbshipit-source-id: 599ea687329d68699c987e05f56f409f4e729d1c
Summary:
1. Instead of using the non-`_out` variant, we allocate a buffer and use the `_out` variant to write the intermediate results into the buffer.
2. Reduce dimensions in order of decreasing sizes.
Benchmark:
Sum a randn tensor of shape `[200, 1, 30, 40, 20, 1, 50]` along dimensions `[4, 6, 3, 0, 2, 5]`. Averaged across 1000 times:
```
before patch:
CPU: 0.0441 s
CUDA: 0.0273 s
after patch:
CPU: 0.0234 s
CUDA: 0.0047 s
```
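For reference, a rough sketch of how the CPU number above can be reproduced (the timing harness is mine, not from the PR):
```python
import time
import torch

x = torch.randn(200, 1, 30, 40, 20, 1, 50)

start = time.time()
for _ in range(1000):
    # The reduction is staged through an `_out` buffer and the dimensions
    # are processed in order of decreasing size.
    y = x.sum(dim=[4, 6, 3, 0, 2, 5])
print('CPU: {:.4f} s per call'.format((time.time() - start) / 1000))
```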
Closes https://github.com/pytorch/pytorch/pull/8992
Differential Revision: D8681069
Pulled By: SsnL
fbshipit-source-id: 2c5d5af5c5a284f2e945181f2b24ee8c78becd50
Summary:
The goal of this PR was to add support for dropout descriptors in the C++ API's RNN class.
The end result is a 4x-5x speedup for our RNN integration tests since they can now use cuDNN instead of autograd when dropout is set.
To achieve this, I had to move `_cudnn_init_dropout_state` to the `TensorOptions` API.
I also fixed a bug around `RNN::cuda()` not flattening parameters for cuDNN.
ebetica ezyang
Closes https://github.com/pytorch/pytorch/pull/9012
Reviewed By: pjh5
Differential Revision: D8689786
Pulled By: goldsborough
fbshipit-source-id: 44fb191f5a38e41c4ded5417306b5bbc012cd56c
Summary:
Addresses #7415. Adding a note first; will do the API change if there's a need in the future.
Closes https://github.com/pytorch/pytorch/pull/9019
Differential Revision: D8694056
Pulled By: ailzhang
fbshipit-source-id: 0b6fa43fa62ac55deff3b3b099d1bc9fee74a5f9
Summary:
Add BatchTensor class
- construct from data, mask, and dims, or from a list of tensors
- can return a list of tensors from a BatchTensor instance
Next step: do IR-level transformation and operators.
Closes https://github.com/pytorch/pytorch/pull/8922
Differential Revision: D8668986
Pulled By: ChunliF
fbshipit-source-id: 8b24d2a9f46a3b42dbb397e99e9e059dfb2b326e
Summary:
Just tried these and they work now
Closes https://github.com/pytorch/pytorch/pull/9044
Reviewed By: soumith
Differential Revision: D8698819
Pulled By: jamesr66a
fbshipit-source-id: 1d5574de1819aa31fc36ad245186c7aa68587178
Summary:
Closes https://github.com/pytorch/pytorch/pull/9037
Fixes flaky test failures due to port in use.
Reviewed By: soumith
Differential Revision: D8696779
fbshipit-source-id: a05412d1eb1dcb9a4b35023dead371aa33d62c39
Summary:
Tell people to run with num_workers=0 when a DataLoader worker fails.
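A small sketch of the suggested debugging workflow (the dataset and sizes are placeholders):
```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(100, 3), torch.randint(0, 2, (100,)))

# If a worker process dies with an opaque traceback, rerun with num_workers=0:
# the dataset code then executes in the main process, so the original
# exception and its stack trace become visible.
loader = DataLoader(dataset, batch_size=10, num_workers=0)
for batch in loader:
    pass
```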
Closes https://github.com/pytorch/pytorch/pull/9007
Differential Revision: D8686005
Pulled By: SsnL
fbshipit-source-id: bf872267f609c7b86e943061caab953149507bfe
Summary:
Any flags linking libraries only take effect on inputs preceding them,
so we have to call `$cxx $in $ldflags -o $out` instead of the other way
around.
This was probably not detected so far since the torch libraries are
already loaded when loading JIT-compiled extensions, so this only has an
effect on third-party libraries.
This also matches our behavior on windows.
Closes https://github.com/pytorch/pytorch/pull/9021
Reviewed By: soumith
Differential Revision: D8694049
Pulled By: ezyang
fbshipit-source-id: e35745fc3b89bf39c14f07ce90d6bd18e6a3d7cc
Summary:
This is an initial implementation of the Distributed Data Parallel module for the c10d GLOO and NCCL backends.
Performance testing confirmed that both single-GPU-per-process and multi-GPU-per-process setups are able to overlap communication with backward computation.
The idea is that DDP buckets parameters and all-reduces the buckets in reverse order. Since all c10d ops are async, no dedicated thread is needed; we simply queue the all-reduce kernels once a bucket is ready, following the deterministic reduction order.
Tested with 8 nodes / 64 GPUs on ResNet-50; hit the required accuracy within 90 epochs.
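A rough usage sketch, assuming one process per GPU; the exact module path for this c10d-based DDP at the time of the PR may differ from what is shown (the class eventually lives at `torch.nn.parallel.DistributedDataParallel`), and the NCCL/env:// process-group setup is illustrative:
```python
import torch
import torch.distributed as dist
import torch.nn as nn

# Assumes MASTER_ADDR / MASTER_PORT / RANK / WORLD_SIZE are set in the environment.
dist.init_process_group(backend='nccl', init_method='env://')

model = nn.Linear(1024, 1024).cuda()
ddp_model = nn.parallel.DistributedDataParallel(model)

out = ddp_model(torch.randn(32, 1024).cuda())
out.sum().backward()  # gradient buckets are all-reduced asynchronously,
                      # roughly in reverse bucket order as they become ready
```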
Closes https://github.com/pytorch/pytorch/pull/8584
Reviewed By: goldsborough
Differential Revision: D8678696
Pulled By: teng-li
fbshipit-source-id: 440341b804befc6762e92acece2759ba47157cea
Summary:
Enable script for the time-sequence prediction example. This required a bunch of hacks to make script mode work, and a couple of issues were discovered along the way, all noted in #8452.
Shall we merge this PR and iteratively fix those issues thereafter?
Closes https://github.com/pytorch/pytorch/pull/8862
Differential Revision: D8677683
Pulled By: wanchaol
fbshipit-source-id: 02319cd56c87de523be898f0e6c541dd15e57cac
Summary:
When initializing weights for my C++ model, I had to write
```cpp
void initialize_weights(nn::Module& module) {
  if (module.name().find("Conv2d") != std::string::npos) {
    module.parameters()["weight"].data().normal_(0.0, 0.02);
  } else if (module.name().find("BatchNorm") != std::string::npos) {
    auto parameters = module.parameters();
    parameters["weight"].data().normal_(1.0, 0.02);
    parameters["bias"].data().fill_(0);
  }
}
```
The string-based module determination is not very nice, and not very C++-y. So I created `nn::Module::is<T>` which does a `dynamic_cast` inside. It also handles the `ModuleHolder` vs. `Module` distinction.
It now becomes
```cpp
if (module.is<nn::Conv2d>()) {
  module.parameters()["weight"].data().normal_(0.0, 0.02);
} else if (module.is<nn::BatchNorm>()) {
  auto parameters = module.parameters();
  parameters["weight"].data().normal_(1.0, 0.02);
  parameters["bias"].data().fill_(0);
}
```
ebetica ezyang apaszke
Closes https://github.com/pytorch/pytorch/pull/8970
Differential Revision: D8677476
Pulled By: goldsborough
fbshipit-source-id: 053294e19b6a58cce868167596c89639f7de91c2
Summary:
Currently the `test_RNG_after_pickle` in the PR would fail because pickling a tensor changes the RNG state. This PR aims to fix it.
Closes https://github.com/pytorch/pytorch/pull/8971
Reviewed By: ezyang
Differential Revision: D8677474
Pulled By: yf225
fbshipit-source-id: 1713d9611699ad288b66d92dbb29ce9feb34b8cf
Summary:
- fixes log1p at #8853
- added log1p of sparse tensors in ATen
- made log1p of sparse tensors non-differentiable and raise an error, because the local derivative of log1p at a zero element is 1 / (0 + 1) = 1, which would make the gradient (and hence the tensor) dense
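A minimal sketch of the new behavior (the indices and values below are arbitrary):
```python
import torch

i = torch.tensor([[0, 1], [1, 0]])
v = torch.tensor([3.0, 4.0])
s = torch.sparse_coo_tensor(i, v, (2, 2))

# log1p on a sparse tensor is supported, but it is non-differentiable:
# d/dx log1p(x) = 1 at x = 0, so the implicit zeros would get nonzero
# gradients and the result would have to be densified.
out = torch.log1p(s)
print(out.to_dense())
```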
Closes https://github.com/pytorch/pytorch/pull/8969
Reviewed By: ezyang
Differential Revision: D8677491
fbshipit-source-id: 8363a613519de4bc75eda087ccd20a3eb2d18126
Summary:
The problem was a bad regex; the version hash match used to match 6
wildcards. This PR changes it to match \w+, which is sufficient for the
test because the version hash is always followed by either whitespace or
a right-paren.
Fixes #8981
Closes https://github.com/pytorch/pytorch/pull/8983
Differential Revision: D8677771
Pulled By: zou3519
fbshipit-source-id: dfdde98669bcd682335145cba98c82530a815afa
Summary:
Will bump up to opset 8 in another PR to match the current opset version.
Already tested through generating the models in current model zoo.
Closes https://github.com/pytorch/pytorch/pull/8854
Reviewed By: ezyang
Differential Revision: D8666437
Pulled By: houseroad
fbshipit-source-id: feffdf704dd3136aa59c0f1ff1830c14d1bd20aa
Summary:
Operations on `Variable`s (or `torch::Tensor`) usually return `at::Tensor`. This is usually fine, but the `AnyModule` used in the implementation of `torch::Sequential` is very picky about types, and does not understand implicit conversions like this. This means that `sequential.forward(at_tensor_that_is_actually_a_variable)` will fail unless you wrap `at_tensor_that_is_actually_a_variable` with `torch::Tensor`.
This PR adds a special case to `AnyModule` that will convert an `at::Tensor` to `torch::Tensor` when the tensor is really a variable, and else just pass the `at::Tensor`. This is a nice little usability improvement for the often-used `Sequential` class.
ebetica ezyang
Closes https://github.com/pytorch/pytorch/pull/8968
Reviewed By: ezyang
Differential Revision: D8670407
Pulled By: goldsborough
fbshipit-source-id: 3635ed6ed28238f3900ce4a876d07f1b11713831
Summary:
This PR does 3 things
- Reorder the search order of `intel_lp64` and `gf_lp64` as the first one is more essential and should have high priority.
- Avoid repetitive searching of MKL libraries in `ideep` and `mkldnn` submodule if we already found those in `FindMKL`
- Avoid adding more MKL dependencies to IDEEP if MKL is also found.
TODO: provide an option for the user to choose iomp or gomp.
Closes https://github.com/pytorch/pytorch/pull/8955
Reviewed By: bddppq
Differential Revision: D8666960
Pulled By: yinghai
fbshipit-source-id: 669d3142204a8b47c19a900444246fc44a139012
Summary:
Disable operator tests for now until we have enough ROCm workers in CI.
Closes https://github.com/pytorch/pytorch/pull/8720
Reviewed By: ezyang
Differential Revision: D8654871
Pulled By: bddppq
fbshipit-source-id: ff2504d6a7182f85f7cc15618f2df8e512447fa8
Summary:
Closes https://github.com/pytorch/pytorch/pull/8959
MKL-DNN doesn't support 0-dim tensors. As a workaround, we produce a CPUTensor instead of an Ideep tensor in the fallback ops. For those tensors, we no longer need the Ideep copy op.
Reviewed By: viswanathgs
Differential Revision: D8665168
fbshipit-source-id: 59678de2c5aed8c691ab5caaadede6d6c000dd7b
Summary:
Sets the random seed at the start of C++ tests so that everything is super deterministic.
I made sure we only generate random values from torch instead of `std::`, so that this seed always applies. I.e. I do:
```
torch::randint(2, {2}, at::kLong)
```
instead of
```
std::rand() % 2
```
Also got rid of the tests that test the random seeding, since it would interfere here. And the test is not useful since we just use ATen's seeding mechanism, which should work.
Fixes #7288, #7286, #7289
ebetica ezyang
Closes https://github.com/pytorch/pytorch/pull/8903
Differential Revision: D8667269
Pulled By: goldsborough
fbshipit-source-id: a833e86e156d5e68dae8c53a4b1c433cb0608b6c
Summary:
Closes https://github.com/pytorch/pytorch/pull/8927
Closes https://github.com/pytorch/pytorch/pull/8855
- Add parameter `enable_tracing` to the Arg field of NetDef. `net_async_tracing` will only enable the Tracer for Net instances that have this field set (unless the command line argument also includes the net name).
- Append a unique id to the JSON profiling result file because there could be multiple instances of the same net running.
- Dump the JSON profiling file regularly instead of only when the Tracer object is destroyed.
Reviewed By: ilia-cher
Differential Revision: D8372378
fbshipit-source-id: 8adc9d59f48b67456beed2e3a88235c298fdfd01
Summary:
This PR is the final step to making `torch::` the only namespace users of the C++ API ever see. Basically, I did:
``` cpp
namespace torch {
using namespace at;
}
```
And then changed `torch::` to `at::` almost everywhere. This worked surprisingly well out of the box. So users can now write `torch::relu` and `torch::log_softmax` and `torch::conv2d` instead of having to know when to use `at::` and when `torch::`. This is happy!
Another thing I did was to have `using Dtype = at::ScalarType`, which will be the eventual name anyway.
ebetica ezyang apaszke zdevito
Closes https://github.com/pytorch/pytorch/pull/8911
Reviewed By: ezyang
Differential Revision: D8668230
Pulled By: goldsborough
fbshipit-source-id: a72ccb70fca763c396c4b0997d3c4767c8cf4fd3
Summary:
Closes https://github.com/pytorch/pytorch/pull/8951
Change the default value of the max decode error rate to 1.0, which means we don't throw such a runtime error by default.
Reviewed By: avulanov
Differential Revision: D8665640
fbshipit-source-id: 9d373979dd8a97253ad528b167f8d73a28fee82a
Summary:
No longer required now that we've switched over to ShipIt on master.
Closes https://github.com/pytorch/pytorch/pull/8950
Reviewed By: Yangqing
Differential Revision: D8666175
Pulled By: orionr
fbshipit-source-id: 6d8b8b38f6558d87cabd0aa19b72a390057c137b
* add opencl + fpga context
adds an opencl context inside caffe2/fb which can be used for fpga access
* [Caffe2] Force tensor inference checks to be triggered during testing
We've started to rely on TensorInference functions more for different analysis. This diff ensures that the TensorInference function's result matches what is expected from the definition of the operator.
* Enable building //caffe2:torch with @mode/opt
In @mode/opt, python runs out of a PAR, which breaks a lot of
assumptions in the code about where templates/ folders live relative
to __file__. Rather than introduce hacks with parutil, I simply turn
template_path into a parameter for all the relevant functions and
thread it through from the top level.
* [Caffe2] Fix cost models for DotProduct and Div. Update Tensor Inference for dot product
As title. DotProduct states that the output is a 1-D tensor (https://caffe2.ai/docs/operators-catalogue.html#dotproduct), though the code suggests it is either 0- or 1-D depending on the inputs. The TensorInference function is defined to match the implementation.
* [SG-MoE] Add an option to make the experts NOT as components
* [nomnigraph] Rename and fixup convertToNeuralNetOperator API
This will make things a bit cleaner
* no longer symlink THNN.h and THCUNN.h
* forced decoder network (onnx export)
Closes https://github.com/pytorch/translate/pull/95
Add networks in ensemble_export.py to create a forced decoding network from PyTorch NMT checkpoints. This network takes an arbitrary numberized (source, target) pair and returns the model score for the translation, including penalties.
Vocabulary reduction networks are also supported, but note that target indices which are not in the possible_translation_tokens generated for the source input will be trea
* Revert schema change to fix production models
Revert schema change to fix production models
* MockLogDeviceReader - rebase on FIX
# Goal
1), Build a make_mock_log_device_reader using make_mock_reader
2), Replace the real log_device_reader here: https://fburl.com/raihwf1p
# Log by D8151734
Real log_device_reader:
```
I0529 20:29:05.373108 954994 tensor.h:839] Tensor print_net/log of type std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >. Dims: (): read_net/ParseOpenTrainingRow:0
I0529 20:29:05.373244 954994 tensor.h:839] Tensor read_net/ParseOpenTrainin
* [C2/D2][1/n]: Nonnegative-Constrained Optimization -- log barrier
implement log barrier as a regularization method
* Add teacher weight screening.
Add teacher weight screening according to teacher labels. If the teacher label is zero, we do not use the distill loss in the objective function.
* Add NormalizerContext
See task for more detail. This implementation is a copy of what exists for RegularizerContext except for how the parameters are defined in the model_definition thrift file.
I'll try an alternative implementation which overrides the default arguments of functions instead like for argscopes in tensorflow.
https://github.com/pytorch/pytorch/compare/master...MaximeBoucher:update-from-facebook-0939578c068c?expand=1
* Adding cosine similarity option in dot processor
Add pairwise cosine similarity option in dot product.
Add an option to concate dot product and cosine similarity.
Add test cases.
* [nomnigraph][redo] Concat elim for sparseNN
Same as D7962948, which was reverted because Operator Schema was not
defined
* [pytorch] Revert pytorch/pytorch#7918 'Release GIL when copying to shared memory', breaks ASAN
Revert this pytorch diff that breaks ASAN when running Filament in dev mode; in opt mode it gives "bad file descriptor" errors. Looks like a race when copying tensors to shared memory in multiple mp.Queue's (which spawn separate threads).
https://github.com/pytorch/pytorch/pull/7918/files
* [nomnigraph][mobile] Enable nomnigraph by default, use -Oz on nomnigraph related code to reduce code size
enables nomnigraph and reduces codesize
* [Warmup] Allow both offline incremental training and online training
Change plan name on saving side and reading side to support both training type
This diff depends on D8128530 and D8168651.
* Revert D7802642: [Warmup] Allow both offline incremental training and online training
This reverts commit afc213cf9b36cecf75333a788391c4d09f4afccc
@bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
* Add legacy grad logic to fix div op on old graphs.
Add legacy grad logic to fix div op on old graphs.
* Correctly propagate operator failures
Propagate errors from operators that throw exceptions and return false
* Revert D8374829: [caffe2][nomnigraph][redo] Concat elim for sparseNN
This reverts commit 6dda028c463e54bb5c32188bbbe9202107e188a5
@bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
* [Caffe2] Added extra_info to core.DeviceOption(), enforced extra_info to be inherited in scope.DeviceScope
extra_info is a newly defined field in DeviceOption proto. This diff added extra_info to the core.DeviceOption(). And, In scope.DeviceScope(), this diff enforce the new scope to inherit the extra_info from old scope.
* [opt] hgdirsync wasn't enabled, merge diverged code
Here's the damage (P59732616): basically xplat was left behind but had
the change from assert to CAFFE_ENFORCE.
* OMP parallelism over RoIs for RoIAlign op
Simpler to parallelize over RoIs. Shouldn't affect other uses as it relies on
the number of OMP threads set during startup.
PR: https://github.com/pytorch/pytorch/pull/8562
* Use int64_t for shape in FillOps
to avoid overflow of int32
* Implement Rotated RoIAlign op
Based on Rotated RPNs as explained in https://arxiv.org/abs/1703.01086.
The idea is simple - orientation/angle is added as an RPN
anchor parameter and then the angle is further regressed similar to bbox
coords. There are some additional changes related to NMS and IoU, but besides
that it's a direct extension to Faster-RCNN. Further details in https://fb.quip.com/sZHlA1iMfWPZ.
RoIs are represented in [center_x, center_y, width, height, angle] format.
`angle` repre
* Rotated RoIAlign op CUDA forward implementation
CUDA forward impl for D8415490
* RoIAlignRotated op CUDA backward pass implementation
TSIA
* All remaining fixes to eliminate process_github.sh
Most of this diff has already been reviewed separately, except for the parts relating to _thnn/utils.py and _utils._internal.py
remove skipIf(True, 'Fbcode') line from process_github.sh
replace sed of cpp file with #ifdef to control cudnnDestroy use
undo sync-time deletion of .gitattributes, remove process_github.sh
switch to using _utils._internal rather than try-import-except
This diff also fixes the open-source bug where rebuilds have
* Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"
Original commit changeset: 7707d2efe60e The original diff is backed out because the online trainer package is backed out. This code would only work with the new online trainer package.
* [easy] improve error log in adagrad op
as title
* re-allow use of thnn_h_path
This fixes cffi usage in OSS
* [4/4] [tum] parallelizing layerNorm for GPU full sync
as title
* add compile=False to pytorch tests, remove hack with pyc
* Add shape and type inference for RowWiseArgMax operator
See title
* Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"
This reverts commit 78167eeef0af16b60f72c82f9dcdda9b41b4dcbd
@bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
* [fix-flaky-test] mock_hive_reader_test flaky, because GlobalCounter collects local counts intervally
# Problem
`MockHiveReader` uses `GlobalCounter` to limit `max_examples`.
GlobalCounter on server node collect local counts from worker nodes every 1 sec.
This 1 sec delay makes it impossible to limit exactly to `max_examples`; it will definitely exceed `max_examples`.
# Plan
Given,
```
Expected num_examples = max_examples + num_examples/sec (Read Speed) x 1 sec (GlobalCounter Sync Int
* [Caffe2] Fix FCGradient cost inference. Prevent overflow in cost inference
FCGradient missed a factor of 2 in the `num_outputs == 3` case. Overflow was occurring in the FLOP calculation for FC. Changed types to `uint64_t` to prevent future problems.
* Fix binary ops with empty inputs
Fix binary ops with empty inputs
* Support the filling of input blob with provided data
as title for Biz Integrity case
* Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training""
Original commit changeset: 30c55dd38816 Original diff is reverted due to introducing bad integration test. Fixed the integration test.
* [c2][easy] improve pack ops error loggings
as desc.
* Add ShapeTypeInference for LpNorm operator
As desc
* Shard test_nn to reduce runtime for each test target
Closes https://github.com/pytorch/pytorch/pull/8793
The current test_nn would time out and be disabled in GreenWarden, and we need to have an option to split it up in order to pass the stress test. Right now GreenWarden roughly allows running 100 test cases in test_nn before timing out, and here we have an option to divide test_nn into 30 shards (with ~40 tests in each shard) to allow for some test suite growth in the future.
* Change default caffe2_streams_per_gpu to 1
* Remove IN_SANDCASTLE from common.py and test_nn.py
We prefer to disable the failing tests through Sandcastle UI instead.
* Add a new class for an updated prof_dag.proto
This diff contains:
- An updated prof_dag.proto that contains blob profiles.
- A class to deserialize this information (serialization is in a follow up diff)
- Update to separate profiling information from NeuralNet (and use it as part of the class above).
- Unit tests
* Lambdarank for SparseNN
This diff adds a lambda_rank_layer for SparseNN.
changes include
1) Adds support for multi sessions in c2 op
2) Adds support for two different loss functions in c2 op
3) Unit tests for op
* Revert D8586950: Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training""
This reverts commit 012220ed63eccc35659a57b31d16a3625da6317b
@bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
* [easy] A few fixups to multithread predictor benchmark
(1) support perf on T6 server
(2) remove dead code
* fix a bug about the map size
as title
* Fix reduce sum on in-place case.
Fix reduce sum on in-place case.
* [Warmup] Reland reverted diff Allow both offline incremental training and online training
Closes https://github.com/pytorch/pytorch/pull/8827
fix net transform integration test. Allow offline and online trainer to coexist D7802642.
* Add StoreHandlerNotAvailableException
Add an exception for a store that is not available or has been
deleted.
* Use exception handling for fault tolerance, missing KV store
Remove status blobs to communication ops so that exceptions propagate on
failure.
* [C2/D2][2/n]: Nonnegative-Constrained Optimization -- bounded grad proj
for simple bounded constrained optimization, incl non-negative box constraints.
* [GanH]: Adaptive Weighting with More Estimations
With the implemented positivity optimization, we now learn adaptive weights with different
parameterizations.
This improves parameter estimation and training stability.
* Revert some changes for landing
* Remove AutoNoGIL in StorageSharing
* Temporarily disable net_tests
* Revert "[Caffe2] Force tensor inference checks to be triggered during testing"
This reverts commit 67ef05c22b2f71b4a489695384932f968384a2a4.
* Revert "Fix reduce sum on in-place case."
This reverts commit 6cb8a8e1b3db7b6d20941b0053e3f3836068eb64.
* Revert "Revert "Fix reduce sum on in-place case.""
This reverts commit 130a257c0893dc09f4bd6e6a45d112261807fd2c.
* use conda cmake in pytorch-linux-xenial-cuda8-cudnn6-py2 and pytorch-linux-xenial-cuda9-cudnn6-py3
* update test_expect
* add exit 1
* check cmake 3.5
* bump expect driver version
* add back space
* Better forward methods in C++ API
capitalize error message in test_torch.test_flatten
Support for operator()
* Add operator() to Functional
* Get rid of SigmoidLinear
* Add BoundFunction to FunctionalImpl
* Remove macro from conv because it makes errors more nasty
This should be set by the code that instantiates it, be it the Python
bindings or other C++ code. Defaulting to use localhost is not useful
beyond tests. Instead of keeping multiple default paths around we can
punt on it here and require it to be initialized elsewhere.
There is no relevant state in PinnedMemoryAllocator, so we
can have a single allocator with static lifetime.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Rework optim folder
* Removed TORCH_OPTIMIZER_CLASS macro
* Got rid of CRTP/Impl
* Removed TORCH_AUTOGRAD_KWARG
* Differentiate between Optimizer and LossClosureOptimizer
* Make Optimizers parameters based instead of model based
* Allow construction of optimizer from arbitrary vector
* Added test for zero grad
* Added test for external parameter vectors
* Now comparing against baseline values
* Documentation
* Post rebase fixes
* Different strategy for creating and accessing buffers in optimizers
* Fix member ordering
* Unify isViewable, handle n-dimensional empty tensors.
1) Unifies the two isViewable functions in ATen and TH.
2) Handle n-dimensional empty tensors in the implementation
3) Clarify some comments.
This requires an extra copy in the TH case, but that will go away.
* Also unify THCTensor version.
* Remove C-linkage from THTensor_compute_stride.
* Update comment.
* Add pos_weight argument to nn.BCEWithLogitsLoss and F.binary_cross_entropy_with_logits (#5660)
- Add an option to control precision/recall in imbalanced datasets
- Add tests (but new_criterion_tests)
* Move pos_weight to the end of args list in the documentation.
`pos_weight` was moved to the end because it is the last argument in both
`nn.BCEWithLogitsLoss` and `binary_cross_entropy_with_logits`
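A small usage sketch of the new argument (the 3:1 class imbalance below is hypothetical):
```python
import torch
import torch.nn as nn

# With roughly 3x more negatives than positives, pos_weight = 3 upweights the
# positive term of the loss, trading precision for recall on the positive class.
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([3.0]))

logits = torch.randn(8, 1, requires_grad=True)
targets = torch.randint(0, 2, (8, 1)).float()
loss = criterion(logits, targets)
loss.backward()
```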
If CUDNN_INCLUDE_DIR, CUDNN_LIB_DIR, and/or CUDNN_ROOT_DIR were set,
but USE_CUDNN was not explicitly set, the code in
cmake/Dependencies.cmake would set USE_CUDNN=OFF even though it could
be found. This caused an issue in ATen, where it includes its CuDNN
bindings if the variable CUDNN_FOUND is set. This was the case,
because the find_package call in cmake/public/cuda.cmake searches for
CuDNN and ends up finding it. The net result is that ATen tried to
compile CuDNN bits, but the caffe2::cudnn target is never defined let
alone added as dependency, and the build fails on not being able to
find the header cudnn.h.
This change does two things:
1) Restore CuDNN autodetection by setting USE_CUDNN=ON if it is found.
2) Remove obsolete FindCuDNN.cmake module. This functionality now
lives in cmake/public/cuda.cmake.
List dependency on gloo_cuda before dependency on gloo such that
unresolved symbols in gloo_cuda are correctly resolved (since the linker
resolves from left to right).
This fixes building c10d C++ tests on GCC 4.8.
currently torch/CMakeLists doesn't know how to find nanopb without
some higher-level script (setup.py or build_all.sh) telling it where
to look, which is an obstacle towards fully CMake-ifying libtorch.so.
This change removes that dependency.
* Bag of fixes
* Rename tensor_range.h to tensor_list_view.h
* Post rebase fixes
* Rename torch::tensor namespace to torch::tensors due to name conflict
* Avoid recursion in Module::to
This commit implements the solution proposed in https://github.com/pytorch/pytorch/issues/8410
to work around the need to create zero tensors with the same shape as inputs.
It introduces the concept of a LinearBlock which marks places in the code
where we know if all the inputs to the node are zero, then the outputs
to the node are also zero. Autodiff introduces LinearBlocks around
backwards functions, which have this property. specializeUndef then
propagates Undef nodes using this information.
Notes:
* Since we do not always specialize, we have a pass LowerLinearBlocks
that replaces the block with an if statement that dynamically guards
the Undef case.
* We introduce AutogradAdd which is addition that still works when
its inputs might be undefined. In cases where we specialize this will
get removed in favor of a normal add, but there are cases where
gradient graphs do not specialize (e.g. when they are not differentiable,
but a derivative is required) so it is important for this op to be executable.
* make as_strided safer
* patching as_strided; and stop using it in backward
* Test a simple case in as_strided_backward
* a long note
* remove boundary checks of as_strided; implement slow path
* wip
* fix as_strided backward when input is overlapping
check for input overlapping too
[doc] clarify gradcheck behabior when input is overlapping
longer note
* fix a deprecation warning in test_autograd
* nits
* Created DefaultTensorOptions
* Fix TensorOptions() call which was interpreted as function decl
* Fix empty OptionsGuard
* Make options_ and mutex_ in DefaultTensorOptions class static because of dynamic linker issues
* Make DefaultOptions thread local
* Spectral norm improvements
- Don't do power iterations on the weight in eval mode.
To facilitate this, register the weight as a buffer so that a module with
spectral norm can be used in eval mode immediately after loading a
state dict (#8208)
- Use weight instead of weight_orig as the weight when removing
spectral norm
- Add a dim parameter in case the normalization should occur w.r.t.
a dimension other than 0 (#7865); see the sketch after this list
* add and update spectral norm tests
* More spectral norm tests
Thank you, Simon, for the suggestions.
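A short sketch of the new `dim` argument (the layer shapes are illustrative):
```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# ConvTranspose2d stores its weight as (in_channels, out_channels, kH, kW),
# so the dimension corresponding to the outputs is 1, not the default 0.
layer = spectral_norm(nn.ConvTranspose2d(16, 32, kernel_size=3), dim=1)
```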
* Port THCS to ATen.
General structure of the sparse implementation:
- SparseCUDATensor.{cpp, cu} and SparseCUDATensorMath.cu contain
the same functions as their CPU analogues
- SparseCUDAApplyUtils.cuh contains what used to be in
THCSTensor.cu
- SparseCUDABlas.cu contains what used to be THCSparse.cu
Unrelated improvements:
- Forward declared CUDA types in Context.h are now moved
exclusively to CUDAHooks
- New getCurrentCUDASparseHandle in Context
- Support for printing CUSPARSE_STATUS_ZERO_PIVOT error message
directly
Some unusual pieces:
- get_device got the LegacyBridge makeover, as it needs special
logic on sparse tensors (defer to the inner tensors).
- I noticed that I need to turn off device_guard codegen
for many functions in sparse, noticed because get_device
became a native function, and resulted in an infinite recursion. This was
done by adding device_guard: False to the native definitions. An alternative
strategy might be to make the heuristic for deciding when to put in a device
guard more clever.
Scaffolding removal:
- LegacyBridge now special-cases only on sparse versus dense;
no more CUDA test (hooray!)
- Native bindings get CUDA/SparseCUDA dispatch entries.
CPU sparse refactoring:
- New SparseUtils.h header, with all of the utility functions that
used to live in SparseTensor.cpp
- new_with_tensor_sparse now correctly handles both CPU and CUDA
- transpose functions in sparse/ turned out to be dead, so I killed them
Bugs I noticed while working on this:
- I used accessor<...>() on a CUDA tensor, because I thought it does
the CUDA-CPU sync. It does not.
Last mile changes:
- I killed all of the THS/THCS directories, build scripts, bindings everything.
It is now no more!
- A bunch of trampolines in LegacyBridge are no more; anything
that was "sparse only" is now done natively.
- `sparse_coo_tensor` is implemented a little funny, but we think
it's a good idea.
- HIP is handled by explicitly ifdef'ing out all kernels; we'll add support
for this at some later point in time.
- TH_INDEX_BASE is now unconditionally set to 0.
- Some uses of x.type() now replaced with x.options(), the new way of doing it.
- More notes about checked_cast_tensor, and eliminate Storage/Tensor fields in
the code gen env when they are dead.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* cache cufft plans
* use an LRU cache
* suffix CuFFTParams members with _
* import print_function for py2
* lint
* fix potential race; add dummy impl for CPU only builds
* cpp formatting; remove nccl makefile change
* Use CUDA hooks instead
* comments and doc
* update the error message
* move LRU cache to a separate file and native::detail namespace
* update comment
* specify NOTE location in CuFFTPlanCache.h
* update disabled_features.yaml to make amd ci work
* another fix for AMD CI in disabled_features.yaml
* Wrap cufft_plan_cache_* methods in __HIP_PLATFORM_HCC__
* improve the notes
* lint
* revert onnx change
* put back inlining for CUFFT_CHECK
* Resolve conflicting name, ContextManager
Concept name `Context Manager` is taken by Python. See https://docs.python.org/3.6/reference/datamodel.html#with-statement-context-managers
It says,
A context manager is an object that defines the runtime context to be established when executing a with statement. The context manager handles the entry into, and the exit from, the desired runtime context for the execution of the block of code.
The `ContextManager` here is more like a registry.
And there is a C++ registry in caffe2 codebase `caffe2/caffe2/core/registry.h`.
There is also a Caffe2DBRegistry, declared by calling `CAFFE_DECLARE_REGISTRY(Caffe2DBRegistry, DB, const string&, Mode);` in `caffe2/caffe2/core/db.h`.
I think we can follow the concept name `Registry`, calling it `ContextRegistry`.
* Make Classes and Functions internal to this module start with "_"
Make Classes and Functions internal to this module start with "_"
* Update context.py
* Update context.py
* adds fp16 support to the jit
* improves formatting
* improves formatting
* added an explanatory comment
* fixes Python2 flake8
* updates c code
* all except halfs
The goal is to be able to use at::Half throughout ATen, including in
CUDA kernels and have it operate like built-in types. This avoids the
need for cuda::from_type and cuda::to_type before every
AT_DISPATCH_ALL_TYPES_AND_HALF call.
Addresses #8177
A design doc can be found here: [gist](https://gist.github.com/zou3519/4b7f13f03cc9f3612bd9363e6405fa0a) version or [quip](https://fb.quip.com/azL1AqUckBdo) version
General approach:
- Add NumberType, FloatType, IntType to represent Python numbers, floats and ints.
- Emit these types for python literals
- Change aten_schema such that Scalars are NumberType, int64_t and bool are IntType.
- Emit aten::type_as, prim::NumToTensor, and prim::TensorToNum nodes for tensor-number math. (see examples below)
- Erase NumberType, prim::NumToTensor, and prim::TensorToNum for ONNX export
### Tensor/number math
```
import torch
@torch.jit.script
def fn(x):
return x + 1
```
```
graph(%x : Dynamic) {
%1 : int = prim::Constant[value={1}]()
%2 : Dynamic = prim::NumToTensor(%1)
%3 : Dynamic = aten::type_as(%2, %x)
%4 : Dynamic = aten::add[alpha={1}](%x, %3)
return (%4);
}
```
### Number/Number Math
```
import torch
@torch.jit.script
def fn(zero):
c = 1 + 1
return zero + c
```
```
graph(%zero : Dynamic) {
%1 : int = prim::Constant[value={1}]()
%2 : int = prim::Constant[value={1}]()
%3 : Dynamic = prim::num_to_tensor(%1)
%4 : Dynamic = prim::num_to_tensor(%2)
%5 : Dynamic = aten::add[alpha={1}](%3, %4)
%c : int = prim::TensorToNum(%6) # this is the result of the addition
...
return (%13);
}
```
List of squashed commits:
* Introduce Python Number types
Added: IntType, FloatType, NumberType with
IntType <: NumberType
FloatType <: NumberType
Changed aten_schema so arguments have corresponding types
* Emit a NumberType for python literals.
Also emit a NumberType for Scalar default values.
* Add prim::NumToTensor and prim::TensorToNum
* Add DynamicType -> NumberType implicit cast for bc
* Better ensureTensor error message
* Add ensureTensorOrNumber. Allow passing Number to some functions
Like the range() construct and slices
* Patch IntList to work.
IntList is still a DynamicType in the frontend: a tensor gets built from
a List[int].
Also, IntList[1] is a "union between int and IntList" the way it is
implemented. If the frontend sees an int being passed for an IntList[1]
arg, it converts it to a tensor as well.
* Enforce some order on schemas to avoid overload ambiguity
add(Tensor, Tensor) should appear earlier than add(Tensor, Scalar). This
matches the order in which python_arg_parser parses its arguments.
* Disable std_dim and var_dim tests.
With the new schema information, std(input, keepdim) and std(input, dim)
are ambiguous. This will need to be fixed at a later date.
* Add NumberType erasure pass.
This is used for ONNX export and to ensure that NumberType information
doesn't reach the interpreter
* Add support for mixed tensor/number math ops.
* Tests for new functionality.
Includes:
- Tensor/number math
- number/number math
- EraseNumberTypes pass test
* Patch tests
Update expect tests for:
- decompose_addmm
- loop unrolling tests
Because python numbers are now NumberType, they cannot be returned by
functions anymore. Work around this by using "torch.full", or by adding
a tensor([0]) (taken from FIXME_zerol()). Both approaches are used
because torch.full is more readable, but it is broken in some cases.
* Add erase_number_types to torch/CMakeLists.txt
* Move math back to emitSimpleExpr from emitSugaredExpr
* Remove some dead lines
* Renable some excluded script/trace tests that are fixed.
* Move some tests to expected failure
* Address some comments (more addressing to come)
* Erase relevant aten::type_as nodes in EraseNumberTypes
I also changed it so that EraseNumberTypes is only called for ONNX
export. It is no longer used to prevent
prim::NumToTensor/prim::TensorToNum from reaching shape_analysis or
interpreter.cpp.
shape_analysis infers the type of the output of these nodes to be the
same as their input.
intepreter.cpp treats both of these nodes as no-ops.
* Add reminder to fix std/var
* Call EraseNumberTypes only when exporting a script module
* Update expects after rebase
Buck doesn't support passing arguments to Python unit tests, and we have to use environment variables to pass the sharding options instead. Also, buck test doesn't go through the __name__ == '__main__' code path and we need to move the env var checking logic to top-level.
* Use env var to pass sharding options to test_nn.py
* Move env var checking to top-level
* fix lint
* Support n-dimensional empty tensors in (most of) THNN.
Most of the argument checking in THNN is directly around dimensionality, which doesn't work in general for n-dimensional empty tensors, because
you will end up dividing by 0 or similar. Instead, we change these to check for empty and give error messages for those cases as well.
In some cases, the error messages are improved as well.
* Fix bug.
* enable captured inputs for if Stmt to fix the carried deps bug in nested
blocks
* postpone captured inputs deletion and add new test case
* recursively generate captured values for nested loops
* check asSimple when recursively create captured input
* Some 0-sized dimension support, port catArray away from resizeLegacy.
The goal of this PR is to port catArray away from resizeLegacy (so we can delete the legacy resize calls), but since catArray has some weird behavior because
we don't have arbitrary 0-sized dimension support, I made some effort to fix these both in one pass.
The major changes here are:
1) catArray uses the new resize API, no longer the old resizeLegacy API.
2) As 1) is the last usage of resizeLegacy, it is deleted.
3) If compiled with USE_TH_SIZE_ZERO_DIM, catArray will work and properly check shapes for n-dimensional empty tensors.
4) However, we retain the old behavior of "ignoring" size [0] tensors in catArray. We previously allowed this because we didn't have n-dimensional empty tensors.
5) To get the above to work, we also add support for n-dimensional empty tensors for narrow and slice (ifdef USE_TH_SIZE_ZERO_DIM).
6) We change the stride formula for empty tensors to match NumPy; basically, we never multiply by 0 as the size, always at least 1, so the
strides are monotonically increasing in the empty tensor case.
7) We print the size of empty tensors if size != [0]; this matches NumPy behavior (even in cases where the size could be inferred from the brackets).
8) For test purposes, we add torch._C._use_zero_size_dim() to add tests for the above.
* Fix flake8.
* Address review comments.
* Solves #8659
This PR adds a warning to alert users about the possibility of a failure in the gradcheck
* Fix lint
* Update gradcheck.py
* Update gradcheck.py
* update error message
* Update warning message to be more descriptive
This surfaces the options struct that can be passed to the
ProcessGroupGloo constructor to Python. By default, if no options struct
is passed at construction time, the Python bindings default to using a
struct with a TCP backed Gloo device that uses the machine's hostname to
resolve the IP address to bind to.
Currently, THTensor_(nDimension) goes to _dim(), which makes it difficult to move individual usages over to the new API.
Instead, let's create a THTensor_(_nDimension) going to _dim() and have THTensor_(nDimension) go to dim(). To do this, we will redirect all current
calls and move them over as we did for _dim() and dim().
* Setup wrappers to get vectorized version of mean
* Responding to review 1
* Responding to review 2
* Use variadic AT_CHECK
* Fix AT_CHECKS in ReduceOps
* Fix broadcast copying device[0] tensor when not using NCCL; Avoids potential extra copy in flatten_dense_tensors
* use toType
* revert dense_flat changes
* address comments
* [c10d] NCCL python binding and CI test, with bug fixes
* Addressed comments and further bug fix
* Made NCCL build optional, made C10D libc10d.a only
* Fixed tests so that NCCL pg won't run when not needed
* Addressed comments
* Port all indirect calls of resizeNdLegacy to resizeNd.
* Handle 1-d to 1-d resize.
* Maintain behavior of tensor.set_().
* Fix lack of initializer_list in C :).
* Return full dimensionality from newSizeOf.
* Created TORCH_MODULE macro
Rewrote Linear
Rewrote Dropout and added default constructor to TORCH_MODULE macro
Turned TORCH_MODULE contents into a proper base class
Added some documentation
Got rid of the old Dropout module
Got rid of the old Embedding module
Got rid of the old BatchNorm module
Got rid of the old Conv module
Fixing optimizers
Rebase
Removed old RNN modules and the TORCH_ATTR macro
Removed temporary P:: namespace
Added cloning behavior to all modules
Got rid of some get() calls
self review nits
Remove noexcept from ModuleHolder methods that can throw
Remove spaces
Add missing override to reset() methods
Added examples to documentation in pimpl.h
* Post rebase fixes
catArray is more complicated because it requires real 0-size dimension support. The other changes are safe in that the functions are never called (and are now deleted), or
they are used on a result of THTensor_(newSizeOf), which has a valid size.
test_rnn_args_check generates mismatched input_shape and hidden_shape
args. To do this, it changes a dimension of input_shape or hidden_shape
to have an incorrect size.
Before, the test was changing the size of a dimension to -1. However,
this is flawed because an input of size e.g. (6, -1, 2) is invalid to begin with.
This PR fixes it so that the test changes sizes of dimensions to
`bad_size = 7`. As long as none of the other sizes (input_size,
hidden_size, num_layers, batch_size) divides this, we don't have to worry
about that dimension accidentally broadcasting into a working shape.
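A sketch of the idea behind the test (the concrete sizes below are made up; only `bad_size = 7` comes from the PR):
```python
import torch
import torch.nn as nn

input_size, hidden_size, num_layers, batch_size, seq_len = 3, 5, 2, 4, 6
bad_size = 7  # divides none of the sizes above, so it cannot be broadcast away

rnn = nn.RNN(input_size, hidden_size, num_layers)
inp = torch.randn(seq_len, batch_size, input_size)
# Deliberately wrong hidden state: the batch dimension should be batch_size.
bad_hx = torch.randn(num_layers, bad_size, hidden_size)

try:
    rnn(inp, bad_hx)
except RuntimeError as err:
    print('argument check caught the mismatch:', err)
```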
Unlike resizeLegacy / resizeNdLegacy, these don't call deprecated methods (e.g. _dim) and don't map between logical sizes (i.e. nDimension == 0 -> size [0]).
What you ask for is what you get.
The full 0-sized dimension support is hidden behind an ifdef, because it's not fully supported yet.
* Created TensorOptions
Storing the type in TensorOptions to solve the Variable problem
Created convenience creation functions for TensorOptions and added tests
Converted zeros to TensorOptions
Converted rand to TensorOptions
Fix codegen for TensorOptions and multiple arguments
Put TensorOptions convenience functions into torch namespace too
All factory functions except *_like support TensorOptions
Integrated with recent JIT changes
Support *_like functions
Fix in place modification
Some cleanups and fixes
Support sparse_coo_tensor
Fix bug in Type.cpp
Fix .empty calls in C++ API
Fix bug in Type.cpp
Trying to fix device placement
Make AutoGPU CPU compatible
Remove some auto_gpu.h uses
Fixing some headers
Fix some remaining CUDA/AutoGPU issues
Fix some AutoGPU uses
Fixes to dispatch_tensor_conversion
Reset version of new variables to zero
Implemented parsing device strings
Random fixes to tests
Self review cleanups
flake8
Undo changes to variable.{h,cpp} because they fail on gcc7.2
Add [cuda] tag to tensor_options_cuda.cpp
Move AutoGPU::set_index_from into .cpp file because Windows is stupid and sucks
Fix linker error in AutoGPU.cpp
Fix bad merge conflict in native_functions.yaml
Fixed caffe2/contrib/aten
Fix new window functions added to TensorFactories.cpp
* Removed torch::TensorOptions
Added code to generate wrapper functions for factory methods
Add implicit constructor from Backend to TensorOptions
Remove Var() from C++ API and use torch:: functions
Use torch:: functions more subtly in C++ API
Make AutoGPU::set_device more exception safe
Check status directly in DynamicCUDAHooksInterface
Rename AutoGPU to DeviceGuard
Removed set_requires_grad from python_variables.h and warn appropriately in Variable::set_requires_grad
remove python_default_init: self.type()
Add back original factory functions, but with deprecation warnings
Disable DeviceGuard for a couple functions in ATen
Remove print statement
Fix DeviceGuard construction from undefined tensor
Fixing CUDA device compiler issues
Moved as many methods as possible into header files
Dont generate python functions for deprecated factories
Remove merge conflict artefact
Fix tensor_options_cuda.cpp
Fix set_requires_grad not being checked
Fix tensor_new.h
TEMPORARILY put some methods in .cpp files to see if it solves issues on windows and mac
Fix bug in DeviceGuard.h
Missing includes
TEMPORARILY moving a few more methods into .cpp to see if it fixes windows
Fixing linker errors
* Fix up SummaryOps to use new factories
Undo device agnostic behavior of DeviceGuard
Use -1 instead of optional for default device index
Also move DeviceGuard methods into header
Fixes around device index after optional -> int32_t switch
Fix use of DeviceGuard in new_with_tensor_copy
Fix tensor_options.cpp
* Fix Type::copy(
* Remove test_non_float_params from ONNX tests
* Set requires_grad=False in ONNX tests that use ints
* Put layout/dtype/device on Tensor
* Post merge fixes
* Change behavior of DeviceGuard to match AutoGPU
* Fix C++ API integration tests
* Fix flip functions
* Spelling fix in MultivariateNormal docstring (#7915)
* [c10d] MPI Process Group Implementation (#7783)
This provides a bare-minimum MPI Process Group implementation, the commit is on top of @pietern's Gloo Process Group PR.
* [c10d] MPI Process Group Implementation
ref: https://github.com/pytorch/pytorch/issues/7434
* Better exception, atexit func, and addressed comments
* Clang formatting changes
* Static initialization and addressed comments
* Added constness back
* Test will now launch mpi processes if found
* CMakeList Changed
* Fix Windows doc for import error (#7704)
* Fix Windows doc for import error
* Fix doc again
* Fix wrong format
* Moved condition for dilated grouped convolutions to CUDNN convolution implementation (#7465)
* Updates to caffe2 operator documentation (#7917)
* Significant updates to the operator docs in prep for merge
* [auto] Update onnx to 307995b - Update from upstream (onnx/onnx#1038)
307995b143
* Test if ASAN is actually working as part of ASAN tests. (#6050)
* Test if ASAN is actually working as part of ASAN tests.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Drop explicit use of libstdc++, we should not care.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Build with DEBUG=1
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Increase main thread stack size when using ASAN.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Split up detail.h (#7836)
* Fix THCUNN SpatialDepthwiseConvolution assuming contiguity (#7952)
* Fix fbcode compatibility (#7939)
* add test for correctness of transpose fusion (#7950)
* [JIT][script] Fix emitted gather and slice for dynamic indices (#7861)
* [JIT][script] Fix emitted gather for dynamic indices
* Also fix slice
* Address comments
* cache and use BLAS_SET_BY_USER so that it doesn't set itself to TRUE when run second time (#7942)
* Add unsafe flag to skip checking in prepare (#7832)
* Add unsafe flag to skip checking in prepare
* pop
* Rename cuda::type to cuda::into_type and provide cuda::from_type. (#7937)
These are used to convert Half -> half and half -> Half respectively.
from_type will be used for runtime type checking in THC.
* Try to fix TORCH_CUDA_ARCH_LIST for PyTorch again (#7936)
* try again
* use DEFINED
* use a loop
* Minor fixes
* remove sort requirement from pad-sequence (#7928)
* pad-sequence no longer requires sorting entries
pad-sequence can get the max_len from the list of sequences. Entries only need to be sorted if the output will be used for pack_padded_sequence, which can throw the error itself.
* remove sort requirement from pad-sequence
Picks up from #5974.
Removes the requirement that input sequences to pad_sequence have to be
sorted. Addressed the comments in the PR:
- Updated docstring for pad_sequence
- Remove sort requirement in pad_sequence test
- Test unsorted and sorted sequences in pad_sequence test
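A minimal sketch of the relaxed behavior (the sequence lengths below are arbitrary):
```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Sequences no longer need to be sorted by length; max_len is taken from
# the longest entry in the list.
seqs = [torch.ones(3, 4), torch.ones(7, 4), torch.ones(5, 4)]
padded = pad_sequence(seqs)                        # shape: (7, 3, 4)
padded_bf = pad_sequence(seqs, batch_first=True)   # shape: (3, 7, 4)
```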
* Fix checkBackend error message (#7926)
* Fix checkBackend error message
Fixes #7849
* Switch order of printing args
* Split CI tests in half and run them in parallel (#7867)
* Split and run tests in parallel
* Refactor tests
* Handling of scalars in torch.Size (#5676)
* Handling of scalars in torch.Size
torch.Size() constructor uses python_arg_parser
IntList in python_arg_parser can take iter/range
Have IntList take python iterables and ranges.
Address comments: don't use python_arg_parser and instead call __index__ in THPSize_pynew
Address comments
Address comments
* Rebased
* Address nit
* [JIT] Fission and fusion passes for addmm (#7938)
* Addmm decomposition pass
* Addmm peephole pass
* Fix handling of output shape in fusion pass
* Add DCE to the peephole passes
* add comments
* maybe bugfix?
* Fix GPU tests
* fix py2/3 test issue
* Set smaller grain size for some cases (#7941)
* Fix returning scalar input in Python autograd function (#7934)
* fix _wrap_outputs not working with scalar inputs
* add a test
* Prevent git autocrlf for bash scripts (#7949)
* Delete unused file (#7919)
* Fix typo in autodiff formula for addmm (#7932)
* 1) Use meshgrid for the flip() CPU implementation, which only needs one copy of the input tensor; 2) changed the kernel of the CUDA implementation, so no materialized indices tensor is needed; 3) reuse error checking code
* [caffe2] YellowFin parameter update GPU code fix. (#6993)
* [Caffe2] Keep name of caffe2_pybind11_state and caffe2_pybind11_state_gpu in debug build (#7155)
* Allowing MatMul to create a gradient even with 3 inputs. useful if you are differentiating a graph twice (#6536)
* added const for local variables
* Fix the cpp libtorch CUDA build (#7975)
* Use mingfeima's mkldnn (#7977)
* Fix the import part of the windows doc (#7979)
* Change perf test folder after git checkout (#7980)
* Move the broadcast check in MKL Add/Sum to runtime (#7978)
* Use Glog's implementation of STL logging when possible. (#7206)
Inject custom workaround into namespace std so that it can be found by ADL.
* [Hotfix] Bring back warnings and -Werror to ATen (#7866)
* Bring back warnings and -Werror to ATen
* Unbreak...
* Fix tbb errors
* Enable ONNX backend Mean tests (#7985)
* Add third way to determine IS_CONDA (#7971)
* Fix EmbeddingBag max_norm option (#7959)
* fix EmbeddingBag max_norm option
* flake8
* add warning to the embedding bag arg change
* Raise error when torch.load a storage on a non-existing device (#7921)
* Raise error when torch.load a storage on a non-existing device
Before, doing torch.load(...) on a CUDA tensor on a CPU-only machine
would raise an unreadable error:
```
~/pytorch/pytorch/torch/cuda/__init__.py in __enter__(self)
223 if self.idx is -1:
224 return
--> 225 self.prev_idx = torch._C._cuda_getDevice()
226 if self.prev_idx != self.idx:
227 torch._C._cuda_setDevice(self.idx)
AttributeError: module 'torch._C' has no attribute '_cuda_getDevice'
```
This PR makes it so that torch.load raises a hard error if one tries to
load a storage onto a non-existing device and suggests the user to use
torch.load's map_location feature (see the sketch below).
* Address comments
* missing dep
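A small sketch of the suggested `map_location` workaround (the checkpoint filename is a placeholder):
```python
import torch

# On a CPU-only machine, remap storages that were saved on a GPU instead of
# hitting the hard error described above.
state = torch.load('checkpoint.pt', map_location='cpu')

# Equivalent spelling with a callable:
state = torch.load('checkpoint.pt', map_location=lambda storage, loc: storage)
```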
* Make THStorage / THCStorage have void* data ptr. (#7964)
* Make THStorage / THCStorage have void* data ptr.
This is the initial step in unifying the ATen and TH tensor representations; the next step is to generate only a single THStorage / THCStorage type.
The major changes here are:
1) data has been renamed to data_ptr and made void* in THStorage/THCStorage.
2) THStorage / THCStorage stores a at::ScalarType representing its data type (This will be useful when we generate a single THStorage/THCStorage).
3) APIs for Accessing the data as a real*:
a) storage->data<real>() -- this does runtime-type checking (checks that the at::ScalarType is correct).
b) storage->unsafeData<real>() -- as above, but no runtime-type checking (used in inner loops / fast code paths).
c) THStorage_(data)(storage) -- this already existed, just calls storage->data<real>().
* Add include.
* Attempt to fix clang build issues.
* Clarify comment and remove extra character.
* Rename unsafeData -> unsafe_data.
* Remove unnecessary 'to' function to get compile time rather than link time errors.
* Import/export observer symbols for DLL, which fixes the linking error in Visual Studio. (#6834)
* Import/export observer symbols for DLL, which fixes the linking error in Visual Studio.
* Add support of all default cmake build types for release to cuda.
* Remove python bindings for `torch.slice` (#7924)
* skip python bindings for slice
* remove tests
* convert slice test to indexing
* Build ONNX for PyTorch version of libcaffe2 (#7967)
* support loading gzip (#6490)
* support loading gzip
* address comments
* address comments
* fix lint
* fix test for python2
* Add memory leak check in CUDA tests (#7270)
* Add memory leak check in CUDA tests
* Tracking multi-GPU too
* fix run_test.py not running __name__ == '__main__' content; add test for make_cuda_memory_checked_test
* add a comment
* skip if cuda
* 1. Change the wrapper to a method in common.py:TestCase
2. Refactor common constants/method that initialize CUDA context into common_cuda.py
3. Update some test files to use TEST_CUDA and TEST_MULTIGPU
* Fix MaxUnpool3d forward memory leak
* Fix MultiLabelMarginCriterion forward memory leak
* Fix MultiMarginLoss backward memory leak
* default doCUDAMemoryCheck to False
* make the wrapper skip-able
* use TEST_MULTIGPU
* add align_corners=True/False tests for Upsample; fix TEST_CUDNN
* finalize interface
* VolumetricMaxUnpooling_updateOutput
* fix test_nccl
* rename THC caching allocator methods to be clearer
* make the wrapped function a method
* address comments; revert changes to aten/src/THC/THCCachingAllocator.cpp
* fix renamed var
* Revert "Set smaller grain size for some cases" (#7988)
* Entry for c10d in CODEOWNERS (#8001)
* Fix a couple of typos (#7998)
* Fix typo
* Fix typo
* Fix typo
* Fix typo
* Add on-stack observer cache for Observable (#7931)
observers_list_ stores all the observers for an observable. The list is allocated on the heap, which
can cause LLC misses. Add an on-stack observer cache for fast access. In production, we have seen a 20%
speedup for start and stop observer calls.
* Reduce grain size for Unary operations (#8003)
* [auto] Update onnx to 8ec0e5f - Add index check for Transpose's type inference function (onnx/onnx#1053)
8ec0e5fe9b
* Make AT_FORALL_SCALAR_TYPES usable outside of at::namespace. (#7935)
* Make AT_FORALL_SCALAR_TYPES usable outside of at::namespace.
This requires renaming the _cast functions which used the unqualified names.
* Separate onnx mapping of scalar type from cast name.
* Fix flake8.
* Properly cast onnx.
* Remove WITH_ROCM cmake flag/variable (use USE_ROCM solely) (#8013)
* Mention the pytorch-ci-hud on the README. (#8004)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Re-enable build env check (#7969)
* Re-enable build env check
* Fix linux test error
* Try to fix macOS test error
* Update nn.rst (#8029)
* Example for Transformed Distribution (#8011)
* [auto] Update onnx to 33e9cd4 - Remove the usage of default value to fix invalid proto3 files. (onnx/onnx#1052)
33e9cd4182
* [auto] Update onnx to 1504a33 - Convert schema assert for duplicate type names to exception (onnx/onnx#1057)
1504a33abb
* Support CUDA tensors in ProcessGroupGloo (#7694)
This adds an unconditional dependency on CUDA, which is not desirable
for the long term. Ideally we would have a split like ATen, where we have
different artifacts for different backends so you can decide at runtime
what to use.
* [auto] Update onnx to 3fb9656 - Fix for fbcode CI (onnx/onnx#1062)
3fb965666e
* propagate nan in some activations (#8033)
* propagate nan in some activations
* fix py2 not having math.nan
* flake8
* Fix profiler crash when no events register (#8034)
* Fix profiler crash when no events register
When trying to profile, attempting to print the event table throws a vague error because the event list is empty:
....
max_name_length = max(len(evt.key) for evt in events)
ValueError: max() arg is an empty sequence
This change fixes the error by returning an empty string.
* Update profiler.py
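A minimal sketch of the failure mode this guards against (hypothetical usage; it assumes the autograd profiler API of this era, where printing the profile builds the event table):
```
import torch
from torch.autograd import profiler

# Profile a region that records no events; printing the table used to raise
# "ValueError: max() arg is an empty sequence" and should now print an empty string.
with profiler.profile() as prof:
    pass
print(prof)
```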
* Allow CI testing with different AVX configs (#8020)
* Allow CI testing with different AVX configs
* Unset ATEN_DISABLE_AVX and ATEN_DISABLE_AVX2 in default config
* Support for generating ATen during the fbcode build, rather than committing the generated files (#8002)
Paint the internal bikeshed a slightly different color to appease Buck tooling.
* Factor python dependency out of interpreter (#7970)
* Factor python dependency out of interpreter
* Remove NO_PYTHON for the autograd engine
If there is no python bindings, then a default Engine is constructed
the first time it is requested.
If the python libraries are loaded, then they override the default
accessor and the default engine becomes a python Engine.
Note: it is possible for two engines to be generated if a non-python
one gets created before the python bindings are loaded. This case
is rare, and just results in additional threads being spawned.
* Fixing AlexNet test which is skipped in CI
* [auto] Update onnx to 760c928 - add missing hasNInputShapes check for bidirectionalBroadcastShapeInference (onnx/onnx#1060)
760c9283d0
* Support modules that output scalar in Gather (and data parallel) (#7973)
* Support modules that output scalar in Gather (and data parallel)
* Improve warning msg
* [auto] Update onnx to 9e7855d - Remove PyTorch generated Upsample tests cases (onnx/onnx#1064)
9e7855dcd4
* [script] Add support for torch.zeros, torch.ones, etc. (#7799)
* [script] Add support for torch.zeros, torch.ones, etc.
* modifies gen_jit_dispatch to create bindings for functions that do
not take tensor arguments, but do have an initial type argument
* adds tensor attributes to these functions for device, layout, and
dtype specification
* extends the list of valid compiler constants to include device, layout,
and dtype.
* allows functions with Generators, but only using the default generator
Known limitations:
* when using `torch.float`, we convert it to a scalar tensor and make
no checks that it is actually used only in a dtype specification.
This is similar to how we handle Python numbers, creating some situations
where the script is more permissive. Fixing this requires much more
significant changes to the IR, so is lower priority for now.
* devices specified using string literals e.g. 'cuda:1' do not work,
since we do not support string literals in general.
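A small usage sketch of what this enables in script (hypothetical example; the exact syntax accepted by the script compiler at this point may differ slightly):
```
import torch

@torch.jit.script
def make_buffer(x):
    # factory functions now accept dtype/device/layout attributes inside script
    y = torch.zeros([3, 4], dtype=torch.float)
    return y + x.sum()
```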
* Add profiling annotations to NeuralNet[Operator|Data] (#8005)
* Update from facebook 1ee4edd286a3 (#8040)
* Adding instance weight to batch distill loss
as title
* add bfloat 16-31
added bfloat 16-31 and their respective unit tests
* [CUDA9] Upgrade - fbcode
CUDA9 upgrade diff D5654023 has been out for a while thanks to Pieter. But as time goes on it's becoming quite hard to rebase, because of the symlinks and auto-generated build/config files in tp2. Break D5654023 into two diffs, one touching tp2 config files, and another one touching the fbcode TARGETS file (adding an nvcc flag). These two should be a bit easier to rebase (for the detailed procedure see "Test Plan").
This diff can only be committed if:
1. CUDA 9 rpm is rolled out fleet-wide (TBD)
2. NVidia driver 390.40 is rolled out fleet-wide (done)
3. Upgrade CUDA 9.1, cudnn 7.1, nccl 2.1 (done)
4. Make sure all dependents are built (done)
5. Test all C2 operators, PyTorch (see test plan)
* Share intermediate int32 buffer across Conv ops
Adding a known type
* [C2 fix] infer function for ensure_cpu_output_op
This adds the missing device inference function for ensure_cpu_output_op.
* [int8] Add blob serializer/deserializer for Int8TensorCPU
To export to logfiledb
* [nomnigraph] Add try catch block to optimization passes in predictor
This will catch failures that happen in the optimization pass.
* Caffe2: avoid static initialization order fiasco for CAFFE_ENFORCE
CAFFE_ENFORCE uses a stack trace fetcher, which is currently a
global static variable. If CAFFE_ENFORCE is used at static initialization
time, this is a SIOF. Recently CAFFE_ENFORCE was added into init
function registration, so we started to see this.
Meyers singleton is going to provide safety here. If stacktrace
fetcher was not registered yet, it will just use a dummy one.
* NUMA support in SparseNN CPU benchmark
Adding support for NUMA in SparseNN CPU benchmark
* [mobile-roofline] Add logging needed for roofline model
This should be all that's needed
* Let the operators use the same input if the operators are not chained;
otherwise, we have to change the input data dims
* fix null-pointer-use UBSAN errors in reshape_op.h
* revert previous fix on input blob name
as title
* Adding flag to let MineHardNegative automatically extract single value from dict
Model exporter requires the output of the model to be a struct. This makes it convenient to use those models directly in MineHardNegative by allowing automatic extraction of the single element of the dict, which is a common use case.
* Reverting change that broke internal tests back to OSS compatible state
* Skip CUDA memory leak test on BN tests on windows (#8043)
* workaround for Sequential when one cannot retrieve python source (#8048)
* [auto] Update onnx to 0dbec2a - - Generate protoc type hints on Windows (onnx/onnx#1047)
0dbec2a047
* [auto] Update onnx to 4f8ef17 - Remove erroneous documentation around maps and sequences. (onnx/onnx#1069)
4f8ef17ad3
* [auto] Update onnx to e6a500e - Extract constant to initializer (onnx/onnx#1050)
e6a500e54c
* [auto] Update onnx to 033f956 - make gcc happy (onnx/onnx#1061)
033f956f41
* Remove NO_PYTHON macros from Exceptions.h/cpp (#8007)
Removes cases where NO_PYTHON was unnecessary in Exception.h/cpp
* [ready] Clean up torch.distributions (#8046)
* Have a single THStorage and THCStorage type. (#8030)
No longer generate data-type specific Storage types, since all Storage types are now identical anyway.
For (some) backwards compatibility and documentation purposes, the Real names, e.g. THLongStorage are now #defined as aliases to the single THStorage type
* Reduce usages of TensorUtils<T>::DataType in THC. (#8056)
TensorUtils<T> is basically ATen-dispatch-lite in that it allows one to do multi-type THC function dispatch with a single call.
However, it is templatized on the Tensor type, and since we are moving to a single Tensor type, this doesn't work.
Most of the functions in TensorUtils (e.g. getDims) can be pulled up a level, to just call THCTensor_nDimension (or directly accessing the member),
but the DataType specific functions are more problematic.
So, this PR does two things:
1) Replaces calls of 'TensorUtils<THCTensor>::DataType' with 'real' since these are identical
2) Templatizes the THC_pointwiseApplyX functions to take scalar types. To ensure this is done correctly, we static_assert that the scalar type template parameter matches the scalar type of
the corresponding tensor template parameter. We will need to get rid of these static_asserts in the future, but this is useful for now.
* Support to run ONNX Upsample operator (mode=nearest) in Caffe2 (#8037)
* Added support to run ONNX Upsample operator (mode=nearest) in Caffe2
* adding error checks to upsample
* adding error checks to upsample
* adding error checks to upsample
* changing to np.isclose
* Revert onnx submodule update
* still fixing
* [auto] Update onnx to eb12f72 - Add conv transpose test cases (onnx/onnx#886)
eb12f72a86
* [auto] Update onnx to bd98abb - Add a hook for doing post-processing on protobuf generated header files (onnx/onnx#1068)
bd98abbba0
* Skip ConvTraspose ONNX backend tests (#8074)
* Post process onnx proto (#8064)
* Post processing onnx generated protobuf files to hide global symbols
* .
* .
* Add code for TensorBoard visualization of JIT GraphExecutors (#8050)
* [auto] Update onnx to cc26486 - bump version to 7 for prelu. (onnx/onnx#1063)
cc26486541
* [auto] Update onnx to 356208d - add input tensor dimension checks to shape inference (onnx/onnx#1070)
356208d756
* Move backtrace to its own header (#8096)
* Move backtrace to its own header
* Move cxxabi.h into Backtrace.cpp
* Fix and ignore some warnings (#8081)
* Do an additional sanity check that nvcc and CUDA include dir agree. (#8094)
If you set CUDA_HOME and CUDA_NVCC_EXECUTABLE together, you may
end up in a situation where the CUDA_VERSION of your includes
mismatches the CUDA version of your nvcc. See #8092 for a concrete
case where this can occur. Explicitly detect this situation and
give a good error message in this case!
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* use regex in kwarg parser (#8061)
* Removing remaining NO_PYTHON ifdefs (#8067)
* Remove NO_PYTHON in tracing
* Remove NO_PYTHON in ir.h
* Remove NO_PYTHON in test_jit.cpp
* Replace std::size_t with size_t (#8093)
* Remove out-of-date comment (#8114)
* [Caffe2] Enabling AMD GPU Backend for Caffe2 (#7955)
* Add hip support for caffe2 core
* Add MIOPEN header/wrapper to caffe2 core
* Add HIP device into caffe2 PB
* top level makefile change for rocm/hip
* makefile scaffolding for AMD/RocM/HIP
* Makefile scaffolding for AMD/RocM/HIP; add makefile/utility for HIP files
* caffe2 PB update for AMD/ROCM HIP device
* Add AMD/RocM/Thrust dependency
* HIP threadpool update
* Fix makefile macro
* makefile fix: duplicate test/binary name
* makefile clean-up
* makefile clean-up
* add HIP operator registry
* add utilities for hip device
* Add USE_HIP to config summary
* makefile fix for BUILD_TEST
* merge latest
* Fix indentation
* code clean-up
* Guard builds without HIP and use the same cmake script as PyTorch to find HIP
* Setup rocm environment variables in build.sh (ideally should be done in the docker images)
* setup locale
* set HIP_PLATFORM
* Revert "set HIP_PLATFORM"
This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.
* continue the build script environment variables mess
* HCC_AMDGPU_TARGET
* Clean up the mess, which has been fixed in the latest docker images
* Assign protobuf field hip_gpu_id a new field number for backward compatibility
* change name to avoid conflict
* Fix duplicated thread pool flag
* Refactor cmake files to not add hip includes and libs globally
* Fix the wrong usage of environment variables detection in cmake
* Add MIOPEN CNN operators
* Revert "Add MIOPEN CNN operators"
This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.
* Resolve merge conflicts
* .
* Update GetAsyncNetHIPThreadPool
* Enable BUILD_CAFFE2 in pytorch build
* Unify USE_HIP and USE_ROCM
* always check USE_ROCM
* .
* remove unrelated change
* move all core hip files to separate subdirectory
* .
* .
* recurse glob core directory
* .
* correct include
* .
* Detect CUDNN related environment variables in cmake (#8082)
* Implement adaptive softmax (#5287)
* Implement adaptive softmax
* fix test for python 2
* add return_logprob flag
* add a test for cross-entropy path
* address review comments
* Fix docs
* pytorch 0.4 fixes
* address review comments
* don't use no_grad when computing log-probs
* add predict method
* add test for predict
* change methods order
* get rid of hardcoded int values
* Add an optional bias term to the head of AdaptiveSoftmax
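A usage sketch of the feature described above (hypothetical example; it assumes the implementation landed as nn.AdaptiveLogSoftmaxWithLoss with the predict() method mentioned in the bullets):
```
import torch
import torch.nn as nn

asm = nn.AdaptiveLogSoftmaxWithLoss(in_features=64, n_classes=1000, cutoffs=[100, 500])
hidden = torch.randn(8, 64)
target = torch.randint(0, 1000, (8,), dtype=torch.long)
output, loss = asm(hidden, target)   # training path: per-sample outputs and mean loss
log_probs = asm.log_prob(hidden)     # full (8, 1000) log-probability matrix
preds = asm.predict(hidden)          # cheaper argmax predictions
```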
* Make libshm also test if rt requires pthread. (#8112)
In some configurations (e.g., our internal build of GCC 5 + GLIBC 2.23),
-lrt is not sufficient to use shm_open; you also need to declare
a dependency on pthread. This patch adds a surgical extra fix to
detect this situation, in the case that I noticed it failing in the
wild.
Fixes #8110
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* [auto] Update onnx to 2d5ce4a - Remove empty model (onnx/onnx#1058)
2d5ce4aeb6
* Add missing pragma once. (#8118)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* [auto] Update onnx to 2a87616 - Tests for LRN operator (onnx/onnx#903)
2a876162ac
* Split SparseTensorImpl off from TensorImpl. (#7990)
* Split SparseTensorImpl off from TensorImpl.
At the moment they have the same data layout, but with the upcoming refactor
they will not, and we need a place to put all of the sparse tensor specific
fields.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Update SparseTensorImpl.h
* [Caffe2] Support non peer access in muji and fix bug when reduced_affix is empty (#6896)
* [Caffe2] Support non peer access in muji
* [Caffe2] Add test for 4 gpus and 2 groups
* [Caffe2] Add comments
* Fix bug when reduced_affix is empty
* Fix typo and add comments about cpu and amd gpu
* Skip OnnxBackendNodeModelTest::test_lrn_default_cuda that causes segfault (#8127)
* Replace most remaining usages of TensorUtils<T>::DataType. (#8124)
As in https://github.com/pytorch/pytorch/pull/8056, this doesn't work with a single TensorImpl type.
This replaces the usages of TensorUtils<T>::DataType with a templatized parameter and static_asserts that the new and old are equal.
After this we can get rid of the old template parameter, but I want to ensure they are equivalent across all builds first.
* Add utf-8 header to Python file with Unicode. (#8131)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add back lrn test (#8134)
* Revert "Skip OnnxBackendNodeModelTest::test_lrn_default_cuda that causes segfault (#8127)"
This reverts commit 410191c4175eaae141306cdb3c3c1c1e8a495225.
* Fix mismatched default values
* Add non_blocking to Tensor/Module.to (#7312)
* Add non_blocking to Tensor/Module.to
* flake8
* Add argparse tests
* cpp parse
* Use C++ parser
* use a common parse function with Tensor.to
* fix test_jit
* use THPObjectPtr
* increase refcount for None, True, and False
* address comments
* address comments
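A minimal sketch of the new keyword on both Tensor.to and Module.to (assumes a CUDA device is available):
```
import torch

x = torch.randn(4, 4).pin_memory()
model = torch.nn.Linear(4, 4)
if torch.cuda.is_available():
    y = x.to('cuda', non_blocking=True)     # asynchronous host-to-device copy from pinned memory
    model.to('cuda', non_blocking=True)     # Module.to accepts the same flag
```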
* Fix job name checking for AVX tests (#8135)
* Fix a corner case for ReShapeOp (#8142)
In my use case, in the backward propagation pass, the reshape needs to
change a [0] tensor into a [0,0] shaped tensor. The original implementation would
cause an out-of-index issue. This diff fixes the problem.
* cpu/ideep context converter (#8139)
* fix type mismatch while call torch._C._cuda_setDevice (#8065)
* fix type mismatch while call torch._C._cuda_setDevice
* fix type mismatch in scatter
* fix type mismatch in scatter
* fix type mismatch while call torch._C._cuda_setDevice
* fix type mismatch while call torch._C._cuda_setDevice
* fix type mismatch while call torch._C._cuda_setDevice
* docs: Add warning to torch.repeat() (#8116)
* docs: Add warning to torch.repeat()
closes #7993
* docs: Add links for numpy functions
* docs: Break the too long line
* Accelerate bernoulli number generation on CPU (#7171)
* opt bernoulli rng with vsl and openmp
* detect cpu vendor for bernoulli
* retrigger test platform
* check the vendor more strictly
* use cpuinfo to check vendor
* docs: add canonical_url and fix redirect link (#8155)
* docs: enable redirect link to work for each specific page
* docs: add canonical_url for search engines
closes #7222
* docs: update redirect link to canonical_url
* docstring support for @script and @script_method (#7898)
* docstring support for @script and @script_method
* make it python2 compatible
* improve according to review
* improve build_stmts
* use filter instead of list comprehension
* improve the way wrap is handled for script_method
* stash the original method instead
* allow dynamic attr for ScriptMethod and GraphExecutor
* a bit comment on build_Expr
* remove _build_wrap
* a bit improve on comments
* rename to __original_methods
* should be _original_methods
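A minimal sketch of what this change enables (hypothetical usage):
```
import torch

@torch.jit.script
def scale(x):
    """Docstrings on @script functions are now preserved on the compiled callable."""
    return x * 2

print(scale.__doc__)
```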
* [auto] Update onnx to 968d28d - fix Node::isBefore (onnx/onnx#1075)
968d28d901
* remove some unnecessary cudaGetDevices (#8089)
* remove unnecessary cudaGetDevices
* make curDevice argument non-optional, add explicit checks to current_device
* Fix cuda.framework error on OSX. (#8136)
When compiling OSX with CUDA, Caffe2's build system uses
find_package(cuda) to get its grubby hands on the CUDA driver
library (for some strange reason, FindCUDA doesn't save this
information as a variable). Unfortunately, on OSX, sometimes
this picks up the cuda.framework folder, and then our build
system chokes to death because it doesn't try to link against
this as a framework. (Is the folder even a framework? I have
no idea).
This commit attempts to fix this in a two pronged fashion:
1. For some users, reducing the precedence of frameworks
using CMAKE_FIND_FRAMEWORK seems to help. So we set these
variables. However, this fix is not perfect; on my laptop
it doesn't actually solve the problem.
2. PyTorch doesn't actually need the CUDA driver API. So we
only add the dep when building Caffe2.
Fixes #8022
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* [C++ API] Improve and use OrderedDict for parameters / modules (#7823)
* Improve OrderedDict for C++ API
* Give OrderedDict a subject and fix review comments
* Fix OrderedDict use in torch/csrc/jit/script/init.cpp
* Fix __rshift__ bug (#8161)
* Fix __rshift__ bug
* Add small tests for __lshift__ and __rshift__ in test_cuda
* Add a more elaborate check for __lshift__ and __rshift__
* refactor the test to address @zou3519 's comments
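A quick sanity check of the operators covered by the new tests (shown on CPU; the added tests in test_cuda exercise the same thing on GPU):
```
import torch

a = torch.tensor([16, 8, 4])
print(a >> 2)   # tensor([4, 2, 1])
print(a << 1)   # tensor([32, 16, 8])
```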
* Move non-generic Storage code needed by TensorUtils to non-generic C++. (#8164)
For non-generic function call implementations in Storage used by TensorUtils, we do the following:
1) Move the declaration from generic/C to non-generic/C++; we don't need backwards compatibility on these functions and want to use e.g. at::ScalarType.
2) Move the implementation from generic/C++ to non-generic/C++.
3) Change the generic implementation to call the non-generic implementation.
This will allow us to get rid of the corresponding TensorUtils calls (once we move over the Tensor functions in the same manner).
* Pinning opencv to < 3.4 in conda builds (#7923)
* Pinning opencv to 3.1.0 in conda builds
* Also pinning numpy to 1.11
* Trying only specifying <3.4
* Adding -setup- path, and better code structure (#8122)
* Abstract parallelization to facilitate using threadpools (#8163)
* [Caffe2] Update elementwise ops to support numpy style broadcast (#8070)
* Update elementwise ops to support numpy style broadcast
Update elementwise ops to support numpy style broadcast
* Fix sqrt_op
* Fix compare ops
* Fix gradient test
* Fix optimizer legacy broadcast
* Fix legacy broadcast for elementwise ops
* Skip flaky test
* Fix eigen simple binary op
* Fix attention test
* Fix rnn test
* Fix LSTM test
* Fix tan grad
* Fix schema check
* Export getCudnnHandle (#7726)
* [JIT] Support a single TensorList argument anywhere in the argument list + index_put (#8173)
* [JIT] Support a single TensorList argument anywhere in the argument list
* [JIT] index_put
* use the correct datatype format (#8144)
* Add back onnx console scripts dropped during migration from onnx-caffe2 (#8143)
* Get rid of SOVERSION (again). (#8132)
We don't want SOVERSION because pip will lose the symlink and
double your distribution size, and also because our setup.py
accidentally links against both libcaffe2.dylib and libcaffe2.1.dylib
on OS X. This leads to a very puzzling error where you get
the error "cannot initialize CUDA without ATen_cuda", because
there are actually two copies of your registry in memory (because
there are two copies of the dynamic library). Dropping SOVERSION
makes it impossible to make this mistake.
In principle, if the shared library load is done with DYLD_GLOBAL,
that should also prevent two copies of the registry from popping up.
Worth checking at some later point, if you need to bring back
SOVERSION (because, e.g., pip finally fixed their software.)
Partially fixes #8022.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Fix a corner case for ReShapeOp (#8178)
In my use case, in the backward propagation pass, the reshape needs to
change a [0] tensor into a [0,0] shaped tensor. The original implementation would
cause an out-of-index issue. This diff fixes the problem.
* Better conv error message basing on weight shape (#8051)
* Add retry logic to sccache download for Windows build (#7697)
* Add retry logic to sccache download for Windows build
* fix script bug
* clean up
* fix caffe2 docker build (#7411)
* [ONNX] Fix type_as symbolic (#8183)
* [ONNX] Nuke type_as symbolic
* make it better
* Fix lookup + test
* Yangqing as an ONNX codeowner (#8185)
* Fix protobuf options (#8184)
* protobuf
* fix protobuf_MSVC_STATIC_RUNTIME
* Add a loop unrolling pass to PyTorch JIT (#7672)
* [auto] Update onnx to 4e65fd8 - fuse consecutive squeezes (onnx/onnx#1078)
4e65fd83ba
* [Caffe2] Merging setup.py with setup_caffe2.py (#8129)
* Merging setup.py files; torch works, caffe2 works up to other KP
* Fix to super call for python 2
* Works on python2 on mac
* Consolidating Caffe2 flags
* Fix scalar check for sparse tensors. (#8197)
* Fix scalar check for sparse tensors.
As discovered in #8152
If `t` is a scalar sparse tensor, `t._indices` used to return a sparse
empty tensor because the scalar check was incorrect. This PR modifies
the scalar check to return a dense tensor instead of a sparse tensor.
i.e.
```
tensor = torch.sparse_coo_tensor([], [], torch.Size([]), device=device)
out = tensor._indices() # was a sparse tensor, now is dense.
```
* Fix typos
* fix lint
* Add more annotations for arguments in ATen schema (#8192)
* use THCThrustAllocator in BCECriterion (#8188)
* Allow parallel_apply to take in list[Tensor] (#8047)
* Docs for gradcheck and gradgradcheck; expose gradgradcheck (#8166)
* Docs for gradcheck and gradgradcheck; expose gradgradcheck
* address comments
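A small usage sketch of the now-documented and exposed checkers; double precision inputs are required for the numerical Jacobian comparison to be reliable:
```
import torch
from torch.autograd import gradcheck, gradgradcheck

x = torch.randn(4, dtype=torch.double, requires_grad=True)
assert gradcheck(torch.sigmoid, (x,), eps=1e-6, atol=1e-4)
assert gradgradcheck(torch.sigmoid, (x,))
```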
* Implement randperm for CUDA (#7606)
* Implement randperm for CUDA
* Use Thrust to implement randperm
* clean up
* Fix test
* Offload small input scenario to CPU
* Fixed test
* Try to fix Windows error
* Fix Windows error and clean up
* Use fork_rng context manager
* Move test_randperm_cuda to test_cuda
* Add half tensor support
* Fix cuda::type error
* Fix CPU offloading
* Fix issues
* No need to check range for n == 0 case
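A minimal sketch of the new CUDA path (small inputs are offloaded to CPU internally, per the bullets above):
```
import torch

if torch.cuda.is_available():
    perm = torch.randperm(1000, device='cuda')
    # a permutation sorts back to 0..n-1
    assert perm.sort()[0].equal(torch.arange(1000, device='cuda'))
```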
* Update c10d build to link against Caffe2 (#8201)
This follows #7399.
* add wipe_cache option (#8204)
as title
* Replace (non-data) TensorUtils calls with non-generic THCTensor calls. (#8176)
* Replace (non-data) TensorUtils calls with non-generic THCTensor calls.
TensorUtils is templatized on the THTensor type, so to support a single tensor type (like ATen), we need to remove these.
This PR does the following:
1) Allows THCTensorTypeUtils.cuh to include THCTensor.hpp.
This involves moving includes of it outside of generic/, so we can use the new implementations.
2) Defines a single _THCTensor struct and changes THCRealTensor to be a derived type of _THCTensor.
This allows us to implement a single non-generic function and avoid static_cast or void * tricks to call it from the generic functions.
3) For functions inside of TensorUtils that don't use data pointers:
a) Implement the functions in (non-generic) THTensor.cpp and declare them in (non-generic) THTensor.hpp.
b) Have the generic versions call the non-generic versions.
c) Replace the corresponding TensorUtils<THCTensor>::fn call with (non-generic) THTensor_fn.
* Add comment about THCTensor struct.
* Error if storage is null in setStorageNd or resizeNd.
* Fix c10d compiler warnings (#8206)
Copy compiler flags from the ones used in setup.py and fix warnings.
This makes the root build that includes c10d headers warning free.
* Bump gloo submodule (#8202)
This includes facebookincubator/gloo#125.
* rm -rf aten/contrib (#8165)
* Remove aten/contrib
* Remove from CMake
* Fix tanh_op on ios build (#8207)
* Fix tanh_op on ios build
* Fix tanh
* [auto] Update onnx to f28e2f1 - fix lrn spec (onnx/onnx#1090)
f28e2f1a60
* [cmake] deprecate caffe2_* specific cuda function in cmake. (#8200)
* deprecate caffe2_* specific cuda function in cmake.
* ENV{} -> $ENV{}
* CUDA_ARCH_NAME -> TORCH_CUDA_ARCH_LIST
* .
* .
* .
* skip CUDA memory leak check on Windows altogether (#8213)
* Record shape and type in autograd to validate gradients (#8168)
The check that the gradient is defined is currently disabled because
TestJit.test_ge_optimized will trigger the error.
* [auto] Update onnx to 18d70ff - Graph should only have one (input) kParam node (onnx/onnx#1088)
18d70ff529
* Set up a c10 source folder (#7822)
* Set up a c10 source folder
* Change the benchmark log format and also log flops (#8215)
as title
* Move helper functions to unnamed namespace. (#8224)
Currently, the helper functions in this file are in the global
namespace. I am guessing the intent was to keep them local to the file.
* [auto] Update onnx to e96d823 - Update Google benchmark to 1.4.1 (onnx/onnx#1083)
e96d823e5c
* Change new bernoulli implementation to be fully generic. (#8218)
The current implementation depends on THTensor types being unique, which is not guaranteed going forward.
* Structure THTensor like THCTensor is structured. (#8217)
In particular, define a base type, _THTensor, that can be used for all THRealTensor structs.
This is just to have less cognitive load when dealing with generic THTensor/THCTensor types (as in templates).
* move THCP-related utils to cuda/utils.cpp. (#8221)
These files don't follow the usual pattern: in general the files torch/csrc/X and torch/csrc/cuda/X
both include the generic file torch/csrc/generic/X, where torch/csrc/X includes the cpu implementations and torch/csrc/cuda/X includes the cuda implementations.
(Aside: this is probably not the best structure; the torch/csrc/X files should probably be moved to torch/csrc/cpu/X.)
utils.cpp combines these, so that torch/csrc/utils.cpp has cuda specific code. This makes it impossible to declare a single THTensor and THCTensor template type (i.e. THPPointer<_THTensor>, THPPointer<_THCTensor>).
* [READY TO MERGE] Use ccache in macOS build (#8009)
* Use ccache in macOS build
* Moving to sccache
* Don't use sccache in test job
* [NEEDS REVIEW] Add nan and inf probability check to multinomial (#7647)
* Add nan and inf probs check to multinomial
* fix bug
* Spawn CUDA test in subprocess
* Make sure invalid input won't pass the test case
* Try to fix error
* Test failure cases in Python 3 only
* Try to fix Windows error
* Move CUDA test to test_cuda.py
* fix issues
* fix module name error
* no need to check for CUDA existence in test_cuda
* Use PY3
* [READY TO MERGE] Enable tests that use DataLoader with multiple workers on Windows (#6745)
* Don't import TEST_CUDA for test_dataloader on Windows
* test_partial_workers is stuck on Windows
* Don't copy unneeded grads when using a function for several derivatives (Fixes#7722) (#7759)
Trying to copy all results fails when one of them is a tensor list which
has not been populated. This blew up for CuDNN RNNs when the weights
did not require grad.
Thanks to Sylvain Gugger for reporting!
* Fix win mkldnn (#7718)
* Sync build_pytorch_libs.bat with build_pytorch_libs.sh
* fix quoting
* add warnings
* fix warnings
* Add /EHa
* [Caffe2] Add ADD operator for IDEEP (#8220)
* Add ADD operator for IDEEP
* Add broadcast check
* Comments
* Allow optional build and installation of native test binaries (#8225)
* test finetuning
* install off by default
* Turn BUILD_TEST=ON for jenkins.
* Turn on install_test in jenkins as well
* Update MKL exporter to IDEEP ops (#8228)
IDEEP exporter support
* [ideep] Add IDEEP Squeeze op (#8227)
Similar to MKLSqueezeOp at caffe2/mkl/operators/squeeze_op.cc
* [auto] Update onnx to 62e63e9 - Fix build errors inside protobuf-bench (onnx/onnx#1084)
62e63e9de8
* Use .cc since some downstream libraries are configured for C++ only. (#8234)
* Rename SparseTensor to SparseTensorRef. (#8237)
I want to introduce using SparseTensor = Tensor (as a documentary
type alias for Tensor), but the name is already taken.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* [caffe2] Build Android tests and binaries in CI (#7593)
Update benchmark submodule to version with fixed Android/GNUSTL build
* Remove core and util warnings (#8239)
* Fix some signed/unsigned mismatches
* Skip unused result warning
* Explict fallthrough for murmur hash
* Enable aligned new support to eliminate warning
* Switch to int instead of unsigned in some cases
* Remove .gitmodules.aten since it is in .gitmodules now (#8232)
* Fix: gradcheck forced float32 (#8230)
* Print requires_grad and grad_fn in string repr of tensor (#8211)
For example:
>>> torch.ones(3).requires_grad_()
tensor([ 1., 1., 1.], requires_grad=True)
>>> torch.ones(3).requires_grad_() * 5
tensor([ 5., 5., 5.], grad_fn=<MulBackward0>)
The suffix (dtype, requires_grad, grad_fn) wraps to a new line if
it would cause the line to exceed the linewidth.
>>> torch.ones(10).double().requires_grad_()
tensor([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
dtype=torch.float64, requires_grad=True)
* Fix TEST_CUDA import in test_cuda (#8246)
* Fix lifting cat into its constant version (#8174)
This fixes a bug where schema including varargs lists did not lift
properly blocking correct ONNX export.
* Don't override Tensor, Storage macros defined outside torch/csrc in t… (#8243)
* Don't override Tensor, Storage macros defined outside torch/csrc in torch/csrc.
This PR does the following:
1) Removes THSTensor macros in torch/csrc, which aren't used.
2) For macros defined outside of torch/csrc (THTensor, THTensor_, THStorage, THStorage_):
a) No longer override them, i.e. previously THTensor could actually be THCTensor if a generic file was included from a file including THCP.h.
b) Instead, introduce new macros THW* (e.g. THWTensor) to represent a (potentially empty) wildcard character.
In addition to making this code easier to read and codemod, this allows us to more freely change TH/THC; for example:
currently in the THC random code, the state is casted to THByteTensor*; this happens to work because the macros don't happen to override THByteTensor.
But if THByteTensor just becomes an alias of THTensor (which is the plan for a single tensor type), then this no longer works.
The whole thing was a bit of a mess previously because you really have to understand which macros are redefined and which aren't.
We could also rename the macros that live in torch/csrc (e.g. the THPTensor macros), but since that is more self contained, I punted for now.
* Don't change the plugin.
* [auto] Update onnx to 3a035f4 - Add retry logic to model downloading (onnx/onnx#1077)
3a035f4397
* Fully genericize THC/THCUNN (except for TensorUtils and DeviceTensorUtils). (#8251)
* [cmake] Use CAFFE2_USE_* for public/cuda.cmake (#8248)
* Fix app size check (#8256)
Fix app size check
* wip on CPU impl
* Stop BCELoss from returning negative results (#8147)
* Stop BCELoss from returning negative results
* check explicitly for 0 before taking log
* add tests
* fix lint
* address comments
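A sketch of the behavior being fixed: predictions of exactly 0 or 1 could previously yield tiny negative losses, while the loss should always be non-negative.
```
import torch
import torch.nn.functional as F

pred = torch.tensor([0.0, 1.0])
target = torch.tensor([0.0, 1.0])
loss = F.binary_cross_entropy(pred, target)
assert loss.item() >= 0.0
```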
* Relax CUDA_HOME detection logic, to build when libraries are found. (#8244)
Log when no cuda runtime is found, but CUDA is found
* Added backward function for kl_div target (#7839)
* added backward fn for target
* added module test for kl_div target, and assuming targets are probabilities
* Change the output format of caffe2 observers (#8261)
as title
* Remove TensorUtils<T>::getData, provide data<T>() in TH(C)Tensor. (#8247)
* Remove TensorUtils<T>::getData, provide data<T>() in TH(C)Tensor.
* Fix template parameter.
* [caffe2] Move submodule onnx-tensorrt forward (#7659)
Commit 82106f833dcb0070446a150e658e60ca9428f89b is essential.
* [ideep] Add IDEEP fallbacks for Faster-RCNN ops (#8260)
TSIA
* un-genericize THCDeviceTensorUtils. (#8258)
* provide data<T>() in TH(C)Tensor.
* un-genericize THCDeviceTensorUtils.
This is used outside of generic context, so we need to un-genericize it to have a single THCTensor type.
* [caffe2] Fix ATen dispatch for ops with TensorList arg (#8226)
* [cmake] Add and export Modules_CUDA_fix (#8271)
* Add and export Modules_CUDA_fix
* actually, need to include before finding cuda
* [auto] Update onnx to 2508156 - Make error message more verbose (onnx/onnx#1097)
2508156135
* [auto] Update onnx to 39e4668 - fix optimizer does not set ir_version bug (onnx/onnx#1098)
39e46687ea
* [cmake] Make cudnn optional (#8265)
* Make cudnn optional
* Remove cudnn file from cpu file
* Move signal window functions to ATen; add Blackman window (#8130)
* Move signal window functions to ATen; add Blackman window
* fix cuda test not checking scipy
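A minimal usage sketch of the window functions now living in ATen, including the new Blackman window:
```
import torch

w = torch.blackman_window(128)              # new Blackman window
h = torch.hann_window(128, periodic=False)  # existing windows keep the same interface
```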
* [ideep] Fuse Conv-Relu after IDEEP graph rewrite, skip group conv (#8233)
IDEEP supports fusion for non-group conv
* [c10d] NCCL Process Group implementation (#8182)
* [c10d] Process Group NCCL implementation
* Addressed comments
* Added one missing return and clang format again
* Use cmake/Modules for everything and fix gloo build
* Fixed compiler warnings
* Deleted duplicated FindNCCL
* Set up CI build for CUDA 9.2 + macOS (#8274)
* Add macOS CUDA build to CI
* Fix undefined symbols issue
* Use sccache for CUDA build
* Fix sccache issues
* clean up
* c10 build setup (#8264)
* Move c10/ to caffe2/dispatch/
* Set up caffe2/utils directory
* Remove remaining TensorTypeUtils functions. (#8286)
Mostly what's remaining is copy utilities -- these are now provided in THCTensorCopy.hpp and templatized on the ScalarType rather than the TensorType.
* Create initial Python bindings for c10d (#8119)
* Build and install c10d from tools/build_pytorch_libs.sh
* Create initial Python bindings for c10d
* clang-format
* Switch link order to include more symbols
* Add bindings and tests for ProcessGroupGloo
* Add broadcast test
* Separate build flag for c10d
* Explicit PIC property
* Skip c10d tests if not available
* Remove c10d from Windows blacklist
Let it skip by itself because it won't be available anyway.
* Make lint happy
* Comments
* Move c10d module into torch.distributed
* Close tempfile such that it is deleted
* Add option USE_NVRTC which defaults to off (#8289)
* [build] Remove /torch/lib/THD/cmake in favor of /cmake (#7159)
* Remove /torch/lib/THD/cmake in favor of /cmake
* path fix
* Explicitly marking gloo to use cuda
* Fix gloo path in THD
* Have a single THTensor / THCTensor type. (#8288)
* Remove remaining TensorTypeUtils functions.
Mostly what's remaining is copy utilities -- these are now provided in THCTensorCopy.hpp and templatized on the ScalarType rather than the TensorType.
* Have a single THTensor / THCTensor type.
As was previously done with Storages, have only a single (dtype-independent) THTensor / THCTensor.
For documentation and backwards compatibility purposes, the old names, e.g. TH(Cuda)LongTensor alias the new TH(C)Tensor type.
* undef GENERATE_SPARSE.
* [auto] Update onnx to 58efe0a - add float16 support back for math and reduction ops (onnx/onnx#1102)
58efe0a9ca
* Some utils for compile-time programming (#7778)
* Add some C++17 features, implemented with C++14
* Add some type traits
* Compile-time type list abstraction
* Some utils for compile-time programming
* Fix compatibility with a larger range of compilers
* Use guts::array instead of std::array because of std::array shortcomings
* code review comments
* Use quotes for includes
* Remove THC's FindMAGMA (#8299)
* Entries for torch.distributed in CODEOWNERS (#8293)
* Add depthwise convolution test for IDEEP (#8301)
* Fix dividing by zero segfault in Reshape (#8302)
when inferring a dimension of a new shape with zero size
* Removes unused THCTensorConv (#8229)
* Replace Variables to Tensors (#8309)
* Clean up old sccache log before build (#8305)
* Remove unused grad ops on mobile to reduce app size (#8297)
Remove unused grad ops on mobile to reduce app size
* Small fixes (#8296)
* [auto] Update onnx to 5ed684e - Remove/replace /MX with /WX for MSVC build. Was typo in a previous ch… (onnx/onnx#1104)
5ed684ebe5
* Fix sample code for cuda stream (#8319)
* [auto] Update onnx to 4b4085c - Add missing warning ignoring flags to onnx_proto CMake target (onnx/onnx#1105)
4b4085c2e9
* [THD] fix broken THD build with NCCL (#8323)
* Add docstring for `torch.sparse_coo_tensor` (#8152)
* add sparse_coo_tensor docstring
* update empty tensor example
* whitespace
* whitespace again
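An example in the spirit of the new docstring: indices form a 2 x nnz matrix and values a matching 1-D tensor.
```
import torch

i = torch.tensor([[0, 1, 1],
                  [2, 0, 2]])
v = torch.tensor([3.0, 4.0, 5.0])
s = torch.sparse_coo_tensor(i, v, torch.Size([2, 3]))
print(s.to_dense())
```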
* add error when backend is not supported by DDP (#8325)
* Fix collect_env.py for Windows (#8326)
* Fix collect_env.py for Windows
* Fix expect file for Win machine
* Fix the script not stopping earlier on error for MSVC and Ninja (#8277)
* Simplify the solution
* Remove the usage of set errorlevel
* Skip test_multinomial_invalid_probs_cuda on Windows (#8324)
* Support printing sparse tensors in ATen, fixes #8333. (#8334)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* [C++ API] Cursors (#8190)
* Add cursors to C++ API
* Small self nits
* s/struct/class
* Use more STL like names for cursors
* Implement dim_arange operator (#8266)
* Implement arange_like operator
* add ONNX symbolic
* lint
* change name
* Comment the hack
* 1. fixed flip CPU impl for non-continuous flip dims; 2. added more tests; 3. using TensorInfo and collapseDims to speed up CUDA impl for cases where flip dim is the 1st or last dim
* nits
* 1. removed for loop in pointwise CUDA kernel; 2. using templated (int64_t) IndexType for indices in pointwise CUDA kernel
* added torch.flip.__doc__
* nits
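A minimal sketch of the flip API these fixes target; the flipped dims may be any subset of dimensions, including the first or last:
```
import torch

x = torch.arange(8).reshape(2, 2, 2)
y = torch.flip(x, dims=[0, 2])
```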
* Update operator documentation with markdown descriptions and interfaces
* Added rest of updated operator documentation to source files
* Commiting local changes for rebase
* fixed bracket typo in sqrt_op.cc file
* Added updated markdown documentation to remaining completed ops
We have 2 use cases where we want to experiment with new base ATen
tensor types:
* BatchTensor for matchbox
* Tensors that live on accelerators
It is possible to subclass TensorImpl to implement these but VariableType
does not work with them because it cannot find the equivalent variable type
in the registry.
This commit changes the way we implement type -> variable(type) lookup so that
torch::register_variable_type_for can be called on any at::Type.
Lookups are still done using arrays so there should be no perf impact from the change.
* Port THS to ATen.
The basic structure of the patch:
- All kernels in aten/src/THS got rewritten as native
functions in aten/src/ATen/native/sparse
I took the liberty to rename some of the kernels,
opting for a longer, more transparent names than
things like 'spaddcmul'.
- Instead of holding fields for sparse tensor in the TH
C struct THSTensor, they are now held in a C++ class
SparseTensorImpl (this explains why I had to do this
all in one go; I can't have *two* reps for sparse
tensors!)
Along the way, we change a key internal representation
invariant: an "empty" sparse tensor has dimI == 1 and
dimV == 0 (this is different from dimI == 0 and dimV == 0
we had before); this ensures that we maintain the invariant
that dim == dimI + dimV. "Scalar" sparse tensors are
made illegal, because there really is no way to properly
express them in COO format.
- Because we haven't ported THCS or any of the traditional
dense TH implementations, there is a new set of adapter
functions in native/LegacyBridge.cpp exclusively devoted
to deciding whether or not to go to the new native implementation
or back to the legacy TH binding (prefixed with th_).
The intent is that when everything gets ported, we can
delete this file.
- I've kept the stubs for all the THS functions, but they now all
error if you try to actually call them. Eventually, we should
replace these with calls to ATen so that everything keeps
working.
- I gobbled up SparseMM (SparseMM.cpp is no more). It was tasty.
There are some miscellaneous improvements which were needed for other
changes in this patch:
- There is now AT_FORALL_SCALAR_TYPES_EXCEPT_HALF, which does what
it says on the tin.
- axpy templated function moved to TH/BlasUtils.h, there's a new macro
which lets you easily forward to all of the TH functions. We also expose
THBlas_copy. I'm not terribly pleased with these functions but
they seem to serve a purpose they need.
- New method on Tensor to get TensorImpl*, unsafeGetTensorImpl
- accessor() is now this-const, since const-correctness on Tensor is a lie
- New toSparse()/toDense() methods on Type; now you can call these
directly without having to manually apply at::toSparse/toDense
on the Backend and then running toBackend yourself.
Changes to the kernels:
- Previously, the whole body of all kernels was compiled for
every supported scalar type. In our new implementation,
the scalar dispatch has been pushed into the smallest extent
which (1) is not in a type loop and (2) requires statically
knowing the scalar type. These sites all use
AT_DISPATCH_ALL_TYPES. I tried to use lambdas as much as
possible, but sometimes it was not possible when a OpenMP
pragma was used.
- Anywhere we tested if the nDimension of a tensor was zero,
we replaced with a test that numel is zero. Because, as we
known, nDimension of zero-size tensors in TH is zero, and
that's wrong wrong wrong (and not done this way in ATen).
Some subtleties:
- Places where previously fastget1d was used, I now use a
TensorAccessor. However, you have to be careful about grabbing
the accessor, because sometimes you will be accessor'ing
indices/values and they are empty, which means they will
be *1D* ("oh, aren't indices always 2D?" Nope. Nyet.)
So, essentially, it is only safe to grab an accessor *after*
you have checked that nnz != 0. All of these shenanigans
will go away when we properly support zero-size dimensions.
A few places, we test for this case just by wrapping the loop
in a conditional on nnz. Some other places this is not so easy,
so we instead short-circuit the function with a special case for
when nnz == 0 (usually, these implementations are degenerate).
- There is a very subtle but important difference between
_sparse_get_impl(self)->indices() and self._indices();
the latter may return a view! This is because nnz is
not guaranteed to match the dimensions of indices/values;
you can "truncate" a sparse tensor by setting the nnz.
Actually, I think this is not a good idea and we should
enforce a stronger invariant, but for this patch I slavishly
adhere to the old ways, and as such I have to be very
careful if I want to resize something, I had better use
the former and not the latter.
- I had to reimplement broadcasting by hand (thus the s_
and non-s_ functions in the sparse native files). There
is a very important distinction between foo_out and foo_,
so it is important that the LegacyBridge function always
call to the lower layer, and not try to avoid boilerplate
by calling to another LegacyBridge function first.
I did NOT put broadcasting in LegacyBridge (even though,
ultimately, that's where it must live), because the th_
functions which are invoked from LegacyBridge handle
broadcasting themselves, and I don't want to broadcast
twice.
- Sparse function MUST explicitly specify the Type they
dispatch from, otherwise Variable wrapping/unwrapping will
not work correctly. If you use _get_sparse_impl, that is
sufficient to levy this requirement.
- The "has native" tests in LegacyBridge.cpp are not 100%,
because some of the functions are mixed dense-sparse functions,
and so you can't just say, "Oh, if it's sparse and CPU, call
the native sparse implementation." This is handled on a
case by case basis. There is some especially complex
logic for add(), which has dense-dense, sparse-sparse
and dense-sparse implementations.
- I added some uses of SparseTensorRef in native_functions.yaml,
but you will notice that these are all on native_* functions,
and not the actual, top-level functions. So the SparseTensorRef
is purely documentary (helping you not call the wrong overload)
but there is no magic; we do the wrapping ourselves the hard
way. (This is in constrast to the TH binding code which is magical.)
Except for _sparse_mask; _sparse_mask is magical.
- There is a raw_copy_sparse_ method, which is really my way of
getting around the fact that copy_ has never been implemented
for sparse tensors (even before this patch), but there IS a
super secret, internal way of doing these copies that the THS
code used, and which I needed to get my hands on when I did this
port. We should refactor so that either (a) copy_ does support
sparse-sparse copy natively, or (b) we do this other ways.
- Irritatingly, I must explicitly resize_as_ before copy_ into
a tensor. This was not the case with THTensor_(copy) but I don't
have any direct binding that doesn't have this requirement.
- For some reason, the sparse tensor constructor accepts a scalar
tensor for the values tensor. This is kind of weird because
you always need an nnz-dimension. However, the old code supported
this and just expanded it into a 1D size 0 tensor; so we need some
explicit code to do this.
There are maybe a bit more AT_ASSERTs in some of the kernels
than is wise. I added them all when I was debugging and was
loathe to remove them.
Some last mile fixes after this commit went into PR
- Move expand outside of dispatch so autograd works (it used to be inside and then we lost all of the recorded broadcasts).
- Hack to duplicate the derivatives for our now two definitions TH and native. Mercifully the derivatives are short.
- Apparently, TH has a special case to make foo_ functions method only, and if you don't do this the Python arg parsing is wrong. We carefully work around this in the native bindings
- Apply DCE to a test_jit case, fixes wobbling due to DCE trick in tracing
- Update test_function's output
- Some last mile fixes for dispatch confusion in sparse_coo_tensor functions.
- New simplified regression test based on failures I saw in ONNX
- Increase tolerance on super resolution test
- More robust dynamic_type normalization, fixes ONNX bug.
The dynamic_type situation is very delicate; probably need
to stop having both Scalar and real.
- Make new_with_tensor_sparse more CUDA safe
- Note about CUDA-safety in SparseTensorImpl
- Rename dimI/dimV to sparseDims/denseDims.
- Make localScalar on SparseTensorImpl work.
- Make numel uniformly supported on all types, not just dense
types
- Add tests for is_nonzero() method (which exercises localScalar)
- Disable constant JIT autogenerated tests, which are fragile and broken
by this change, but being fixed in a parallel track.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* throw error on 0-length tensor slicing
* return empty tensor instead of throwing error
* make 0 slice work for tuples also
* add tests
* move check to aten
* Address comments
* Move empty size logic from ATen into TH/THC.
The goal here is to unify the tensor representations; since the "majority" of the representation is in TH, we push the empty size ({0}) and empty stride ({1}) logic into TH.
This PR does the following:
1) Previously THTensor/THCTensor with dim_ == 0, size == nullptr, stride == nullptr are now dim_ == 1, size == {0}, stride == {1}.
2) The logic that previously implemented this at the ATen level (e.g. THLongStorageView STRIDE_EMPTY_TENSOR) is removed.
3) The above is pretty clean except for resize/resizeNd logic -- that is still called with nDimension == 0. So, we rename these to resizeLegacy, resizeNdLegacy, map nDimension == 1
into the new regime, and will later write a empty-aware resize/resizeNd and move over the calls to resizeLegacy, resizeNdLegacy.
4) Also introduces some ifdefs that are just used for testing:
a) USE_TH_SCALAR: move scalar logic in TH
b) USE_TH_ZERO_SIZE_DIM: support arbitrary 0-sized dimensions, i.e {...,0,...}.
These are just used to write forward-looking correct code while call sites to _dim() (old TH nDimension) and resizeLegacy are updated.
* Get rid of noelem_to_empty.
* Use static_cast rather than C-style cast.
* Allocator size for empty tensors in THS/THCS.
* Add back THLongStorageView type Stride (TH and arg parsing has some magic that needs these to be nullptrs).
* 1. added hardshrink() to ATen (CPU + GPU); 2. removed nn.Hardshrink(); 3. reusing previous tests for nn.Hardshrink() and included CUDA tests at test_nn; 4. default parameter lambda=0.5 is not working yet
* optimized memory read/write
* 1. pass in lambd as scalar for CPU/CUDA_apply*; 2. removed tests for hardshrink at test_legacy_nn
* fixes test_utils
* 1. replace zeros_like with empty_like; 2. use scalar_cast in cuda
* 1. printing lambd value; 2. default lambd=0.5 is still failing
* getting around Scalar bug by removing default value of lambd from native_functions.yaml, and declaring it in nn/functional.py
* cleaned up debug printf
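A quick sketch of the ATen hardshrink behavior: values with |x| <= lambd are zeroed, others pass through unchanged.
```
import torch
import torch.nn.functional as F

x = torch.tensor([-1.0, -0.2, 0.0, 0.3, 2.0])
print(F.hardshrink(x, lambd=0.5))   # tensor([-1., 0., 0., 0., 2.])
```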
* Temporary solution for having access to the root path for python installations until Caffe2/PyTorch figure out the best way to build.
* Update build.sh
Increasing the verbosity of HIP errors.
This commit turns autograd function/method tests into tests run inside of a trace, or directly written using
script. These tests have uncovered many bugs and limited functionality
in the trace/script pathway, and these failing parts of the tests
are disabled using new exclusion sets. The size of these sets will shrink
as the bugs are fixed.
* fix a bug for SkipIndices
* IDEEP bug, revise the output to CPUTensor in SkipOutputCopy strategy
* [IDEEP] Add IDEEP fallbacks for Style-Transfer ops
* Improve TypeId:
- move it to c10 namespace to allow for easy extraction from caffe2 into c10 (i.e. reuseability from aten)
- Use unordered_map/unordered_set instead of map/set for performance
- Make TypeId a type safe class (i.e. no implicit casts from/to int)
- Make TypeId constexpr
- Some readability improvements (e.g. using instead of typedef)
- Don't explicitly implement TypeMeta copy assignment and construction - let the compiler do that for us.
- Add TypeMeta move constructor
- Make TypeMeta members noexcept
- Implement TypeMeta::operator== and operator!= as free functions instead of in-class
* CR comments
* fix
* fix windows
* Rename back to CaffeTypeId
* Remove c10::TypeId/TypeMeta
* remove C10_KNOWN_TYPE
* code review
* Implement CPU bincount feature support
* Incorporate feedback on renaming to SummaryOps file and other nits
* bincount gpu implementation
* refactor cuda code and incorporate nits
* doc fix
* cuda bincount - cast weights to double if integral type
* fix: signed unsigned comparison error
* fix: ssize_t error
* refactor
* make template typenames readable and other nits
* make compatible with v0.5
* incorporate comments
* update test cases to ensure CUDA code coverage
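A minimal sketch of the new bincount: counts of non-negative integers, optionally weighted.
```
import torch

x = torch.tensor([0, 1, 1, 3])
w = torch.tensor([0.5, 1.0, 1.0, 2.0])
print(torch.bincount(x))      # tensor([1, 2, 0, 1])
print(torch.bincount(x, w))   # weighted counts: tensor([0.5000, 2.0000, 0.0000, 2.0000])
```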
* add comparison operators to jit
* try to fix CI
* address review comments
* fix type of comparison ops result
* address review comments
* fix indentation
* add comments
* require type_as to have non-dynamic tensor arg
* Typo (should check if template argument of type_as, inputs()[1], is tensor)
* Use .at() instead of []
* Use .at() again
* Improve number formatting in tensor print
* fix bad rebase
* address comments
* fix test
* fix test
* use assertExpected for tests
* address comments
* address comments
* More efficient kernel that avoids deprecated shuffles in Embedding.cu and THCUNN/LookupTable.cu
* Using WARP_BALLOT from THCDeviceUtils.cuh, also changing WARP_BALLOT to return unsigned
* [c10d] Rendezvous skeleton
The rendezvous function takes an URL and produces a triplet of a store,
a process rank, and the process group size.
For the file and TCP handlers, the rank and size must be specified, but
other handlers may discover these parameters dynamically.
It returns a generator function, such that if a rendezvous handler
supports rerendezvous, you can write:
for store, rank, size in c10d.rendezvous(...):
pg = c10d.ProcessGroup(store, rank, size)
while the process group is valid:
# Do stuff with process group
* Add Python 2 fallback for urlparse library
* Import X as Y
* Relative import seems to fix it
* Spelling
* Gate import on c10d availability
* Modifying the build path to handle Caffe2's merge
* Update LoadHIP.cmake
Fixing typo.
* Update Dependencies.cmake
Keeping hip_include_directories since other Caffe2 libs depend on it.
* Update CMakeLists.txt
Only including for the second time if we're building with ATen.
* Update CMakeLists.txt
Adding comments to make sure future users understand why necessary commands have been added.
* [fix] fixup the bias multiplier data access issue
Hotfix for failures in conv_transpose
* [D2][Easy]: lint regularizer
lint with black
* [GanH]: Split mu in adaptive weight for diagnose
* [Dper] Add the ability to split FC weights into multiple smaller ones
* fix SumReduceLikeOp for empty blob
as desc.
* add ctc_greedy_decoder for caffe2
ctc_greedy_decoder same as tf's
* Update event callback handling
Allow multiple callbacks per event
* Add WeightedSum layer
The motivation is to do weighted sum in HoNet/crossnet; in the next diff, I'll replace model.Add with model.WeightedSum in
honet: https://fburl.com/f4rmolg2
crossnet: https://fburl.com/v7awn8se, https://fburl.com/63filbnm
* Replicate DAG's behavior
Some callers expect RunAsync to block, replicate that behavior in case of
explicit 'dag' net type
* [dper] layernorm layer
as title
* Override dag, async_dag, async_polling
Overriding dag, async_dag and async_polling with async_scheduling
* Name the thread pools
Caffe thread pools currently inherit the thread names from the thread that starts them, which can be misleading. Give them an explicit name instead.
* [Caffe2] FilleOp should support int64_t dimensions
Change argument type to int64_t for shape argument of FillerOp (used in ConstantFill, XavierFill, etc)
* Remove caffe2/caffe2/contrib/torch/
It's not used anywhere and depends on old lua torch that conflicts with Aten. Given PT1 it's not relevant any more (though it was nice and clever code!)
#accept2ship
* Fix linearWarmup multiplier check
The multiplier needs to be non-negative, not strictly positive.
* Revert D3314316
This is after 2 years and we do not seem to have a use case for this one, so
for the sake of clean API design we should potentially remove this. This would
allow us to potentially pass in arguments to optionally construct an object,
although it is indeed a little bit unclear how we can reuse existing objects if
constructor arguments are passed in. In any case, we may want to remove this
dangling feature.
* Speedup generate proposals by partial_sort.
Speedup generate proposals by partial_sort.
FACEBOOK:
- Saw speed improvement for training with this op.
- Yanghan benchmarked the op on a small dataset and saw a consistent 100% improvement in speed (6ms -> 3ms) at 420 input resolution. See next diff for details.
* More parallel processing friendly for CPP version of GenerateProposals.
More parallel processing friendly for CPP version of GenerateProposals.
* [DT] [43/n] Lift stop conditions inside reader code back to flow control
1. Split multi_reader function into local_reader and remote_reader
2. Lifted stop conditions inside Limiter back to flow control
3. Split epoch flow building logic into 3 cases:
- single machine (1 reader, 1 trainer on trainer0 node, no PS)
- (1 reader + 1 trainer) on trainer0 node, has PS
- multiple readers, readers do not share nodes with trainers, might have PS or not
* Resolve conflicts for torch/_thnn/utils.py
* [Caffe2] Handle image decoding errors
Image decoding errors can make the whole training fail. This diff is to handle them
1. Catch imdecode exceptions and check if the decoded image has zero columns or rows. These are counted as decoding errors.
2. Replace the image with an empty one in case of error.
3. Count the number of errors and throw a runtime exception if the error rate reaches a given threshold.
The empty image data is kept. It might introduce noise in the training data.
* Update MKL exporter to IDEEP ops
TSIA
* [Caffe2] GlobalInit is thread safe, fixing the comment
With the mutex and lock, GlobalInit is thread safe.
Update the comments.
* Back out "Add support for generating ATen files during fbcode build"
Original commit changeset: 28970ddba353
@override-unit-failures
(Note: this ignores all push blocking failures!)
* [DT]: fix predictor save
similar to D6610058, here we add the fix for distributed online training
* Remove net_singlethread_async_gpu.cc
Closes https://github.com/caffe2/caffe2/pull/2528
This removes net_singlethread_async_gpu.cc as part of our effort to clean
CUDAContext and the net executors.
* Inline DFS task execution
Add a DFS inline task execution mode in executor
* Add c10 folder to fbcode
This adds the c10 folder and its test cases to fbcode. Build flags are mostly taken from aten.
* add dependencies for online trainer
Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators
Relevant post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/
* Resolve conflicts for tools/jit/gen_jit_dispatch.py
* [Fix] sparse regularization in distributed training
* Support advanced pooling options in sum processor
* support advanced pooling options in sum processor
* remove redundant code
* support attention in sum processor
* Improve shard logging in net tracing code
Make it handle arbitrary shard ids instead of just one-digit ids.
* [Caffe2] Call GlobalInit in predictor only in mobile
FACEBOOK:
Calling GlobalInit long after the program starts may not be safe. Issues arise if the following happens:
1. The user does not call GlobalInit and initFacebook after the program starts.
2. The user sets a flag manually: https://fburl.com/mcsumw7d
3. The user calls the OSS predictor.
4. The OSS predictor calls GlobalInit.
5. GlobalInit calls initFacebook.
6. initFacebook resets all flags: https://fburl.com/tolszha1
Thus, the flags the user set manually are overwritten.
This would happen anytime GlobalInit is called long after the program starts.
I suppose the intention of the user in this case is to not call GlobalInit throughout the program,
but to use Caffe2 regardless (is that desired?)
But adding GlobalInit in the OSS predictor would automatically call GlobalInit when using Caffe2.
This issue doesn't exist in mobile, since initFacebook is not called on mobile.
For now, guard the GlobalInit in predictor for mobile only.
May want to ensure the GlobalInit is always called at the start of the program. @[3501714:kutta] has seen weird issues when not calling GlobalInit at the start of the program on server side. He has made some progress on this.
* resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py
* Add empty fix for SumLikeReduceOp
Add empty fix for SumLikeReduceOp
* Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN
This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9
@bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
* Remove Declarations.yaml
* Include common.h
* Change std::stoi to caffe2::stoi
* Add thread_name.cc to the CMake file
* No need to subtract 1. Fix test segfaults
* Fix NetTest, ObserverTest
Fix tests
(cherry picked from commit 3767e66c3f365596cba3d46d3e7322c933a0ab41)
* CTCGreedyDecoderOp only has CPU implementation, test should only run on CPU
* Add a variable to avoid conversion resizing issue
* [fix] fixup the bias multiplier data access issue
Hotfix for failures in conv_transpose
* [D2][Easy]: lint regularizer
lint with black
* [GanH]: Split mu in adaptive weight for diagnose
* [Dper] Add the ability to split FC weights into multiple smaller ones
* fix SumReduceLikeOp for empty blob
as desc.
* add ctc_greedy_decoder for caffe2
Same as TensorFlow's ctc_greedy_decoder
* Update event callback handling
Allow multiple callbacks per event
* Add WeightedSum layer
The motivation is to do weighted sum in HoNet/crossnet, in the next diff, I'll replace model.Add with model.WeightedSum in
honet: https://fburl.com/f4rmolg2
crossnet: https://fburl.com/v7awn8se, https://fburl.com/63filbnm
* Replicate DAG's behavior
Some callers expect RunAsync to block, replicate that behavior in case of
explicit 'dag' net type
* [dper] layernorm layer
as title
* Override dag, async_dag, async_polling
Overriding dag, async_dag and async_polling with async_scheduling
* Remove the code per soumith's comments
* Remove the code per soumith's comments
* Remove blank lines in the end of file
* [caffe2] upgrade IDEEP and hotfix for conv op accuracy issue (#8364)
* [IDEEP] Upgrade IDEEP version
Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>
* [IDEEP] Fix accuracy issue in conv op
Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>
* Fix build error due to lack of src in CMakeLists
Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>
* Remove the code per soumith's comments
* [ONNX] Add an ATen fallback pathway for ONNX export (#8273)
* ATen fallback for ONNX export
* Move to enum
* Fix model test
* Add comment
* Address comments
BC interface
* Remove imaginary file (#8415)
* [Caffe2] Enable AMD/MIOPEN ops for Caffe2 (#8306)
* Add hip support for caffe2 core
* Add MIOPEN header/wrapper to caffe2 core
* Add HIP device into caffe2 PB
* top level makefile change for rocm/hip
* makefile scaffolding for AMD/RocM/HIP
* Makefile scaffolding for AMD/RocM/HIP; add makefile/utility for HIP files
* caffe2 PB update for AMD/ROCM HIP device
* Add AMD/RocM/Thrust dependency
* HIP threadpool update
* Fix makefile macro
* makefile fix: duplicate test/binary name
* makefile clean-up
* makefile clean-up
* add HIP operator registry
* add utilities for hip device
* Add USE_HIP to config summary
* makefile fix for BUILD_TEST
* merge latest
* Fix indentation
* code clean-up
* Guard builds without HIP and use the same cmake script as PyTorch to find HIP
* Setup rocm environment variables in build.sh (ideally should be done in the docker images)
* setup locale
* set HIP_PLATFORM
* Revert "set HIP_PLATFORM"
This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.
* continue the build script environment variables mess
* HCC_AMDGPU_TARGET
* Cleanup the mess, has been fixed in the latest docker images
* Assign protobuf field hip_gpu_id a new field number for backward compatibility
* change name to avoid conflict
* Fix duplicated thread pool flag
* Refactor cmake files to not add hip includes and libs globally
* Fix the wrong usage of environment variables detection in cmake
* Add MIOPEN CNN operators
* Revert "Add MIOPEN CNN operators"
This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.
* Add MIOPEN pooling operator
* Add MIOPEN activation operator
* Add MIOPEN softmax operator
* Add MIOPEN spatial batch norm operator
* Add MIOPEN local response normalization operator
* Add MIOPEN conv operator
* Clean-up LRN ops
* enable fp16 in MIOPEN pool ops
* Enable fp16 for MIOPEN relu op
* Enable fp16 for MIOPEN spatial batch norm op
* code clean-up
* revert float16 support
* Create Caffe2 python binding for AMD/ROCM/HIP
* Add op fallback for HIP operator
* add hip src/test files in cmake
* exclude hip src/test files
* fix python binding for hip backend
* fix MIOPEN pooling op workspace
* hack to compile miopen operators
* fix include path for MIOPEN ops
* Fix include path
* Add HIP math utilities
* Fix path for HIP math utils
* cmake fix
* Cmake fix / hipcc for hip files
* suppress hipcc warning
* cmake fix / replace USE_HIP with USE_ROCM
* revert LoadHIP.cmake change
* fix include for thrust/cub-hip
* include path fix for conversion.h
* Updated with latest upstream changes
* clang format fixes
* Context_hip updates
* Fixed typo in rocblas handle get function
* Updated hipified math utils
* Updated math hip test util
* Updated context hip test
* Updated common_hip
* Updated net async dag for HIP
* Added MIOPEN in operator hip test
* fix
* C2 dependencies clean-up
* fix include path for building custom protobuf
* Decouple miopen pool op and conv_pool_op base
* cmake refactor
* fix operator_hip_test
* move all hip/miopen ops files into caffe2/operators/hip
* sanitize cmake
* permission issue
* remove extra parenthesis
* remove artifact from resolving merge conflict
* cont. sanitize cmake files
* fix syntax error
* sanitize conversion.h
* .
* Revert "."
This reverts commit 56020cb0e996a31ae27bf1f8f491955ed0b121b9.
* clang-format
* Enable some reduce operators' ONNX backend tests (#8418)
* fix old comment to point to the right file (#8416)
* Stop pinning nccl version. (#8421)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Expose logsumexp docs and mark log_sum_exp in distributions for internal use (#8428)
* Enable some of the ONNX backend test on broadcasting (#8423)
* Enable some of the ONNX backend test on broadcasting
* enable gemm broadcast
* Expose proto utils and ONNX (#8073)
* Expose proto utils and ONNX from PyTorch libcaffe2.so
* Try to use protobuf from _C.so
* Fix ONNX proto header include
* Adjust order of imports for ONNX until nanopb goes away
* Set and use ONNX_NAMESPACE for PyTorch builds
* Show protobuf summary for all builds
* Add ONNX_NAMESPACE for cpp_build
* Statically link libprotobuf.a into libtorch.so
* Set ONNX_NAMESPACE on Windows build
* Move core/dispatch up as well
* Add /MD flag for Windows build of _C
* Potential Windows fix for ONNX and protobuf
* Add direct linkage from _C to ONNX on Windows
* Only include protobuf wrapper for PyTorch
* Pass extra_compile_args to _nvrtc ext build
* Remove installation of .a files
* Rebase creates some weird situations, revert them manually
* Remove more weird changes due to rebase
* Need to add thread_name.cc after merge
* Revert "Stop pinning nccl version. (#8421)"
This reverts commit 3cb45bafc8b9b023049e5f979a2bcb75e3f7009d.
* Allow downgrades from libnccl2 install.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Billing of changes:
- New Jenkins script for building on rocm. For now it is a bit hacked together, but we can improve it once CI is running
- New ROCM docker image for nightly HIP, and also some legacy packages that we need temporarily
- New enabled config py2-clang3.8-rocmnightly-ubuntu16.04-build based off of the existing Caffe2 image (not built yet)
- A big pile of cmake fixes, mostly to turn bits on/off when ROCM build is involved
- Switch from hiprng to hcrng
- Apply some patches directly in the code, eliminating the separate patch files
- Use __hdiv instead of hdiv, it's more portable
- THCNumerics<T>::gt doesn't work in HIP, so simulate it with sub
- Add a few more overloads HIP needs
- Turn off use of hcc to link (we plan to turn this back on to get tests running)
- Search for hiprand, hiprng, hipblas, hipsparse
- Better Python 2 portability
The Python binding generation code doesn't understand '_out' method bindings correctly, and will compute the indices wrong if you have an '_out' function that is also a method. This is a quick check to prevent you from making this mistake.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Back out "Back out "Add support for generating ATen files during fbcode build""
Original commit changeset: 7b8de22d1613
I'm re-sending this diff exactly as it was approved and
committed. Fixes to support @mode/opt will be sent separately for ease
of review.
* Enable building //caffe2:torch with @mode/opt
In @mode/opt, python runs out of a PAR, which breaks a lot of
assumptions in the code about where templates/ folders live relative
to __file__. Rather than introduce hacks with parutil, I simply turn
template_path into a parameter for all the relevant functions and
thread it through from the top level.
There is a bug in NCCL that causes seg faults when calling ncclCommDestroy() in the destructor during program exit. According to Nvidia, "Whether the NCCL destructor will be called before or after the CUDA runtime destructor is undefined, which can lead to crashes."
As an immediate workaround, skip calling ncclCommDestroy in the NCCL destructor. This is UGLY and we'll follow up with Nvidia to solve this ASAP.
This does the following:
1) makes nDimension an int64_t (to match ATen)
2) changes the dimension value to dim_ (so we catch direct usages)
3) provide an _dim() that provides access to the "old" view (so we can migrate functions one at a time)
4) have code call ->_dim() instead of ->nDimension.
Necessary for Tensor detemplatization (D8121878) - now tensor won't have default constructor (as we don't know the device).
Thus this diff makes TypeMeta be constructible with non-default-constructible types in which case ctor() is non-null but always throws.
It's dangerous however as we won't catch potential type errors at compile time. Luckily - the only place where ctor() is used is in Blob and Tensor which have templated wrappers there (GetMutable and mutable_data respectively). We can just enforce the necessary type requirements there explicitly as a static_assert.
It also changes the failure behavior to be throw() instead of abort(). Aborting the process is not cool for the library :)
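A standalone sketch of the idea described above (simplified, hypothetical names; not the actual TypeMeta code): the registered ctor() slot is non-null for non-default-constructible types but throws, while a GetMutable/mutable_data-style templated wrapper statically requires default constructibility so its callers never hit the runtime throw.
```cpp
#include <cstddef>
#include <new>
#include <stdexcept>
#include <type_traits>

// Type-erased constructor slot, in the spirit of the description above.
using PlacementCtor = void (*)(void* ptr, std::size_t n);

template <typename T>
void placement_construct(void* ptr, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    new (static_cast<char*>(ptr) + i * sizeof(T)) T();
  }
}

// For non-default-constructible types the slot stays non-null, but throws
// instead of aborting the process.
template <typename T>
void throwing_ctor(void*, std::size_t) {
  throw std::runtime_error("type is not default-constructible");
}

template <typename T>
typename std::enable_if<std::is_default_constructible<T>::value, PlacementCtor>::type
pick_ctor() { return &placement_construct<T>; }

template <typename T>
typename std::enable_if<!std::is_default_constructible<T>::value, PlacementCtor>::type
pick_ctor() { return &throwing_ctor<T>; }

struct Meta { PlacementCtor ctor; };

template <typename T>
Meta make_meta() { return Meta{pick_ctor<T>()}; }

// A templated wrapper can still insist on default constructibility at
// compile time, turning the would-be runtime throw into a compile error.
template <typename T>
T* get_mutable_sketch(void* storage) {
  static_assert(std::is_default_constructible<T>::value,
                "GetMutable requires a default-constructible type");
  return static_cast<T*>(storage);
}

struct NoDefault { explicit NoDefault(int) {} };

int main() {
  Meta ok = make_meta<int>();        // ctor() really constructs ints
  Meta nd = make_meta<NoDefault>();  // ctor() is non-null but would throw
  (void)ok; (void)nd;
  return 0;
}
```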
* Add some C++17 features, implemented with C++14
* Add some type traits
* Compile-time type list abstraction (a rough sketch follows this list)
* Some utils for compile-time programming
* Fix compatibility with a larger range of compilers
* Use guts::array instead of std::array because of std::array shortcomings
* code review comments
* Use quotes for includes
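As a rough illustration of the compile-time type list abstraction mentioned in the list above (illustrative names, not the actual guts:: API), a C++14-compatible type list with a couple of metafunctions looks roughly like this:
```cpp
#include <cstddef>
#include <type_traits>

template <class... Ts>
struct typelist {
  static constexpr std::size_t size = sizeof...(Ts);
};

// head<List>::type is the first element of the list.
template <class List> struct head;
template <class T, class... Ts>
struct head<typelist<T, Ts...>> { using type = T; };

// contains<List, T>::value is true iff T occurs in List (no C++17 fold expressions).
template <class List, class T> struct contains;
template <class T>
struct contains<typelist<>, T> : std::false_type {};
template <class T, class Head, class... Tail>
struct contains<typelist<Head, Tail...>, T>
    : std::conditional<std::is_same<T, Head>::value,
                       std::true_type,
                       contains<typelist<Tail...>, T>>::type {};

using Scalars = typelist<float, double, int>;
static_assert(Scalars::size == 3, "three-element list");
static_assert(std::is_same<head<Scalars>::type, float>::value, "first is float");
static_assert(contains<Scalars, int>::value, "int is in the list");
static_assert(!contains<Scalars, char>::value, "char is not in the list");

int main() { return 0; }
```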
* Remove remaining TensorTypeUtils functions.
Mostly what's remaining is copy utilities -- these are now provided in THCTensorCopy.hpp and templatized on the ScalarType rather than the TensorType.
* Have a single THTensor / THCTensor type.
As was previously done with Storages, have only a single (dtype-independent) THTensor / THCTensor.
For documentation and backwards compatibility purposes, the old names, e.g. TH(Cuda)LongTensor alias the new TH(C)Tensor type.
* undef GENERATE_SPARSE.
* Build and install c10d from tools/build_pytorch_libs.sh
* Create initial Python bindings for c10d
* clang-format
* Switch link order to include more symbols
* Add bindings and tests for ProcessGroupGloo
* Add broadcast test
* Separate build flag for c10d
* Explicit PIC property
* Skip c10d tests if not available
* Remove c10d from Windows blacklist
Let it skip by itself because it won't be available anyway.
* Make lint happy
* Comments
* Move c10d module into torch.distributed
* Close tempfile such that it is deleted
* [c10d] Process Group NCCL implementation
* Addressed comments
* Added one missing return and clang format again
* Use cmake/Modules for everything and fix gloo build
* Fixed compiler warnings
* Deleted duplicated FindNCCL
* provide data<T>() in TH(C)Tensor.
* un-genericize THCDeviceTensorUtils.
This is used outside of generic context, so we need to un-genericize it to have a single THCTensor type.
* Don't override Tensor, Storage macros defined outside torch/csrc in torch/csrc.
This PR does the following:
1) Removes THSTensor macros in torch/csrc, which aren't used.
2) For macros defined outside of torch/csrc (THTensor, THTensor_, THStorage, THStorage_):
a) No longer override them, i.e. previously THTensor could actually be THCTensor if a generic file was included from a file including THCP.h.
b) Instead, introduce new macros THW* (e.g. THWTensor) to represent a (potentially empty) wildcard character.
In addition to making this code easier to read and codemod, this allows us to more freely change TH/THC; for example:
currently in the THC random code, the state is casted to THByteTensor*; this happens to work because the macros don't happen to override THByteTensor.
But if THByteTensor just becomes an alias of THTensor (which is the plan for a single tensor type), then this no longer works.
The whole thing was previously a bit of a mess because you really had to understand which macros are redefined and which aren't.
We could also rename the macros that live in torch/csrc (e.g. the THPTensor macros), but since that is more self contained, I punted for now.
* Don't change the plugin.
For example:
>>> torch.ones(3).requires_grad_()
tensor([ 1., 1., 1.], requires_grad=True)
>>> torch.ones(3).requires_grad_() * 5
tensor([ 5., 5., 5.], grad_fn=<MulBackward0>)
The suffix (dtype, requires_grad, grad_fn) wraps to a new line if it would cause the line to exceed the linewidth.
>>> torch.ones(10).double().requires_grad_()
tensor([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
dtype=torch.float64, requires_grad=True)
* Fix some signed/unsigned mismatches
* Skip unused result warning
* Explict fallthrough for murmur hash
* Enable aligned new support to eliminate warning
* Switch to int instead of unsigned in some cases
I want to introduce using SparseTensor = Tensor (as a documentary
type alias for Tensor), but the name is already taken.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Trying to copy all results fails when one of them is a tensor list which
has not been populated. This blew up for CuDNN RNNs when the weights
did not require grad.
Thanks to Sylvain Gugger for reporting!
* Add nan and inf probs check to multinomial
* fix bug
* Spawn CUDA test in subprocess
* Make sure invalid input won't pass the test case
* Try to fix error
* Test failure cases in Python 3 only
* Try to fix Windows error
* Move CUDA test to test_cuda.py
* fix issues
* fix module name error
* no need to check for CUDA existence in test_cuda
* Use PY3
These files don't follow the usual pattern: In general the files torch/csrc/X torch/csrc/cuda/X
both include the generic file torch/csrc/generic/X, where torch/csrc/X includes the cpu implementations and torch/csrc/cuda/X includes the cuda implementations.
(Aside: this is probably not the best structure, the torch/csrc/X files should probably be moved to torch/csrc/cpu/X).
utils.cpp combines these so that torch/csrc/utils.cpp has cuda specific code. This makes it impossible to declare a single THTensor and THCTensor template type (i.e. THPPointer<_THTensor>, THPPointer<_THCTensor>).
In particular, define a base type, _THTensor, that can be used for all THRealTensor structs.
This is just to have less cognitive load when dealing with generic THTensor/THCTensor types (as in templates).
* Replace (non-data) TensorUtils calls with non-generic THCTensor calls.
TensorUtils is templatized on the THTensor type, so to support a single tensor type (like ATen), we need to remove these.
This PR does the following:
1) Allows THCTensorTypeUtils.cuh to include THCTensor.hpp.
This involves moving includes of it outside of generic/, so we can use the new implementations.
2) Defines a single _THCTensor struct and changes THCRealTensor to be a derived type of _THCTensor.
This allows us to implement a single non-generic function and avoid static_cast or void * tricks to call it from the generic functions.
3) For functions inside of TensorUtils that don't use data pointers:
a) Implement the functions in (non-generic) THTensor.cpp and declare them in (non-generic) THTensor.hpp.
b) Have the generic versions call the non-generic versions.
c) Replace the corresponding TensorUtils<THCTensor>::fn call with (non-generic) THTensor_fn.
* Add comment about THCTensor struct.
* Error if storage is null in setStorageNd or resizeNd.
* Implement randperm for CUDA
* Use Thrust to implement randperm
* clean up
* Fix test
* Offload small input scenario to CPU
* Fixed test
* Try to fix Windows error
* Fix Windows error and clean up
* Use fork_rng context manager
* Move test_randperm_cuda to test_cuda
* Add half tensor support
* Fix cuda::type error
* Fix CPU offloading
* Fix issues
* No need to check range for n == 0 case
* Fix scalar check for sparse tensors.
As discovered in #8152
If `t` is a scalar sparse tensor, `t._indices` used to return a sparse
empty tensor because the scalar check was incorrect. This PR modifies
the scalar check to return a dense tensor instead of a sparse tensor.
i.e.
```
tensor = torch.sparse_coo_tensor([], [], torch.Size([]), device=device)
out = tensor._indices() # was a sparse tensor, now is dense.
```
* Fix typos
In my use case, in the backward propagation pass, the reshape needs to change a [0] tensor into a [0,0]-shaped tensor. The original implementation would cause an out-of-index issue. This diff fixes the problem.
We don't want SOVERSION because pip will lose the symlink and
double your distribution size, and also because our setup.py
accidentally links against both libcaffe2.dylib and libcaffe2.1.dylib
on OS X. This leads to a very puzzling error where you get
the error "cannot initialize CUDA without ATen_cuda", because
there are actually two copies of your registry in memory (because
there are two copies of the dynamic library). Dropping SOVERSION
makes it impossible to make this mistake.
In principle, if the shared library load is done with DYLD_GLOBAL,
that should also prevent two copies of the registry from popping up.
Worth checking at some later point, if you need to bring back
SOVERSION (because, e.g., pip finally fixed their software.)
Partially fixes #8022.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Update elementwise ops to support numpy-style broadcast
Update elementwise ops to support numpy-style broadcast
* Fix sqrt_op
* Fix compare ops
* Fix gradient test
* Fix optimizer legacy broadcast
* Fix legacy broadcast for elementwise ops
* Skip flaky test
* Fix eigen simple binary op
* Fix attention test
* Fix rnn test
* Fix LSTM test
* Fix tan grad
* Fix schema check
For non-generic function call implementations in Storage used by TensorUtils, we do the following:
1) Move the declaration from generic/C to non-generic/C++; we don't need backwards compatibility on these functions and want to use e.g. at::ScalarType.
2) Move the implementation from generic/C++ to non-generic/C++.
3) Change the generic implementation to call the non-generic implementation.
This will allow us to get rid of the corresponding TensorUtils calls (once we move over the Tensor functions in the same manner).
* Fix __rshift__ bug
* Add small tests for __lshift__ and __rshift__ in test_cuda
* Add a more elaborate check for __lshift__ and __rshift__
* refactor the test to address @zou3519 's comments
When compiling OSX with CUDA, Caffe2's build system uses
find_package(cuda) to get its grubby hands on the CUDA driver
library (for some strange reason, FindCUDA doesn't save this
information as a variable). Unfortunately, on OSX, sometimes
this picks up the cuda.framework folder, and then our build
system chokes to death because it doesn't try to link against
this as a framework. (Is the folder even a framework? I have
no idea).
This commit attempts to fix this in a two pronged fashion:
1. For some users, reducing the precedence of frameworks
using CMAKE_FIND_FRAMEWORK seems to help. So we set these
variables. However, this fix is not perfect; on my laptop
it doesn't actually solve the problem.
2. PyTorch doesn't actually need the CUDA driver API. So we
only add the dep when building Caffe2.
Fixes #8022
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* docstring support for @script and @script_method
* make it python2 compatible
* improve according to review
* improve build_stmts
* use filter instead of list comprehension
* improve the way wrap is handled for script_method
* stash the original method instead
* allow dynamic attr for ScriptMethod and GraphExecutor
* a bit comment on build_Expr
* remove _build_wrap
* a bit improve on comments
* rename to __original_methods
* should be _original_methods
* docs: enable redirect link to work for each specific page
* docs: add canonical_url for search engines
closes #7222
* docs: update redirect link to canonical_url
* opt bernoulli rng with vsl and openmp
* detect cpu vendor for bernoulli
* retrigger test platform
* check the vendor more severely
* use cpuinfo to check vendor
* fix type mismatch while call torch._C._cuda_setDevice
* fix type mismatch in scatter
* fix type mismatch in scatter
* fix type mismatch while call torch._C._cuda_setDevice
* fix type mismatch while call torch._C._cuda_setDevice
* fix type mismatch while call torch._C._cuda_setDevice
* Add non_blocking to Tensor/Module.to
* flake8
* Add argparse tests
* cpp parse
* Use C++ parser
* use a common parse function with Tensor.to
* fix test_jit
* use THPObjectPtr
* increase refcount for None, True, and False
* address comments
* address comments
As in https://github.com/pytorch/pytorch/pull/8056, this doesn't work with a single TensorImpl type.
This replaces the usages of with a templatized parameter and static_asserts that the new and old are equal.
After this we can get rid of the old template parameter, but I want to ensure they are equivalent across all builds first.
* [Caffe2] Support non peer access in muji
* [Caffe2] Add test for 4 gpus and 2 groups
* [Caffe2] Add comments
* Fix bug when reduced_affix is empty
* Fix typo and add comments about cpu and amd gpu
* Split SparseTensorImpl off from TensorImpl.
At the moment they have the same data layout, but with the upcoming refactor
they will not, and we need a place to put all of the sparse tensor specific
fields.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Update SparseTensorImpl.h
In some configurations (e.g., our internal build of GCC 5 + GLIBC 2.23),
-lrt is not sufficient to use shm_open; you also need to declare
a dependency on pthread. This patch adds a surgical extra fix to
detect this situation, in the case that I noticed it failing in the
wild.
Fixes#8110
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Implement adaptive softmax
* fix test for python 2
* add return_logprob flag
* add a test for cross-entropy path
* address review comments
* Fix docs
* pytorch 0.4 fixes
* address review comments
* don't use no_grad when computing log-probs
* add predict method
* add test for predict
* change methods order
* get rid of hardcoded int values
* Add an optional bias term to the head of AdaptiveSoftmax
* Resolve merge conflicts
* .
* Update GetAsyncNetHIPThreadPool
* Enable BUILD_CAFFE2 in pytorch build
* Unify USE_HIP and USE_ROCM
* always check USE_ROCM
* .
* remove unrelated change
* move all core hip files to separate subdirectory
* .
* .
* recurse glob core directory
* .
* correct include
* .
If you set CUDA_HOME and CUDA_NVCC_EXECUTABLE together, you may
end up in a situation where the CUDA_VERSION of your includes
mismatches the CUDA version of your nvcc. See #8092 for a concrete
case where this can occur. Explicitly detect this situation and
give a good error message in this case!
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Added support to run ONNX Upsample operator (mode=nearest) in Caffe2
* adding error checks to upsample
* adding error checks to upsample
* adding error checks to upsample
* changing to np.isclose
* Revert onnx submodule update
* still fixing
TensorUtils<T> is basically ATen-dispatch-lite in that it allows one to do multi-type THC function dispatch with a single call.
However, it is templatized on the Tensor type, and since we are moving to a single Tensor type, this doesn't work.
Most of the functions in TensorUtils (e.g. getDims) can be pulled up a level, to just call THCTensor_nDimension (or directly accessing the member),
but the DataType specific functions are more problematic.
So, this PR does two things:
1) Replaces calls of 'TensorUtils<THCTensor>::DataType' with 'real' since these are identical
2) Templatizes the THC_pointwiseApplyX functions to take scalar types. To ensure this is done correctly, we static_assert that the scalar type template parameter matches the scalar type of
the corresponding template parameter. We will need to get rid of these static_asserts in the future, but this is useful for now.
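A simplified sketch of what such a templatized pointwise-apply with a scalar-type static_assert can look like (hypothetical tensor structs, not the actual THC code): the caller names the scalar type explicitly, and a mismatch with the tensor's own scalar type is caught at compile time.
```cpp
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for dtype-specific tensor structs.
struct FloatTensor { using scalar_t = float; std::vector<float> data; };
struct LongTensor  { using scalar_t = long;  std::vector<long>  data; };

// The scalar type template parameter must match the tensor's scalar type,
// enforced via static_assert as described above.
template <typename ScalarT, typename TensorT, typename Op>
void pointwise_apply1(TensorT& t, Op op) {
  static_assert(std::is_same<ScalarT, typename TensorT::scalar_t>::value,
                "scalar type template parameter must match the tensor's scalar type");
  for (auto& x : t.data) op(x);
}

int main() {
  FloatTensor t{ {1.f, 2.f, 3.f} };
  pointwise_apply1<float>(t, [](float& x) { x *= 2.f; });
  // pointwise_apply1<long>(t, ...) would fail to compile.
  return 0;
}
```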
No longer generate data-type specific Storage types, since all Storage types are now identical anyway.
For (some) backwards compatibility and documentation purposes, the Real names, e.g. THLongStorage are now #defined as aliases to the single THStorage type
* Adding instance weight to batch distill loss
as title
* add bfloat 16-31
added bfloat 16-31 and their respective unit tests
* [CUDA9] Upgrade - fbcode
CUDA9 upgrade diff D5654023 has been out for a while thanks to Pieter. But with time growing it's becoming quite hard to rebase, because of the symlinks and auto-generated build/config files in tp2. Break D5654023 into two diffs, one touching tp2 config files, and another one touching fbcode TARGETS file (adding nvcc flag). These two should be a bit easier to rebase (for detailed procedure see "Test Plan").
This diff can only be committed if:
1. CUDA 9 rpm is rolled out fleet-wide (TBD)
2. NVidia driver 390.40 is rolled out fleet-wide (done)
3. Upgrade CUDA 9.1, cudnn 7.1, nccl 2.1 (done)
4. Make sure all dependents are built (done)
5. Test all C2 operators, PyTorch (see test plan)
* Share intermediate int32 buffer across Conv ops
Adding a known type
* [C2 fix] infer function for ensure_cpu_output_op
this is adding the missing device function for ensure_cpu_output_op
* [int8] Add blob serializer/deserializer for Int8TensorCPU
To export to logfiledb
* [nomnigraph] Add try catch block to optimization passes in predictor
This will catch failures that happen in the optimization pass.
* Caffe2: avoid static initialization order fiasco for CAFFE_ENFORCE
CAFFE_ENFORCE uses a stack trace fetcher, which is currently a global static variable. If CAFFE_ENFORCE is used at static initialization time, this is a SIOF. Recently CAFFE_ENFORCE was added into init function registration, so we started to see this.
A Meyers singleton is going to provide safety here. If the stack trace fetcher has not been registered yet, it will just use a dummy one.
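A generic sketch of the Meyers-singleton pattern being applied here (illustrative names, not the actual Caffe2 API): the fetcher slot lives in a function-local static, so it is initialized on first use rather than at namespace scope, and a dummy result is returned if nothing has been registered yet.
```cpp
#include <functional>
#include <string>
#include <utility>

// Illustrative type for a registered stack trace fetcher.
using StackTraceFetcher = std::function<std::string()>;

namespace {
// Function-local static: constructed the first time it is needed, which
// avoids the static initialization order fiasco of a namespace-scope global.
StackTraceFetcher& fetcher_slot() {
  static StackTraceFetcher fetcher;  // empty by default
  return fetcher;
}
}  // namespace

void SetStackTraceFetcher(StackTraceFetcher f) { fetcher_slot() = std::move(f); }

std::string GetCurrentStackTrace() {
  auto& f = fetcher_slot();
  // If no fetcher was registered yet (e.g. an enforce fired during static
  // init), fall back to a dummy result instead of crashing.
  return f ? f() : std::string("<no stack trace available>");
}

int main() {
  (void)GetCurrentStackTrace();  // safe even before registration
  SetStackTraceFetcher([] { return std::string("fake trace"); });
  return GetCurrentStackTrace().empty() ? 1 : 0;
}
```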
* NUMA support in SparseNN CPU benchmark
Adding support for NUMA in SparseNN CPU benchmark
* [mobile-roofline] Add logging needed for roofline model
This should be all that's needed
* Let the operators use the same input if the operators are not chained
Otherwise, we would have to change the input data dims.
* fix null-pointer-use UBSAN errors in in reshape_op.h
* revert previous fix on input blob name
as title
* Adding flag to let MineHardNegative automatically extract single value from dict
Model exporter requires the output of the model to be a struct. This makes it convenient to use those models directly in MineHardNegative by allowing automatic extraction of the single element of the dict, which is a common use case.
* Reverting change that broke internal tests back to OSS compatible state
* [script] Add support for torch.zeros, torch.ones, etc.
* modifies gen_jit_dispatch to creating bindings for functions that do
not take tensor arguments, but do have an initial type argument
* adds tensor attributes to these functions for device, layout, and
dtype specification
* extends the list of valid compiler constants to include device, layout,
and dtype.
* allows functions with Generators, but only using the default generator
Known limitations:
* when using `torch.float`, we convert it to a scalar tensor and make
no checks that it is actually used only in a dtype specification.
This is similar to how we handle Python numbers, creating some situations
where the script is more permissive. Fixing this requires much more
significant changes to the IR, so is lower priority for now.
* devices specified using string literals e.g. 'cuda:1' do not work,
since we do not support string literals in general.
* Factor python dependency out of interpreter
* Remove NO_PYTHON for the autograd engine
If there is no python bindings, then a default Engine is constructed
the first time it is requested.
If the python libraries are loaded, then they override the default
accessor and the default engine becomes a python Engine.
Note: it is possible for two engines to be generated if a non-python
one gets created before the python bindings are loaded. This case
is rare, and just results in additional threads being spawned.
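A rough sketch of the accessor scheme described above (hypothetical names, not the actual autograd code): a swappable getter defaults to lazily constructing a base Engine, and the Python bindings can install their own getter that returns a Python-aware subclass.
```cpp
struct Engine {
  virtual ~Engine() = default;
  virtual void execute() {}
};

// The accessor is a swappable function pointer. By default it lazily builds
// a plain Engine the first time one is requested.
using EngineGetter = Engine& (*)();

Engine& default_engine_getter() {
  static Engine engine;  // constructed on first request
  return engine;
}

EngineGetter g_engine_getter = &default_engine_getter;

Engine& get_default_engine() { return g_engine_getter(); }

void set_default_engine_getter(EngineGetter g) { g_engine_getter = g; }

// A Python-aware build would install its own getter at module load time.
struct PythonEngine : Engine {
  void execute() override { /* would e.g. release the GIL around execution */ }
};

Engine& python_engine_getter() {
  static PythonEngine engine;
  return engine;
}

int main() {
  get_default_engine().execute();                 // base engine
  set_default_engine_getter(&python_engine_getter);
  get_default_engine().execute();                 // overridden engine
  return 0;
}
```
As the note above says, if the default engine was already created before the override is installed, two engine instances can coexist; that only costs a few extra threads.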
* Fixing AlexNet test which is skipped in CI
* Fix profiler crash when no events register
When trying to profile, attempting to print the event table throws a vague error because the event list is empty:
....
max_name_length = max(len(evt.key) for evt in events)
ValueError: max() arg is an empty sequence
This change fixes the error by returning an empty string.
* Update profiler.py
This adds an unconditional dependency on CUDA, which is not desirable
for the long term. Ideally we have split like ATen where we have
different artifacts for different backends so you can decide at runtime
what to use.
* Make AT_FORALL_SCALAR_TYPES usable outside of at::namespace.
This requires renaming the _cast functions which used the unqualified names.
* Separate onnx mapping of scalar type from cast name.
* Fix flake8.
* Properly cast onnx.
observers_list_ stores all the observers for an observable. The list is allocated on the heap, which
can cause LLC misses. Add an on-stack observer cache for fast access. In production, we have seen a 20%
speedup for start and stop observer calls.
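A simplified sketch of the on-stack observer cache idea (illustrative types, not the actual observer API): copy the heap-allocated observer list into a small fixed-size array on the stack, then iterate over the cached pointers in the hot start/stop path.
```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <vector>

struct Observer {
  virtual ~Observer() = default;
  virtual void Start() {}
  virtual void Stop() {}
};

struct Observable {
  // Heap-allocated list, as described above; walking it directly in the hot
  // path can miss in the last-level cache.
  std::vector<std::unique_ptr<Observer>> observers_list_;

  static constexpr std::size_t kCacheSize = 8;  // small on-stack cache

  void StartAllObservers() {
    std::array<Observer*, kCacheSize> cache;
    const std::size_t n = observers_list_.size();
    if (n <= kCacheSize) {
      for (std::size_t i = 0; i < n; ++i) cache[i] = observers_list_[i].get();
      for (std::size_t i = 0; i < n; ++i) cache[i]->Start();  // fast path
    } else {
      for (auto& o : observers_list_) o->Start();             // fallback
    }
  }
};

int main() {
  Observable obs;
  obs.observers_list_.emplace_back(new Observer());
  obs.StartAllObservers();
  return 0;
}
```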
* Add memory leak check in CUDA tests
* Tracking multi-GPU too
* fix run_test.py not running __name__ == '__main__' content; add test for make_cuda_memory_checked_test
* add a comment
* skip if cuda
* 1. Change the wrapper to a method in common.py:TestCase
2. Refactor common constants/method that initialize CUDA context into common_cuda.py
3. Update some test files to use TEST_CUDA and TEST_MULTIGPU
* Fix MaxUnpool3d forward memory leak
* Fix MultiLabelMarginCriterion forward memory leak
* Fix MultiMarginLoss backward memory leak
* default doCUDAMemoryCheck to False
* make the wrapper skip-able
* use TEST_MULTIGPU
* add align_corners=True/False tests for Upsample; fix TEST_CUDNN
* finalize interface
* VolumetricMaxUnpooling_updateOutput
* fix test_nccl
* rename THC caching allocator methods to be clearer
* make the wrapped function a method
* address comments; revert changes to aten/src/THC/THCCachingAllocator.cpp
* fix renamed var
* Import/export observer symbols for DLL, which fixes the linking error in Visual Studio.
* Add support of all default cmake build types for release to cuda.
* Make THStorage / THCStorage have void* data ptr.
This is the initial step in unifying the ATen and TH tensor representations, next is to only generate a single THStorage / THCStorage type.
The major changes here are:
1) data has been renamed to data_ptr and made void* in THStorage/THCStorage.
2) THStorage / THCStorage stores a at::ScalarType representing its data type (This will be useful when we generate a single THStorage/THCStorage).
3) APIs for Accessing the data as a real*:
a) storage->data<real>() -- this does runtime-type checking (checks that the at::ScalarType is correct).
b) storage->unsafeData<real>() -- as above, but no runtime-type checking (used in inner loops / fast code paths).
c) THStorage_(data)(storage) -- this already existed, just calls storage->data<real>().
* Add include.
* Attempt to fix clang build issues.
* Clarify comment and remove extra character.
* Rename unsafeData -> unsafe_data.
* Remove unnecessary 'to' function to get compile time rather than link time errors.
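A condensed sketch of what such a dtype-tagged, void*-backed storage accessor can look like (simplified; enum values and names are illustrative, not the actual THStorage code): data<T>() checks the recorded scalar type at runtime, while unsafe_data<T>() skips the check for inner loops.
```cpp
#include <cstdlib>
#include <stdexcept>

enum class ScalarType { Float, Long };

template <typename T> struct scalar_type_of;
template <> struct scalar_type_of<float> { static constexpr ScalarType value = ScalarType::Float; };
template <> struct scalar_type_of<long>  { static constexpr ScalarType value = ScalarType::Long;  };

struct Storage {
  void* data_ptr;          // untyped, as described above
  ScalarType scalar_type;  // runtime record of the element type

  template <typename T>
  T* data() const {        // runtime-checked accessor
    if (scalar_type != scalar_type_of<T>::value)
      throw std::runtime_error("scalar type mismatch");
    return static_cast<T*>(data_ptr);
  }

  template <typename T>
  T* unsafe_data() const {  // unchecked accessor for fast code paths
    return static_cast<T*>(data_ptr);
  }
};

int main() {
  Storage s{std::malloc(4 * sizeof(float)), ScalarType::Float};
  s.data<float>()[0] = 1.0f;   // ok
  // s.data<long>() would throw at runtime.
  std::free(s.data_ptr);
  return 0;
}
```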
* Raise error when torch.load a storage on a non-existing device
Before, doing torch.load(...) on a CUDA tensor on a CPU-only machine
would raise an unreadable error:
```
~/pytorch/pytorch/torch/cuda/__init__.py in __enter__(self)
223 if self.idx is -1:
224 return
--> 225 self.prev_idx = torch._C._cuda_getDevice()
226 if self.prev_idx != self.idx:
227 torch._C._cuda_setDevice(self.idx)
AttributeError: module 'torch._C' has no attribute '_cuda_getDevice'
```
This PR makes it so that torch.load raises a hard error if one tries to
load a storage onto a non-existing device and suggests the user to use
torch.load's map_location feature.
* Address comments
* missing dep
* Handling of scalars in torch.Size
torch.Size() constructor uses python_arg_parser
IntList in python_arg_parser can take iter/range
Have IntList take python iterables and ranges.
Address comments: don't use python_arg_parser and instead call __index__ in THPSize_pynew
Address comments
Address comments
* Rebased
* Address nit
* pad-sequence no longer requires sorting entries
pad_sequence can get max_len from the list of sequences. Entries only need to be sorted if the output will be used for pack_padded_sequence, which can throw the error itself.
* remove sort requirement from pad-sequence
Picks up from #5974.
Removes the requirement that input sequences to pad_sequence have to be
sorted. Addressed the comments in the PR:
- Updated docstring for pad_sequence
- Remove sort requirement in pad_sequence test
- Test unsorted and sorted sequences in pad_sequence test
* Test if ASAN is actually working as part of ASAN tests.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Drop explicit use of libstdc++, we should not care.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Build with DEBUG=1
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Increase main thread stack size when using ASAN.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
This provides a bare-minimum MPI Process Group implementation, the commit is on top of @pietern's Gloo Process Group PR.
* [c10d] MPI Process Group Implementation
ref: https://github.com/pytorch/pytorch/issues/7434
* Better exception, atexit func, and addressed comments
* Clang formatting changes
* Static initialization and addressed comments
* Added constness back
* Test will now launch mpi processes if found
* CMakeList Changed
* [mpscnn] MPSCNNChannelShuffle
att
* [Easy] Adding tags as an argument to the functional layer
Without it "tags" would be added as an argument to the operator.
The change here is based on the assumption that there is no operator that takes "tags" as an argument.
* Fix locally_connected_op schema check.
Fix locally_connected_op schema check.
* [C2] Add TypeAndShape inference for few more operators
As desc
* [c2] Shape inference should support 0 as dimension
Tensors can have 0 in their dimension.
* Make MockHiveReader loop over and support max_examples
Replace DatasetReader with RandomDatasetReader.
So that Mock Hive Reader can simulate a large data input using a small sample file as source.
* Utility function to wipe cache between benchmark runs
Caffe2 benchmark does not wipe out the cache between runs, and this potentially creates an unrealistically optimistic picture of performance. This diff adds a utility function to wipe out the cache.
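One common way to implement such a cache-wiping helper (a guess at the approach, not the actual diff): stream through a buffer larger than the last-level cache between runs so that previously cached benchmark data is evicted.
```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Touch a buffer bigger than the LLC so earlier working sets are evicted.
// The default size is a rough guess; real code would query or configure it.
void wipe_cache(std::size_t llc_bytes = 64 * 1024 * 1024) {
  static std::vector<std::uint8_t> scratch;
  scratch.resize(llc_bytes);
  volatile std::uint8_t sink = 0;
  for (std::size_t i = 0; i < scratch.size(); i += 64) {  // one touch per cache line
    scratch[i] = static_cast<std::uint8_t>(i);
    sink = sink + scratch[i];
  }
  (void)sink;
}

int main() {
  // e.g. call between benchmark iterations:
  // for (int iter = 0; iter < n; ++iter) { wipe_cache(); run_net_once(); }
  wipe_cache();
  return 0;
}
```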
* Allow caffe2 GlobalInit to be invoked multiple times
Allow caffe2 GlobalInit to be invoked multiple times. Will re-parse gflags and update logging levels on successive invocations, but will not re-run init functions or perform other one-time initialization.
* Add Caffe2 GlobalInitIsCalledGuard to base net and operator classes
Warn if caffe2's GlobalInit function has not been invoked before creating an operator or net object. This is based on discussion here: https://fb.quip.com/kqGIAbmK7vNG
* Rethrow current exception on failure
Rethrow current exception instead of copy constructing a new one on op failure.
* Make `clone()` return subclass of List/Struct
`clone()` is not working correctly when we subclass those classes
* Wipe the cache before the net run
the util function is copied from D7409424
will rebase once D7409424 is landed.
* [Caffe2] [Mobile] Support utils/cast.h::GetCastDataType with LITE_PROTO builds
* Correct includes
async_polling include -> async_base include
* Prepare execution flags for executor migration
Making async_scheduling aware of underlying net type to prepare for executor
migration
* Add operator level observers into async executor
Adding operator level observers into RunAsync operators' calls
* Cleanup TEST_Benchmark
Remove duplicate code and provide default implementation in NetBase
* [C2] Fix type and shape inference for binary comparison ops
As desc.
* Add GlobalInit to predictor to ensure initialization is always done before prediction
FACEBOOK:
Redo D7651453 the correct way.
Now use a static variable for the arguments passed to GLog
* Remove spammy log message
This method is currently used in various places inside Caffe itself.
* Disable events for operators inside a chain
We don't need to use events in operators within a chain because the chain is
always scheduled on a single stream, keeping only first and last event for
scheduling purposes
* Ensure correct finish run order
In rare cases we might call finishRun and trigger net's destruction while
another worker is still holding shared_ptr to a thread pool, that can cause
thread pool destruction from within a worker thread in case no other nets are
using the pool. This diff fixes the order of calling finishRun and also changes
pool() to return raw pointer to keep pool's ownership within the net
* Reduce unnecessary polling
Make sure we don't waste CPU by polling operators that we can set efficient callbacks on
* Squash commit of syncing 9506eeb from github to fbcode
Patch xplat buck fix
add virtual destructor to OptimizationPass
add virtual destructor to OptimizationPass
build fixes for sync
build fixes for sync
* Fix net tracing
Fix net tracing from async_scheduling
* Fix logging
It's going to define a static variable, and this was a loaded
footgun if another C++ file directly included this header.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Revert "Fix error when setting multiple arch in TORCH_CUDA_ARCH_LIST (#7879)"
This reverts commit 45cdb63d8b8022ab26f073d3bed718e75d2aedaf.
* Disable dirty test; always run all CI runs.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Remove templatization of PyTypeObject in THP copy storage methods.
An in-progress refactoring of THStorage is collapsing the types of THStorages to not be ScalarType-specific.
The relevant PyTypeObject to use for the THPStorageType is currently templatized based on the current THStorage;
this doesn't work if the ScalarType is collapsed. Instead, just pass it explicitly.
* Pass src type instead of dst type.
* Line up columns.
* Avoid @generated in templates.
We want @generated only in the build products. Otherwise, templates are
locked and changes to the templates are excluded from phabricator.
Also adds @generated to autograd generated files (e.g.
VariableType.cpp).
See #7780
* Don't try to specify the template filename in generated comment
The template filename is not always the same as the generated filename.
* Make TensorMethods (fastGetSet) not depend on data type of Storage.
Currently, fastGetSet is implemented as macros that depend on the data type of Storage (i.e. that storage->data is real*).
Since we are moving to having 'void*' data this won't work in the future.
Also, due to the recentl C/C++ split, these are actually C++ implementations (because they require the struct definition which is C++),
so we move them to a generic .hpp file and implement them as static inline functions.
* Fix set functions.
* Add generic to CMakeLists.
* Not running ATEN tests on Caffe2 builds
* Keeping test directory when only aten is built
* Changing to run all aten tests too
* Skipping directories again
* .
* .
* skip aten/integer_divider_test (it hangs for unknown reason)
* Implement nn.Sequential that can be inlined into script modules
* fix bugs
* add comment
* add _ConstSequential class
* add script_method for forward in ConstSequential
* fix build bug
* refactor
* Add backward() to Tensor and Variable
* Add at:: in front of Tensor
* Trying to not move optional to appease windows?
* Move implementation into cpp file
* Undo some formatting changes
* Have PyTorch depend on minimal libcaffe2.so instead of libATen.so
* Build ATen tests as a part of Caffe2 build
* Hopefully cufft and nvcc fPIC fixes
* Make ATen install components optional
* Add tests back for ATen and fix TH build
* Fixes for test_install.sh script
* Fixes for cpp_build/build_all.sh
* Fixes for aten/tools/run_tests.sh
* Switch ATen cmake calls to USE_CUDA instead of NO_CUDA
* Attempt at fix for aten/tools/run_tests.sh
* Fix typo in last commit
* Fix valgrind call after pushd
* Be forgiving about USE_CUDA disable like PyTorch
* More fixes on the install side
* Link all libcaffe2 during test run
* Make cuDNN optional for ATen right now
* Potential fix for non-CUDA builds
* Use NCCL_ROOT_DIR environment variable
* Pass -fPIC through nvcc to base compiler/linker
* Remove THCUNN.h requirement for libtorch gen
* Add Mac test for -Wmaybe-uninitialized
* Potential Windows and Mac fixes
* Move MSVC target props to shared function
* Disable cpp_build/libtorch tests on Mac
* Disable sleef for Windows builds
* Move protos under BUILD_CAFFE2
* Remove space from linker flags passed with -Wl
* Remove ATen from Caffe2 dep libs since directly included
* Potential Windows fixes
* Preserve options while sleef builds
* Force BUILD_SHARED_LIBS flag for Caffe2 builds
* Set DYLD_LIBRARY_PATH and LD_LIBRARY_PATH for Mac testing
* Pass TORCH_CUDA_ARCH_LIST directly in cuda.cmake
* Fixes for the last two changes
* Potential fix for Mac build failure
* Switch Caffe2 to build_caffe2 dir to not conflict
* Cleanup FindMKL.cmake
* Another attempt at Mac cpp_build fix
* Clear cpp-build directory for Mac builds
* Disable test in Mac build/test to match cmake
* Skip some tests to unbreak CI
* Pass the opset_version to run_node
* Remove the stale check_graph call, caffe2_net_to_onnx_model will invoke check_model
* Add hip support for caffe2 core
* Add MIOPEN header/wrapper to caffe2 core
* Add HIP device into caffe2 PB
* top level makefile change for rocm/hip
* makefile scaffolding for AMD/RocM/HIP
* Makefile scaffolding for AMD/RocM/HIP; add makefile/utility for HIP files
* caffe2 PB update for AMD/ROCM HIP device
* Add AMD/RocM/Thrust dependency
* HIP threadpool update
* Fix makefile macro
* makefile fix: duplicate test/binary name
* makefile clean-up
* makefile clean-up
* add HIP operator registry
* add utilities for hip device
* Add USE_HIP to config summary
* makefile fix for BUILD_TEST
* merge latest
* Fix indentation
* code clean-up
* Guard builds without HIP and use the same cmake script as PyTorch to find HIP
* Setup rocm environment variables in build.sh (ideally should be done in the docker images)
* setup locale
* set HIP_PLATFORM
* Revert "set HIP_PLATFORM"
This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.
* continue the build script environment variables mess
* HCC_AMDGPU_TARGET
* Cleanup the mess; this has been fixed in the latest docker images
* Assign protobuf field hip_gpu_id a new field number for backward compatibility
* change name to avoid conflict
* Fix duplicated thread pool flag
* Refactor cmake files to not add hip includes and libs globally
* Fix the wrong usage of environment variables detection in cmake
* Add MIOPEN CNN operators
* Revert "Add MIOPEN CNN operators"
This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.
Pull new revision of NNPACK which specifies non-executable stack in assembly files. Previous revision didn't do that, and depending on toolchain could cause linker to mark stack as executable for the linked binaries.
This is a starting point and only implements allreduce for CPU tensors. It includes most base functionality like algorithm caching (similar approach as taken in the THD GlooCache) and multi-threaded execution (new).
The expectation is that function calls on the process group class are globally serialized. They execute collective functions, so members of the collective must call the same functions in the same order, or a deadlock may happen.
The algorithm cache works as follows: the ProcessGroupGloo class has a cache map from algorithm keys to algorithm entries. The algorithm key is a struct with fields that make up the signature of a collective function. It includes the dimensionality of the input/output tensors, tensor device assignment, source/destination rank, etc. For collective calls with the same key, the process group will lazily initialize and then cache a Gloo algorithm instance. For now we only keep a single algorithm instance per key, but this may be revisited in the future, if we observe contention on a single key and can exploit additional parallelism.
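A minimal Python sketch of this caching idea (illustrative only; the real cache lives in the C++ ProcessGroupGloo, and all names here are made up for the sketch):
```python
# Sketch of "algorithm key -> lazily created algorithm instance" caching.
class AlgorithmCache:
    def __init__(self):
        self._cache = {}

    def _key(self, op, tensors, src_rank=None, dst_rank=None):
        # The key is the "signature" of the collective call: dimensionality,
        # dtype and device assignment of the tensors, plus src/dst rank.
        return (
            op,
            tuple((t.dtype, tuple(t.size()), str(t.device)) for t in tensors),
            src_rank,
            dst_rank,
        )

    def get_or_create(self, op, tensors, factory, **ranks):
        key = self._key(op, tensors, **ranks)
        if key not in self._cache:
            self._cache[key] = factory()  # lazy initialization on first use
        return self._cache[key]
```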
* Change backward calls to grad to avoid memory leak from #7343; Replace unnecessary create_graph=True with retain_graph=True
* fix gradgradcheck use of make_non_contiguous
* allow non-contiguous target
* remove unnecessary .grad.zero_()
* remove contiguous_detach
* fix PReLU double backward always returning ggW as a scalar
* let noncontig gO require grad
* move requires_grad to return
* Fix handling of empty batches in SumReduceDimsOp
As titled
* Deferrable async_scheduling finishRun fix
Proper order of finishing run operations in deferrable_async_scheduling net
* Simplify exception handling in async_scheduling
Simplify exception handling; there is no need to busy wait, the thread that processes the
last task can finish the run.
* [C2]worker_coordinator_memorize_worker_ids
As titled. This is related to T28689868, where the number of blobs we want to create is equal to the number of worker ids
* Add unit test for nets with no type set
* Ignore total length argument in symbolic_pad_packed_sequence
1- There was a mistake in the code: total_length was added to the wrong symbolic function (pack_padded_sequence) instead of (pad_packed_sequence).
2- There is no need to throw an exception if total_length is given, since it is only used to enable data_parallel training on multi-GPUs and doesn't have anything to do with ONNX export, so just ignore it. https://fburl.com/tk4gciqp
* Add support for MKLDNN to async_scheduling
Just add MKLDNN as a possible CPU option to async_scheduling's pool function
* [AuFL][ensemble] support branch output for prediction
This diff supports using predictions from different branches and thus enables model ensembling (not fully independent).
* Fix a bug in add_loss in layer_model_helper
As titled.
* Support lradaption for adam
1.lr adaption operator
2.apply to dense adam
* Perf tweaks for async_scheduling
Restore single pool option + remove unnecessary (no-ops) calls
* add quantization to SparseSimdAdagradOp
add a bunch of quantization signatures to SparseSimdAdagradOp, implementations to come next
* [sr] [codemod] Change all SR callsites to use new API
@allow-large-files
This diff refactors all callsites of SR to use the slightly changed API introduced in the diff below. Really what this means is that you need to include the correct header. Also if you were using `ClientFactory::newFactory` you need to not prefix it with `ClientFactory::`.
```
cd ~/fbsource/fbcode
find ./ -type f -exec sed -i -e 's:#include "servicerouter/client/cpp2/ClientFactory.h":#include "servicerouter/client/cpp2/ServiceRouter.h":' -e 's:#include <servicerouter/client/cpp2/ClientFactory.h>:#include <servicerouter/client/cpp2/ServiceRouter.h>:' -e 's/ClientFactory::newFactory(/newFactory(/g' {} \;
```
Also manually fixed spots that couldn't be done automatically (or broke because they depended on transitive includes).
* Back out "Fix handling of empty batches in SumReduceDimsOp"
Original commit changeset: 282da1730cc2 This commit is blocking the
Github->fbcode sync, which really needs to get merged ASAP. D7881937 which this
diff depends on will be reverted in the sync D7990948 which causes this to
break. The sync diff cannot be patched with this reversion because it must be
landed against base revision 5c8c099 , and D7881937 must not be included in the
sync diff because it is breaking GPU tests that are not available in sandcastle
: https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-cuda8.0-cudnn6-ubuntu16.04-test/3638/console
for one example.
* Add the flow to support operator benchmark
1) generate model with the operator 2) upload to everstore 3) generate model spec into json file 4) start running the benchmark
* [tum][gpu] Connect DPM trainer with flow and unit tests
This diff:
- Fix some small bugs for Yiming's recent changes to parallelizer, so it suits real use cases.
- Add correct tags to the TUM code, so we can do data parallel transform
- pass extra info at instantiation time.
- add unit test for using DPM in TUM model
After this diff, we can do simple box, multi-gpu fully-sync trainer for TUM in Fblearner workflow, but may still need to do speed benchmarking.
* w/o normalized lradaption for adam dense only
The previous lr adaption includes a normalization step when performing the dot product operation. This is not exactly the same as what is proposed in the paper. I add normalization as an option. Without it, the operator performs exactly what the paper proposed. With the option, we add the normalization step.
* [fb] Use SharedPromise in DeferrableAsyncSchedulingNet
This code is to simplify DeferrableAsyncSchedulingNet by removing condition
variable + small fixes
* [tum] implement cuda sparseLengthsMean and LengthsMean
as title
* Adding an optional parameter to allow use of protobufs in InferShapesAndTypes function.
Adding an optional parameter to allow use of protobufs in InferShapesAndTypes function.
* Move feature_to_index to FeatureSpec.feature_to_index
move feature_to_index to FeatureSpec.feature_to_index to avoid override other fields
* [Caffe2] Rename bytes_moved to bytes_written
Just a rename in preparation for supporting bytes_read.
* [c2] fix ReduceFrontSumOp for empty case by setting 0
otherwise, it may use the results from the last iteration when the batch is empty.
* [Caffe2] [Int8] Improve Intel CPU performance
* [Easy] Improve PrependDim op logging
as titled
* DBFileReader expand db_path using os.path.expanduser(..)
Since there are a lot of possible use cases of `DBFileReader` to read from user home path, like `~/local/sample.db`, I want to save people's trouble of calling `os.path.expanduser(db_path)` themselves.
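For example, a small sketch of the intended convenience:
```python
import os

db_path = "~/local/sample.db"
# DBFileReader can now expand the user home directory itself,
# equivalent to the caller doing:
expanded = os.path.expanduser(db_path)  # e.g. "/home/alice/local/sample.db"
```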
* [Caffe2] Add bytes_read to cost structure
We're adding analytical read bytes to cost functions. This extends the structure accordingly for all CostInference defined operators.
Additionally, some small bug fixes were performed:
1) Cost functions now extract type information of operands instead of assuming float
* Fix sleef on aarch64 for hhvm
@bypass-lint
Rename flag
* Remove duplicated part in caffe2/ideep/operators/conv_op.cc
likely introduced by a sync error
* Rename test helper function test_adagrad_sparse_helper to adagrad_sparse_test_helper to avoid confusing pytest
* Fix various sparse transpose issues; remove dead code from Declarations.yaml.
1) Fixes some checks in t_, transpose_ that don't allow transposing empty sparse tensors.
2) Remove out= variants from docs since they don't exist (and haven't since at least v0.3.1).
3) Unify implementations of t_, transpose_, t, transpose.
4) Move dead checking code from Declarations.cwrap to actual implementations.
5) Fix test which never tested transpose_.
* Add test for error with t, t_.
* Address review comments.
* Fix jit tests.
* Fix test_jit.
* Don't allow requires_grad to be set on integer Tensor constructors in tensor_new.
* Fix autograd test.
* Fix test_distributions.
* Fix test_jit.
* Fix NN tests.
Right now, if we add a zero-filled sparse tensor with another sparse
tensor, both tensors must have the same "density" (dimI, dimV) and size
(tensor.size()) for them to be added successfully. This relaxes that
constraint so that if both tensors have the same tensor.size() and at
least one is zero-filled, they can be added successfully.
Before:
```
i = torch.LongTensor([[0, 1, 1], [2, 0, 2]])
v = torch.FloatTensor([3, 4, 5]).unsqueeze(1)
sparse_mat = torch.sparse.FloatTensor(i, v, torch.Size([2,3,1]))
zeros = torch.zeros(sparse_mat.size(), layout=torch.sparse_coo)
sparse_mat + zeros
RuntimeError: cadd operands have incompatible sizes or dimension types
at ../src/THS/generic/THSTensorMath.c:126
```
After: no error.
Compilers used to report a warning:
caffe2/core/net_async_tracing.cc: In member function 'void caffe2::tracing::Tracer::renameThreads()':
caffe2/core/net_async_tracing.cc:210:32: warning: overflow in implicit constant conversion [-Woverflow]
const long numa_multiplier = 10e9;
This patch fixes it.
* Makes accumulate_grad functions high priority in backwards passes
* Delegating constructor and comments
* Sequence_nr ain't pretty no more
* Sequence_nr ain't pretty no more
* Implemented fused builder based construction mechanism
* "weights" -> "weight"
* Use int64_t instead of size_t everywhere in RNN
* Extracted Conv::ExpandingSize into its own thing
* Rename TORCH_PARAMETER to TORCH_ATTR
* Added documentation
* Fix weight names in batchnorm module
Reference: https://github.com/pytorch/pytorch/issues/7434
* C10D: Added TCPStore to support C10D store interface
* Used pipe to terminate the store daemon and addressed all comments
* Used notify/wake for wait and addressed all comments
* Clean up nits
* Clean up all socket states when the socket is closed
* Adding LBFGS to cpp API
* Adding stop conditions
* Test cases now passing and adding closure to all algs
* Addressing code review
* Set seeds to make optim tests more deterministic
* Reduce gen_jit_dispatch options
This removes the power set of options generated for IntList[k] arguments
in aten_dispatch. Instead, the compiler now performs the broadcast using
schema information. This substantially cuts the compile time for aten_dispatch.cpp
* Make return uniform in lbfgs step
This ensures that we are returning results of the same type
in LBFGS step.
* Adding test case to exercise different exit points
Sets the tolerance_grad to negative infinity and positive
infinity to deterministically exercise the early exit branch
* Fixing lint error
* Fix python3.6 build in caffe2 CI
* Turn off onnx protobuf type stubs generation
* Revert "Turn off onnx protobuf type stubs generation"
This reverts commit 618b80911a316caa69f2d774fb12ae6b24b2a6d6.
Android unit tests failed to link because libnnpack and libcpuinfo appeared in the linker command line before libcaffe2. This patch somehow fixes it.
Fixes #7502.
Test Plan: build and test
Build output has this:
```
-- Checking prototype magma_get_sgeqrf_nb for MAGMA_V2 - True
-- Compiling with MAGMA V2 support
-- MAGMA INCLUDE DIRECTORIES: /data/users/rzou/miniconda3/include
-- MAGMA LIBRARIES: /data/users/rzou/miniconda3/lib/libmagma.a
```
* PyTorch AMD Build Script.
* Python invocation for hipify
* Adding individual hip files.
* Updating CWD
Use the actual path for the file instead of the current working directory, which depends on where the script is invoked.
* Updating folder path for amd_build
* Removing previous amd_build directory
* Updated setup.py to support WITH_ROCM
* Renaming the files for CuDNN BatchNorm & Conv since having two .cpp files with the same name results in a linking error in the HCC compiler used for ROCm/AMD.
* Removing old BatchNorm & Conv files since they've been renamed.
* Updating build path to handle ROCM
* Cleaned up the build path and created a FindHIP cmake file for setting up relevant hip paths.
* Separated the individual patch files to make it easier to detect issues while building.
* Removed CMakeLists hip files and fixed directory structure
* Adding build pytorch amd script
* Merged setup patch into PyTorch setup.py & cleaned a few issues
* Added information on where to download the hipify-python script.
* Resolved linting issues inside of build_pytorch_amd.py
* Removing many unnecessary patch files. Removing unnecessary .hip files. Fixing up the build process.
* Refactored the PR for supporting HIP
* Minimizing the number of changes inside individual patches.
* Cleaned up patch files.
* Removed patch files.
* Updating patches
* Removing HIP change from file.
* Cleaned up patches
* Added AVX/SSE avoidance due to a bug in the ROCm stack. Just temporary for now.
* Removing the other HIP file
* Removed patch file + merged ROCm into Aten/test
* Removed ATen tests patch file and updated disable_features yaml to remove headers that don't exist on the HIP stack.
* Reduced the number of patches down to 14 after Edward's suggestions.
* Transferred deletion of certain functions from patch to yaml file.
* Set default Thrust path
* Fixed aten files so we now use the templated pow/abs instead of std:: directly.
* Removed error from aten/src/THCUNN/Abs.cu
* Updated the locations of the cmake build files. Moved THCTensorRandom from a hip to a patch file. Added executable/library commands that can successfully handle either CUDA or HIP.
* Removed hip extraction from the build script and removed the old hip file.
* Replaced MACRO with function in upper level cmake.
* Added empty ELSE() block to prevent the loading of a command without CUDA or HIP. Also added IF guards around torch_cuda_based_add_executable in Aten tests.
* Updated aten tests.
* Removed the hip include from the ATen header.
* Can't throw exceptions on C++ AMP, using abort
* Missing IF guards for cuda/hip executables in aten tests.
* Removed a series of patch files.
* Added template keyword to help out the HCC compiler.
* Rebased the specific files displayed in the PR
* Fixing typo.
* Change flag from "WITH_CUDA" to "NOT NO_CUDA"
Replacing "WITH_CUDA" with "NOT NO_CUDA" after the rebase.
* Fix LoadHIP path
* Updating build files after rebasing.
* Reorganization after cpu/gpu separation.
* Removed HIPCC from setup.py & removed -shared extra linking args.
* Updated CMake / Setup build to correctly link when under ROCm stack.
* Removed the unnecessary argument from Extension constructor.
* Adding another test to be included with ROCm building.
* Updated the setup_helpers scripts in order to get around linter error
* Fix syntax issue
* Solving lint issue: line too long
Running sccache in foreground mode seems to uniformly slow down the builds and causes virtual memory exhausted errors for gcc7.2 builds. This PR moves sccache to background mode instead and print the compilation log at the end of the build.
* Run onnx integration tests in caffe2 CI
* verbose log
* turn off onnx verbose installation log
* can not install ninja
* Do not use all cores to build pytorch
* install tests require
* pip install to user dir
* use a deterministic path to improve the (s)ccache hit rate
* Do not change path in test.sh
* Add the compile cache hit trick to conda install as well
* cover jenkins in CI environment detection
This PR uses Vec256 to vectorize the softmax and logsoftmax Layers.
This comes in 4 steps:
log_softmax
softmax
log_softmax_backward
softmax_backward
* Vectorized Softmax and LogSoftmax
* Abstractions
* Style
* Remove <limits> for Kernel
* Perf investigations
* Last cleanups
Improve script builtin checking using schema
* This add aten_schema.h which provides a barebones amount of type and
argument information about each builtin operator
* emitBuiltinCall is updated to use this information rather than
aten_dispatch to ensure the operator is correct.
* handling of keyword and position arguments now matches python behavior
* There is no longer a requirement that kwargs be constant or that the
attributes of an op must be entirely constant or non-constant
* compiler now constructs a non-attributed version of the op first and
then turns it into the constant-attribute version if all attributes
are constants.
* default arguments for builtins now work
* SugaredValue::call and similar functions now have SourceRange information
for their arguments so that error reporting is more accurate
Notes:
* This does not try to merge the builtin checking with python arg parser.
Given that we will eventually have C10 schema which will replace aten_schema,
we will eventually have a C++ description of the schema, and working off that
description directly will be the easiest form to understand.
* python function calls and script method calls do not support keyword arguments yet.
When we add this support we should refactor the handling in tryEmitSchema
that resolves keywords into a common function.
* default arguments work
* keyword arguments to builtins work (still need to extend to calling python and other script methods)
* much better error reporting for incorrect builtins
Lift any constants to attributes on nodes when possible
* Schema is usable internally in the compiler as
the function signatures of script functions as well as for builtin
operators.
* Adds a List[T] class to better represent the arguments to cat/stack
as a type rather than with custom checking.
* Support kwargs for calls of script methods
A future commit will be needed to add support for:
* calls to script _functions_, which currently are GraphExecutors without schema info.
* kwargs to python functions, which will require refactoring python op
* fix for #7532: clamping the return value of uniform.cdf() to the range [0,1] (see the sketch after this list)
* removed whitespace around equals to pass flake8 tests
* added a test for uniform.cdf() with arguments outside support
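A minimal sketch of the clamping idea from the first bullet above (illustrative, not necessarily the exact code in the PR):
```python
import torch

def uniform_cdf(value, low, high):
    # CDF of Uniform(low, high); clamp so values outside the support
    # return exactly 0 or 1 instead of negative or >1 results.
    result = (value - low) / (high - low)
    return result.clamp(min=0, max=1)

print(uniform_cdf(torch.tensor([-1.0, 0.5, 2.0]), 0.0, 1.0))
# tensor([0.0000, 0.5000, 1.0000])
```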
This PR makes two improvements:
It fixes reduce kernels where accum type != type. Currently, for example, half tensors with small values may have norms that are (approximately) representable in fp16, but calling .norm() on them will result in underflow and a reported norm of zero. This PR fixes that behavior and adds a test in test_cuda.py to ensure underflow does not occur (test_tiny_half_norm).
It simplifies all reductions by removing excessive templating and the -2 contiguous special case from THC_reduceDim and THC_reduceAll. The latter was previously removed from pointwise apply. This has no performance impact as the -2 special case was already mapping to the 1D code path.
PyTorch currently attempts to handle accum type != type by either (1) writing kernels that immediately convert values to accum type after reading or (2) writing operations that take in type values and accumulate to the accum type. The latter path was not working properly (hence the current excessive half tensor underflow) and resulted in a lot of redundant code, with two reduce ops being passed to a kernel instead of one, and reduce ops frequently receiving the same template argument twice.
This PR makes the former approach THE approach. Kernels that accumulate to (potentially) different types should follow the pattern of converting their input to the accum type, performing all operations on that type, and then converting back to the appropriate type if writing their value back to the tensor. This pattern makes the second reduce op redundant and allows for simpler templating, which should improve readability, reduce build time, and reduce binary size. Also, this prevents ops from having to perform their own conversions, which could result in poor performance if the same value was operated on multiple times.
One exception to this simplification was that a new ThrustTensorDistOp was created to handle a call to thrust::inner_product(). This Op fuses the conversion and the TensorDistOp.
In addition to the expected simplification, there is also some cleanup of excessive template parameters. For example, kernelReduceAllPass2() had three template parameters: T, IndexType, and ReduceOp, but IndexType was never used.
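A small illustration of the underflow case described above (the regression test referenced in the summary is test_tiny_half_norm in test_cuda.py; the numbers here are only illustrative):
```python
import torch

# Many tiny values: each is representable in fp16, and so is the true norm
# (~3.16e-3), but accumulating the sum of squares in fp16 underflows to zero.
x = torch.full((1000,), 1e-4).half().cuda()
print(x.norm())  # with fp32 accumulation this is ~3.16e-3, not 0
```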
* wip
* Adds tests
* Fixes Python linting
* mean and norm fusions, code cleanup
* fixes file permissions
* Built-in support for rebuilding in win-build.sh
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* fixups
Signed-off-by: Jenkins <jenkins@ci.pytorch.org>
* CR comments
* CR comments
* more delayed expansion fixes
* Updates collapseDims() function and documentation
* Adds C++ tests, validates input, updates names for readability
* Removes invalid test
* stashing to merge AT_CHECK macro
* Updates asserts, removes tests on Windows
* Fix advanced indexing with negative indices
Fixes #7156
Here is some behavior before this PR:
```
In[1]:
x = torch.arange(9).view(3, 3).contiguous()
x[[0], [-1]] # Should be equivalent to x[0, -1]
Out[1]:
tensor([ 8])
```
The bug is that negative indices are added to the computed linear index
directly. In the above example, the linear index computed is "-1", which
wraps around to "8", giving the last element of a flattened view of `x`.
Instead, we should wrap negative indices around before adding them to
the linear index.
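A small Python sketch of the fix described above (the actual change is in the C++ indexing code; this only illustrates wrapping negative indices before accumulating the linear index):
```python
def linear_index(indices, sizes):
    # Wrap each (possibly negative) index into [0, size) *before*
    # folding it into the flat linear index.
    idx = 0
    for i, size in zip(indices, sizes):
        if i < 0:
            i += size          # e.g. -1 -> size - 1
        idx = idx * size + i
    return idx

# x[0, -1] on a 3x3 tensor: the correct flat index is 2, not 8
print(linear_index([0, -1], [3, 3]))  # 2
```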
* Use toCLong()
Previously, CUDAGenerator::CUDAGenerator would initialize the random
number generator on the current device. This would usually be device 0.
This is undesirable because initializing the CUDA context allocates a few
100 MBs due to all the kernels in libTHC.so.
This avoids the unnecessary call to THCRandom_getGenerator() in the
CUDAGenerator constructor.
Fixes #7320
* Fix call to get THCState
* Move ONNX integration tests from onnx-fb-universe to PyTorch repo
* Switch to use torchvision
* Delete single rnn operator tests, they have been covered in e2e tests in test_caffe2.py
* Mirror the fix in onnx-fb-universe to bypass cuda check
667326d84b
* this removes the flag controlling whether the interpreter works on variables.
* now the interpreter _always_ works on variables
* constants in the IR are still _always_ non-variables, and an assert was added to ensure this.
* as_tensor was split into as_variable and as_tensor since it is sometimes used
to construct constants in the IR
* I tried changing the IR to also always use variables but that change was much more
cross cutting and fragile and I never got it working
* [bootcamp] Improve "Shape" operator to support axes specification
To improve the .shape operator of Caffe2 to support x.shape(tensor, axes), which takes an optional int array "axes" as input. For example, x.shape(tensor, [1, 0]) will return the dimensions for axes 1 and 0, following the specified order. In the current version, the "axes" input allows duplicates and can have arbitrary length.
* Back out "Add barrier net that runs before training nets"
Original commit changeset: b373fdc9c30f. Need additional changes to some callers to support barrier failures.
* Change warning to verbose log to reduce log spam
The `LOG(WARNING)` was a bit spammy for regular use so lets just make it a `VLOG`.
* Extract the shared code from different caffe2_benchmark binaries
The OSS benchmark and Internal benchmark will share most functions in the benchmark.
* Support MFR in sequence training
As titled.
* Make knowledge distillation work with using logged prediction feature as teacher label.
1) Add loading raw dense feature as teacher label.
2) Optional calibration function for teacher label
3) Add teacher label into generic unit test
4) Deprecated TTSN workflow version using feature_options to config teacher label
* [C2/CUDA]: unjoined cross entropy sigmoid
as desc
* Add async_scheduling executor into deferrable_net_exec_test
Add async_scheduling into tests and fix some exception cases
* Fix Event disabled error
When disabling events in RNN ops, make sure we don't call Finish on a disabled
event from the op's RunAsync.
* cuda ensure cpu output op can handle both TensorCPU and TensorCUDA
as desc.
* [C2 Core] Infer input device option in C2 hypothesis_test checkers
Improve how we default input blob device options.
Previously it defaulted to wherever the op lives, but that is not necessarily the case.
For example:
CopyCPUToGPU
* [C2 Op]SplitByLengthsOp CPU/GPU implementation
[C2 Op]SplitByLengthsOp CPU/GPU implementation
* fix undefined symbol error
not sure why we're getting an undefined symbol even with link_whole = True.
Need to figure out why, but we need this workaround for now.
* Add tools in DAIPlayground platform to help debugging models
Add additional tools to allow Playground to override individual methods defined in AnyExp. This will allow users to create modules that specifically change certain default method behavior. An example included in this diff is deactivating the test model and checkpointing. When debugging model problems, switching off components helps me quickly narrow down the location of the bug. The technique is extensively used in task T27038712 (Steady memory increase in EDPM, eventually resulting in gloo/cuda.cu:34: out of memory)
* add shape and type inference for int8 conversion operator
* Fix flaky test for group_norm
Fix flaky test for group_norm
* Fix group_norm_op_test flaky
Fix group_norm_op_test flaky
* Implementation of composite learning rate policy
In many state-of-the-art deep learning works, people use a simple trick to
schedule the learning rate: use a fixed learning rate until the error plateaus
and then switch to a different fixed learning rate, and so on. In this diff,
we implement a simple version of the composite learning rate. The user gives
a set of learning rate policies and corresponding iteration counts, and the
optimizer will change the learning rate policy based on the number of iterations so far.
For example, the user gives two learning rate policies, FixedLearningRate
and PolyLearningRate, with an iteration number of 1k. Then for the first 1k iterations
we use FixedLearningRate, and for the following iterations we use PolyLearningRate.
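A minimal Python sketch of this "switch policy after N iterations" idea (illustrative only; the actual implementation is the Caffe2 learning-rate operator):
```python
def composite_lr(iteration, policies):
    """policies: list of (num_iters, lr_fn) pairs, applied in order."""
    start = 0
    for num_iters, lr_fn in policies:
        if iteration < start + num_iters:
            return lr_fn(iteration - start)   # local iteration within this policy
        start += num_iters
    # past the last boundary: stay on the final policy
    num_iters, lr_fn = policies[-1]
    return lr_fn(iteration - (start - num_iters))

fixed = lambda it: 0.1                          # FixedLearningRate-style
poly = lambda it: 0.1 * (1 - it / 10000) ** 2   # PolyLearningRate-style decay
print(composite_lr(500, [(1000, fixed), (10000, poly)]))   # 0.1
print(composite_lr(1500, [(1000, fixed), (10000, poly)]))  # poly at local iter 500
```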
* Split two use cases of CachedReader into two classes, DBFileReader and CachedReader
# Use Cases:
1). input: DB file -> output: DatasetReader.
Use DBFileReader.
2). input: Reader -> build cache DB file -> output: DatasetReader.
Use CachedReader.
# Changes to CachedReader:
1). Move db_path to the constructor.
Because in the mock reader, the cache will always be built ahead of time.
# Changes to tests:
1). Make a separate TestCase class for CachedReader and DBFileReader.
2). Make it possible to add more test functions by adding setUp, tearDown and _make_temp_path.
3). Make deleting db_path more general. `db_path` could be a file for `log_file_db`, but could also be a directory for `leveldb`.
* Back out "On Mobile phones, call GlobalInit with no arguments in predictor in case we need to perform initialization"
Original commit changeset: 4489c6133f11
* Fix LARS bug
Fixed a bug in the LARS implementation which caused all subsequent blobs not using LARS to have the LARS learning rate multiplier applied to them.
* [tum] support sparse init & add uniformFill option
as title
* Propagate exception for async nets
Capture the exception when an exception is thrown in async nets and re-throw it after wait(). This allows exceptions to be propagated up to the caller.
This diff was a part of D7752068. We split the diff so that C2 core files changes are in a separate diff.
* Automatic update of fbcode/onnx to 69894f207dfcd72d1e70497d387201cec327efbc
Previous import was 403ccfbd0161c38f0834413d790bad0874afbf9a
Included changes:
- **[69894f2](https://github.com/onnx/onnx/commit/69894f2)**: Use op schema.all tensor types in random like definitions (#865) <Scott McKay>
- **[b9d6b90](https://github.com/onnx/onnx/commit/b9d6b90)**: Clarify random like operators (#846) <Scott McKay>
- **[fc6b5fb](https://github.com/onnx/onnx/commit/fc6b5fb)**: Refactor shape inference implementation (#855) <anderspapitto>
- **[b7d8dc8](https://github.com/onnx/onnx/commit/b7d8dc8)**: fix cmake warning message (#863) <Eric S. Yu>
- **[f585c5d](https://github.com/onnx/onnx/commit/f585c5d)**: add pytorch-operator test for tile (#831) <Wenhao Hu>
- **[993fe70](https://github.com/onnx/onnx/commit/993fe70)**: add install step (#832) <Eric S. Yu>
- **[68bc26c](https://github.com/onnx/onnx/commit/68bc26c)**: add type inference for traditional ml ops except classifier ops. (#857) <Ke Zhang>
- **[9cc0cda](https://github.com/onnx/onnx/commit/9cc0cda)**: fix string representation of scalar types (#858) <G. Ramalingam>
- **[1078925](https://github.com/onnx/onnx/commit/1078925)**: fix y in pow test case to scalar (#852) <Wenhao Hu>
- **[c66fb6f](https://github.com/onnx/onnx/commit/c66fb6f)**: Add some math function shape inference (#845) <anderspapitto>
- **[ff667d1](https://github.com/onnx/onnx/commit/ff667d1)**: Refactor return type and docs for ONNXIFI_BACKEND_DIRECTX_ID (#853) <Marat Dukhan>
- **[11c6876](https://github.com/onnx/onnx/commit/11c6876)**: clear initializer names when clear initializer (#849) <Wenhao Hu>
- **[73c34ae](https://github.com/onnx/onnx/commit/73c34ae)**: Clarify FeatureVectorizer description. (#843) <Scott McKay>
- **[1befb9b](https://github.com/onnx/onnx/commit/1befb9b)**: Remove useless text in docs (#850) <Lu Fang>
- **[e84788f](https://github.com/onnx/onnx/commit/e84788f)**: Fix SELU attributes' default values (#839) <Lu Fang>
- **[ebac046](https://github.com/onnx/onnx/commit/ebac046)**: Add tile test case (#823) <Wenhao Hu>
- **[8b7a925](https://github.com/onnx/onnx/commit/8b7a925)**: a few more shape inference functions (#772) <anderspapitto>
- **[9718f42](https://github.com/onnx/onnx/commit/9718f42)**: Make the coefficient non optional for LinearClassifier (#836) <Jaliya Ekanayake>
- **[ef083d0](https://github.com/onnx/onnx/commit/ef083d0)**: Add save_tensor and load_tensor functions for Protos (#770) <Lu Fang>
- **[45ceb55](https://github.com/onnx/onnx/commit/45ceb55)**: Check if CMAKE_BUILD_TYPE set before project(). (#812) <Sergii Dymchenko>
- **[4b3d2b0](https://github.com/onnx/onnx/commit/4b3d2b0)**: [WIP] reenable shape inference tests (#834) <anderspapitto>
- **[22d17ee](https://github.com/onnx/onnx/commit/22d17ee)**: RNN tests: LSTM, GRU, SimpleRNN (#739) <Peyman Manikashani>
- **[de65b95](https://github.com/onnx/onnx/commit/de65b95)**: dimension denotation (#443) <Tian Jin>
- **[eccc76e](https://github.com/onnx/onnx/commit/eccc76e)**: fix field number issue in onnx operator proto and enable its build (#829) <Ke Zhang>
- **[d582beb](https://github.com/onnx/onnx/commit/d582beb)**: disable shape inference test to unbreak ci (#830) <Lu Fang>
- **[485b787](https://github.com/onnx/onnx/commit/485b787)**: function proto for composite op. (#802) <Ke Zhang>
- **[cd58928](https://github.com/onnx/onnx/commit/cd58928)**: specify defaults for attributes of Affine op (#820) <G. Ramalingam>
- **[7ee2cf9](https://github.com/onnx/onnx/commit/7ee2cf9)**: merge the dummy backend back into the main one (#743) <anderspapitto>
- **[1c03a5a](https://github.com/onnx/onnx/commit/1c03a5a)**: [Proposal] ONNX Interface for Framework Integration (previously ONNX Backend API) header and docs (#551) <Marat Dukhan>
- **[3769a98](https://github.com/onnx/onnx/commit/3769a98)**: Rename real model test case from VGG-16 to ZFNet (#821) <Lu Fang>
* [C2]ReluN Op
relu n op.
tf reference: https://www.tensorflow.org/api_docs/python/tf/nn/relu6
* Call destructor when assigning a blob value
* Add executor overrides
Add executor overrides flag to enable migration to async_scheduling executor
* Add barrier net that runs before training nets - attempt #2
Add a synchronize barrier net that is run before training nets. With this net, shards that are faster will wait for other shards before starting training. This reduces the chance of the faster shards timing out during GLOO AllReduce.
Removed explicit data_parallel_model.py.synchronize call in holmes workflow.
This change was landed previously but caused errors for some EDPM workflows - See https://fb.facebook.com/groups/1426530000692545/permalink/1906766366002237/ - because EDPM assumes any call to CreateOrCloneCommonWorld and Gloo ops are wrapped in exception handlers but in this case exception thrown in the barrier init net is not handled.
To address this issue, we add _CreateOrCloneCommonWorld to the param_init_net instead of a new barrier init net. Since errors in the param_init_net run are handled gracefully with re-rendezvous, this should fix the problem.
* Handle empty nets in async_scheduling
Make sure we don't get stuck on empty nets
* use CUDA_ARCH for conditional compile
* [C2 fix] infer function for ensure_cpu_output_op
* Update group_norm test to reduce flaky test
* Fix lr_multiplier for GPU
The file store implementation is new and based on the file
initialization method (which uses a single file and file locking) and
the interface of the Caffe2 store handler.
See #7434.
When tracing we record expand nodes. This is useful in some cases because
it makes it clear a broadcast happened. However, in future runs
the broadcast may be different or not needed. This change adds an
attribute to expand to track if it was implicitly added. This
takes the form of an unused input to expand with a default value.
The execution engine then removes implicit expands before execution.
Note that shape_analysis will re-add expands when it can prove by
shape analysis that they will exist and this is useful for the fuser,
so this change should not affect fusion passes.
* Split libATen.so into libATen_cpu.so and libATen_cuda.so
Previously, ATen could be built with either CPU-only support, or
CPU/CUDA support, but only via a compile-time flag, requiring
two separate builds. This means that if you have a program which
indirectly uses a CPU-only build of ATen, and a CPU/CUDA-build of
ATen, you're gonna have a bad time. And you might want a CPU-only
build of ATen, because it is 15M (versus the 300M of a CUDA build).
This commit splits libATen.so into two libraries, CPU/CUDA, so
that it's not necessary to do a full rebuild to get CPU-only
support; instead, if you link against libATen_cpu.so only, you
are CPU-only; if you additionally link/dlopen libATen_cuda.so,
this enables CUDA support. This brings ATen's dynamic library
structure more similar to Caffe2's. libATen.so is no more
(this is BC BREAKING)
The general principle for how this works is that we introduce
a *hooks* interface, which introduces a dynamic dispatch indirection
between a call site and implementation site of CUDA functionality,
mediated by a static initialization registry. This means that we can continue
to, for example, lazily initialize CUDA from Context (a core, CPU class) without
having a direct dependency on the CUDA bits. Instead, we look up
in the registry if, e.g., CUDA hooks have been loaded (this loading
process happens at static initialization time), and if they
have been we dynamic dispatch to this class. We similarly use
the hooks interface to handle Variable registration.
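A rough Python sketch of this hooks/registry indirection, purely to show the shape of the pattern (the real mechanism is a C++ registry populated at static-initialization time; all names below are illustrative):
```python
_hooks_registry = {}

def register_hooks(name, factory):
    # In C++ this happens at static initialization time when
    # libATen_cuda.so is loaded; here it is just a dict insert.
    _hooks_registry[name] = factory

class DefaultCUDAHooks:
    def has_cuda(self):
        return False
    def init_cuda(self):
        raise RuntimeError("built/loaded without CUDA support")

def get_cuda_hooks():
    # Call sites in the CPU library go through this indirection and never
    # reference CUDA symbols directly.
    factory = _hooks_registry.get("CUDAHooks", DefaultCUDAHooks)
    return factory()

# When the CUDA library is loaded, it registers its implementation:
class RealCUDAHooks(DefaultCUDAHooks):
    def has_cuda(self):
        return True
    def init_cuda(self):
        print("lazily initializing CUDA state")

register_hooks("CUDAHooks", RealCUDAHooks)
get_cuda_hooks().init_cuda()
```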
We introduce a new invariant: if the backend of a type has not
been initialized (e.g., its library has not been dlopened; for
CUDA, this also includes CUDA initialization), then the Type
pointers in the context registry are NULL. If you access the
registry directly you must maintain this invariant.
There are a few potholes along the way. I document them here:
- Previously, PyTorch maintained a separate registry for variable
types, because no provision for them was made in the Context's
type_registry. Now that we have the hooks mechanism, we can easily
have PyTorch register variables in the main registry. The code
has been refactored accordingly.
- There is a subtle ordering issue between Variable and CUDA.
We permit libATen_cuda.so and PyTorch to be loaded in either
order (in practice, CUDA is always loaded "after" PyTorch, because
it is lazily initialized.) This means that, when CUDA types are
loaded, we must subsequently also initialize their Variable equivalents.
Appropriate hooks were added to VariableHooks to make this possible;
similarly, getVariableHooks() is not referentially transparent, and
will change behavior after Variables are loaded. (This is different
to CUDAHooks, which is "burned in" after you try to initialize CUDA.)
- The cmake is adjusted to separate dependencies into either CPU
or CUDA dependencies. The generator scripts are adjusted to either
generate a file as a CUDA (cuda_file_manager) or CPU file (file_manager).
- I changed all native functions which were CUDA-only (the cudnn functions)
to have dispatches for CUDA only (making it permissible to not specify
all dispatch options.) This uncovered a bug in how we were handling
native functions which dispatch on a Type argument; I introduced a new
self_ty keyword to handle this case. I'm not 100% happy about it
but it fixed my problem.
This also exposed the fact that set_history incompletely handles
heterogeneous return tuples combining Tensor and TensorList. I
swapped this codegen to use flatten() (at the possible cost of
a slight perf regression, since we're allocating another vector now
in this code path).
- thc_state is no longer a public member of Context; use getTHCState() instead
- This PR comes with Registry from Caffe2, for handling static initialization.
I needed to make a bunch of fixes to Registry to make it more portable
- No more ##__VA_ARGS__ token pasting; instead, it is mandatory to pass at
least one argument to the var-args. CUDAHooks and VariableHooks pass a nullary
struct CUDAHooksArgs/VariableHooksArgs to solve the problem. We must get rid of
token pasting because it does not work with MSVC.
- It seems MSVC is not willing to generate code for constructors of template
classes at use sites which cross DLL boundaries. So we explicitly instantiate
the class to get around the problem. This involved tweaks to the boilerplate
generating macros, and also required us to shuffle around namespaces a bit,
because you can't specialize a template unless you are in the same namespace as
the template.
- Insertion of AT_API to appropriate places where the registry must be exported
- We have a general problem which is that on recent Ubuntu distributions,
--as-needed is enabled for shared libraries, which is (cc @apaszke who was
worrying about this in #7160 see also #7160 (comment)). For now, I've hacked
this up in the PR to pass -Wl,--no-as-needed to all of the spots necessary to
make CI work, but a more sustainable solution is to attempt to dlopen
libATen_cuda.so when CUDA functionality is requested.
- The JIT tests somehow manage to try to touch CUDA without loading libATen_cuda.so. So
we pass -Wl,--no-as-needed when linking libATen_cuda.so to _C.so
- There is a very subtle linking issue with lapack, which is solved by making sure libATen_cuda.so links against LAPACK. There's a comment in aten/src/ATen/CMakeLists.txt about this as well as a follow-up bug at #7353
- autogradpp used AT_CUDA_ENABLED directly. We've expunged these uses and added
a few more things to CUDAHooks (getNumGPUs)
- Added manualSeedAll to Generator so that we can invoke it polymorphically (it
only does something different for CUDAGenerator)
- There's a new cuda/CUDAConfig.h header for CUDA-only ifdef macros (AT_CUDNN_ENABLED, most prominently)
- CUDAHooks/VariableHooks structs live in at namespace because Registry's
namespace support is not good enough to handle it otherwise (see Registry
changes above)
- There's some modest moving around of native functions in ReduceOps and
UnaryOps to get the CUDA-only function implementations into separate files, so
they are only compiled into libATen_cuda.so. sspaddmm needed a separate CUDA
function due to object linkage boundaries.
- Some direct uses of native functions in CUDA code has to go away, since these
functions are not exported, so you have to go through the dispatcher
(at::native::empty_like to at::empty_like)
- Code in THC/THCS/THCUNN now properly use THC_API macro instead of TH_API
(which matters now that TH and THC are not in the same library)
- Added code debt in torch/_thnn/utils.py and other THNN parsing code to handle
both TH_API and THC_API
- TensorUtils.h is now properly exported with AT_API
- Dead uses of TH_EXPORTS and co expunged; we now use ATen_cpu_exports and
ATen_cuda_exports (new, in ATenCUDAGeneral.h) consistently
- Fix some incorrect type annotations on _cudnn_rnn_backward, where we didn't
declare a type as possibly undefined when we should have. We didn't catch this
previously because optional annotations are not tested on "pass-through" native
ATen ops (which don't have dispatch). Upstream issue at #7316
- There's a new cmake macro aten_compile_options for applying all of our
per-target compile time options. We use this on the cpu and cuda libraries.
- test/test_cpp_extensions.py can be run directly by invoking in Python,
assuming you've setup your PYTHONPATH setup correctly
- type_from_string does some new funny business to only query for all valid CUDA
types (which causes CUDA initialization) when we see "torch.cuda." in the
requested string
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Last mile libtorch fixes
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* pedantic fix
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add name() to C++ modules
* Use RTTI to get module name by default
* Add functional.cpp to CMakeLists.txt
* Call typeid() inside name() instead of constructor
* Add tests and use default constructor
In Maratyszcza/NNPACK#140 @daquexian reported an error on Faster-RCNN model with MobileNet V2, when running with NNPACK engine. The error disappears when using the latest NNPACK and cpuinfo. Updating submodules upstream to ensure others don't hit this issue.
* [ONNX] Allow specifying only a subset of input/output names
Then we can only specify the "real" names while ignoring the names for all the parameters
* fix
* Update utils.py
* Replace incorrect usages of "NotImplemented"
Fixes#7266. Replaces "NotImplemented" (which is supposed to be used for
binary ops) with the correct "NotImplementedError".
* Address comments
* Add batched linear solver to torch.gesv()
Fixes#3164
Picks up from #4502
I moved `gesv` to ATen.
Adds bindings for MAGMA's `gesv_batched` function for CUDA.
For CPU, runs `THLapack(gesv)` in a for loop.
The new function supports arbitrary batch dimensions (and broadcasting
of those dimensions). For example, the 4-d tensor `A x B x M x M` should
be treated as having batch-size `(A x B)`.
The overhead of creating the magma_queue_t is: ~350000 microseconds
the first time it's called and ~6 microseconds every time after that.
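A usage sketch, assuming the batched call keeps the existing gesv(B, A) argument order and (solution, LU) tuple return:
```python
import torch

# Batch of 2 x 3 independent 4x4 systems A x = b, solved in one call.
A = torch.randn(2, 3, 4, 4)
b = torch.randn(2, 3, 4, 6)           # 6 right-hand sides per system
x, LU = torch.gesv(b, A)              # x has shape (2, 3, 4, 6)
print((A @ x - b).abs().max())        # should be close to 0
```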
* Tests and docs
* Address comments
* Address comments
* Rebase
* Address comments
* Fix rebase
* Addressed comments
* Address comments
* Address comments
* Addressed comments
This lets aten::expand be differentiable in torchscript. It was probably
omitted from the list by accident in the past b/c gradientForNode does
already support aten::expand.
Also adds a test to check expand and its gradient in a torchscript fn.
* Make ATen buildable without all Caffe2 by root cmake
* Fix typo in aten cmake
* Set BUILD_ATEN from USE_ATEN as compat
* Only set BUILD_ATEN from USE_ATEN when on
* Have USE_GLOO only set when BUILD_CAFFE2
* Generic fuse conv relu pass for nomnigraph
* Use it in NNPACK conversion
* Comments
* Change the postprocess interface to take node instead of conv op
* Pinning conda-numpy to 1.14 to avoid SVD issue
* Adding another leveldb test to conda's ignored tests, removing a mkl-test from this
* Removing commented out section
The schema.Scalar class makes pretty strict assumptions (via its docstring)
on the spec of the shape of its underlying object. Because of idiosyncrasies
of numpy indexing and the use of np.dtype, those assumptions are broken on an
edge case (dtype = (scalar_type, 1)). This corrects the behavior of this
edge case to conform to the spec.
* Rename autograd namespace to torch and change torch.h into python.h
* Pave the way for torch::nn::Module
* Reorganize module code structure
* Undo ONNX update
* Remove sleef submodule
* ENH: add to method for PackedSequence
* ENH: return self if possible
* TST: remove extra data
* DOC: add more explanation
* TST: remove extra data
* DOC: minor fix
Apparently get() is a function of requests, not a module (not sure if in
the past get() used to be a module). Therefore, the syntax in #3280 will
always fail with ImportError, and the requests lib will never be used (kind
of defeating the purpose of that pull request).
Also, if the requests lib is used, we should add the stream=True parameter,
otherwise requests.get() will load the whole response into memory.
* Clarify patience in ReduceLROnPlateau docs
It's unclear which definition of patience we have. The two ways to
interpret it are:
- How many bad epochs can you see before you start considering changing the learning rate.
- How many bad epochs can you see before you change the learning rate.
This PR clarifies the docs with an example. If `patience = 2`, then
after 2 bad epochs, we begin considering changing the learning rate.
After seeing one more epoch (the 3rd epoch), if that epoch is also bad,
then we change the learning rate after it.
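A usage sketch of the clarified behavior:
```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

optimizer = torch.optim.SGD([torch.randn(2, requires_grad=True)], lr=0.1)
scheduler = ReduceLROnPlateau(optimizer, patience=2)

# With patience=2, the 1st and 2nd bad epochs are tolerated; if the 3rd
# epoch is also bad, the learning rate is reduced after it.
for val_loss in [1.0, 1.0, 1.0, 1.0]:  # baseline, then 3 bad epochs
    scheduler.step(val_loss)
```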
* address comments
* move softmax/logsoftmax to ATen
* specify cpu and gpu accum types
* use accreal for CPU
* expose softmax backward to python, fix legacy interface
* fix Distributions.cu to use common AccumulateType
* fix cuda 8 build
* delete commented out lines
* rebase on master, fix breakages
* Double-dispatch copy.
In order to split ATen's CPU/CUDA code into two separate libraries
which don't require a build flag (AT_CUDA_ENABLED) to separate them,
we need to be able to split source files based on whether or not they
handle CPU functionality only, or also touch CUDA. Copy poses a unique
challenge here, because the naive implementation involves writing
a matrix for all combinations of CPU/GPU in a single file.
This PR splits up Copy.cpp into CPUCopy.cpp and CUDACopy.cpp, respecting
the following matrix:
to\from CPU CUDA
+---------------------------
CPU | CPUCopy.cpp CUDACopy.cpp
CUDA | CUDACopy.cpp CUDACopy.cpp
When you run x.copy_(y) where x is CPU and y is CUDA, we do a second
virtual dispatch to copy_from(y, x) on y's type, so that we can get
from CPUCopy.cpp to CUDACopy.cpp
The new autogenerated code for CPU looks like this:
```
Tensor & CPUByteType::s_copy_(Tensor & dst, const Tensor & src, bool non_blocking) const {
  // code generated by copy_wrapper
  checked_cast_tensor<CPUByteTensor>(dst.pImpl, "dst", 0, false);
  switch (src.type().ID()) {
    case TypeID::CPUByte:
      THByteTensor_copyByte(static_cast<CPUByteTensor*>(dst.pImpl)->tensor, static_cast<CPUByteTensor*>(src.pImpl)->tensor);
      break;
    case TypeID::CPUChar:
      THByteTensor_copyChar(static_cast<CPUByteTensor*>(dst.pImpl)->tensor, static_cast<CPUCharTensor*>(src.pImpl)->tensor);
      break;
    ...
    default:
      return src.type().s_copy_from(src, dst, non_blocking);
```
Notice that the fall through goes to s_copy_from. s_copy_from is like s_copy
but the arguments are reversed.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Lintfix and no-CUDA fix
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Fix compilation error.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* CR
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Rename autograd namespace to torch and change torch.h into python.h
* Include torch.h instead of python.h in test/cpp/api
* Change some mentions of torch.h to python.h in C++ extensions
* Set paths directly, without find_path
This makes the JIT tracer much more robust, by allowing it to record
dependencies on tensor sizes. For example, if you were to trace this
function
def fn(x):
return x.view(x.size(1), -1)
before this patch, then it would embed the actual value of x.size(1)
in the trace as a constant, making it very hard to have e.g. batch size
independent traces. Now, this will correctly record the dependency, and
will retrieve the size of x at every run.
* Refactor reduce ops to take flexible input types
* Add DISPATCH_FUNCTION macros in common_gpu.h
* Use macros to reduce switch case in dispatching cuda functions
AT_ASSERT is an internal, PyTorch specific error, so we should
give a little more debug information (than with the ordinary
errors.)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
```
JIT_ASSERT(v->setUnique(x)->uniqueName() == x);
```
This works by changing any other value in the graph with name x to a
different name. This mirrors llvm behavior and is useful when you
want to ensure some names have particular values.
* Remove stale THD README
* Move common THD dependency into THD/base
The master_worker directory now no longer contains files that are
needed for building other parts of THD.
* [fix] Re-enable events in RNN ops
We have earlier added event disabling in RNN ops as back then we didn't use
events, with current use cases this is no longer true
(https://fburl.com/8vd0lp8y)
* use ops with cuda impl
* Revert D7729695: [caffe2][fix] Re-enable events in RNN ops
This reverts commit 4b215c7496fb724656ff4c776933a15bdbbcde5e
@bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
* [observer] Clean up observer_config.h
#accept2ship
* [1/n] Refactor dataio_test.py
Replace code duplication with a common function
* Add barrier net that runs before training nets
Add a synchronize barrier net that is run before training nets. With this net, shards that are faster will wait for other shards before starting training. This reduces the chance of the faster shards timing out during GLOO AllReduce.
Removed explicit data_parallel_model.py.synchronize call in holmes workflow. Similar change in speech/asr_training workflow will come in another diff.
* Support the dnnlowp backend in caffe2_benchmark
This is for SHARE operator latency evaluation
* Migrate integral_image_op to main caffe2
migrate integral_image_op(GPU version) given by https://fburl.com/yvqezigi
to caffe2/caffe2/operators and implement its CPU version. Write up a test
using the hypothesis_test mechanism
* [pos_disc, fbcode] Implement unjoined lr loss
As explained in https://our.intern.facebook.com/intern/wiki/Model_Based_Calibration/, when the dataset is a joined dataset, where labels might change later, we need to use unjoined logloss.
The implementation is almost the same as in Sigrid (https://fburl.com/1trngsls), where
loss = y (log(p) - log(1-p)) + (1-y)(log(1-p)) = xy - (1-y)x - (1-y)log(1+exp(-x))
For x < 0, to ensure stability and avoid overflow, we reformulate the above expression as
loss = xy - (1-y)x + (1-y)x - (1-y)log(1+exp(x)) = xy - (1-y)log(1+exp(x))
Then the final expression becomes
loss = xy + (y - 1) x (x >= 0) - (1 - y) log(1 + exp(x - 2 x (x >= 0)))
where y is the true label, x is the dot product and p = logistic(x).
This kind of implementation is aligned with the current implementation of the original cross entropy in
https://phabricator.intern.facebook.com/diffusion/FBS/browse/master/fbcode/caffe2/caffe2/operators/cross_entropy_op.cc;0bae3b5d0f825897c5e0dd0ff10f489d7271bf25$7-13
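A direct NumPy transcription of the final expression above (a sketch for illustration, not the Caffe2 operator code):
```python
import numpy as np

def unjoined_lr_loss(x, y):
    # loss = xy + (y-1)*x*[x>=0] - (1-y)*log(1 + exp(x - 2*x*[x>=0]))
    pos = (x >= 0).astype(x.dtype)
    return x * y + (y - 1) * x * pos - (1 - y) * np.log1p(np.exp(x - 2 * x * pos))

x = np.array([-3.0, 0.5, 4.0])   # dot products
y = np.array([0.0, 1.0, 1.0])    # labels
print(unjoined_lr_loss(x, y))
```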
* Keep the array to fix the conflict
* [C2] Compute Adagrad effective LR
The AdagradWithLR op outputs an extra blob which contains the average effective learning rate across all weights in this blob.
* Open-source extractMetaNetDef & runGlobalInitialization, add new Predictor constructor from db file, and add run_map_outputs
1. Open-source extractMetaNetDef and runGlobalInitialization, for use in
2. new Predictor constructor from db file.
3. Add new run function that returns outputs as TensorMap
* Disable eigen cpu
Disable eigen cpu in transpose and reduce
* Introduce request_only/object_only property of ModelLayer
by default this is False
* A simple TC Caffe2 benchmark
We can run tunner, get MappingOptions and then use them to
compare against cuBLAS
currently broken due to LLVM issues. How to run:
hg checkout eec1ab31b59c03b8deded1c755a9abaf8c45be01
add D7401202
add D7434625
add D7506031
add D7540728
buck run @mode/dev-nosan tc/tc/benchmarks_python:caffe2_benchmark
* Move Caffe2 feature_maps_ops to open source
Need feature maps operators in open source project facebookresearch/BlueWhale
* Manually fix the conflicts in channel shuffle op
* Fix the inconsistency between different gh and fbcode
* Skip Adagrad GPU Test (Because some gpu implementation is missing)
* Fix another test to make sure it won't run on gpu when implementation is not available yet
These changes are already handled, either in native functions or via resize specifications in Declarations.cwrap.
The resize_ one is technically not handled, although in TH it is checked if the storage is actually reallocated; this is less strict, but seems okay.
* Add moments op in caffe2
* Use rsqrtf in float for group_norm
* Add docs for default behavior when axes is not provided.
* Update group_norm_op by using Eigen::sqrt on CPU
* Generate code without setup.py for C++ build
* Move code generation to CMake
* Set DEPENDS files correctly
* Fix some errors in codegen
* Fix blank line lint
* Implement torch.as_tensor, similar to numpy.asarray.
torch.as_tensor behaves like torch.tensor except it avoids copies if possible; so also somewhat like tensor.new but without the size overloads.
I didn't add a requires_grad field, because we haven't decided on the semantics such as as_param.
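A small usage sketch of the copy-avoiding behavior:
```python
import numpy as np
import torch

a = np.array([1.0, 2.0, 3.0])
t = torch.as_tensor(a)   # shares memory with `a` when dtype/device allow
a[0] = 10.0
print(t[0])              # reflects the change: tensor(10., dtype=torch.float64)

t2 = torch.tensor(a)     # always copies
a[1] = 20.0
print(t2[1])             # still 2.0
```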
* Remove requires_grad for doc.
Enables more warnings in the C++ API build.
Fixed a bunch of things in torch/csrc/.
Mostly taken from c10
* Enable -pedantic for C++ build
* Enable more warnings
* Include CUDA and library headers with -isystem
* Fix sign-promo warning
* Make AT_ASSERT/AT_ERROR non-printf based, other tweaks
- AT_ASSERT/AT_ERROR don't take printf strings anymore; instead,
they take a comma-separated list of things you wanted to print
(bringing it inline with Caffe2's conventions).
Instead of AT_ASSERT(x == 0, "%d is not zero", x)
you write AT_ASSERT(x == 0, x, " is not zero")
This is done by way of a new variadic template at::str(), which
takes a list of arguments and cats their string reps (as per
operator<<) together.
- A bunch of the demangling logic that was in Error.h is now
moved to Error.cpp (better header hygiene.) Also, demangle
has been moved out to its own helper function, and also
a new helper demangle_type (from Caffe2) added.
- A bunch of AT_ASSERT converted into AT_CHECK, to more properly
convey which checks can be caused by user error, and which are
due to logic error in ATen.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* CR
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Fix test failure.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* buildfix
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* More fixes.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* One more fix
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Try harder
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* initial commit for spectral norm
* fix comment
* edit rst
* fix doc
* remove redundant empty line
* fix nit mistakes in doc
* replace l2normalize with F.normalize
* fix chained `by`
* fix docs
fix typos
add comments related to power iteration and epsilon
update link to the paper
make some comments specific
* fix typo
Right now, the bottleneck test_utils.py tests assume that a user's
python executable is 'python'. This may not be the case especially if
the user has multiple versions of python installed. This PR changes it
so that test_utils.py uses `sys.executable` as the python executable.
* Refactor extractMetaNetDef and runGlobalInitialization into open...
* Fix test by making get output blobs optional
* Update test instead of making output blobs optional
* Dump autogradpp into PyTorch
* Fixed up CMake for autogradpp/C++ API
* Made cereal a submodule
* Change search location of autogradpp's MNIST directory
* Add test_api to CI
* Download MNIST from the internet instead of storing in repo
* Fix warnings
Adds ability to JIT compile C++ extensions from strings
>>> from torch.utils.cpp_extension import load_inline
>>> source = '''
at::Tensor sin_add(at::Tensor x, at::Tensor y) {
return x.sin() + y.sin();
}
'''
>>> module = load_inline(name='inline_extension', cpp_sources=source, functions='sin_add')
Fixes #7012
* Inline JIT C++ Extensions
* jit_compile_sources -> jit_compile
* Split up test into CUDA and non-CUDA parts
* Documentation fixes
* Implement prologue and epilogue generation
* Remove extra newline
* Only create the CUDA source file when cuda_sources is passed
* Add max mode support to EmbeddingBag (a usage sketch follows this list)
* Lint fix
* Fix compilation issue on other platforms
* Rebase + don't waste memory when not in max mode
* Oops, missed a spot
* Fix whitespace from merge
* less precision
* Lower precision to avoid spurious failures
* Minor typo
* Switch to size()
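A hedged sketch of the EmbeddingBag max mode added above; the indices, offsets, and sizes below are made up for illustration:
```
import torch
import torch.nn as nn

bag = nn.EmbeddingBag(num_embeddings=10, embedding_dim=3, mode='max')
input = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9])
offsets = torch.tensor([0, 4])     # two bags: indices [1,2,4,5] and [4,3,2,9]
out = bag(input, offsets)          # per-dimension max over each bag
print(out.shape)                   # torch.Size([2, 3])
```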
* Add full impl of GroupNorm
* Fix comments in math.h
* Remove unused buffers
* Add #include <array> in gpu version
* Remove unused moments_buffer_
* Make inverse std a template.
* Add detailed comments
* Add support for dotted names in CPP Extensions
* Modify tests for cpp extensions
Test that dotted names work
* Py2 fixes
* Make run_test cpp_extensions Win-compatible
Changelist:
- Move *.c to *.cpp
- Change includes of ".c" to ".cpp"
- A bunch of cmake configuration modifying CMAKE_C_FLAGS changed
to CMAKE_CXX_FLAGS or add_compile_options, because if you do CMAKE_C_FLAGS it only applies when you compile C code
- Explicitly cast void* to T* in a number of places
- Delete extern "C" { ... } blocks; instead, properly apply TH_API to everything that should have it (TH_API handles extern "C")
- Stop using stdatomic.h, instead, use <atomic>. This resulted in a bunch of placement-new/delete to be "totally properly correct"
- Refactor of THLongStorageView to not have static constructor methods (since it no longer has a copy/move constructor)
- Documentation about how the TH C interface (and extern C business) works
- Note that THD master_worker mode is dead
- C++ headers in TH libraries are given .hpp suffix, to make it less likely that you'll confuse them with the C-compatible headers (now suffixed .h)
- New function THCStream_stream and THCStream_device to project out fields of THCStream instead of accessing fields directly
- New function THStorage_(retainIfLive), which is equivalent to a retain but only if the refcount is greater than zero.
- In general, I tried to avoid using hpp headers outside of ATen/TH. However, there were a few places where I gave up and depended on the headers for my own sanity. See Note [TH abstraction violation] for all the sites where this occurred. All other sites were refactored to use functions
- Some extra Werror fixes (char* versus const char*)
* Add missing header "caffe2/core/common.h" before "caffe/proto/caffe.pb.h" to provide CAFFE2_API macro.
This only affects the Windows build since CAFFE2_API is only defined for DLL.
* Fix ".pb.h" dependency issue about DLL build.
CAFFE2_API defined in "caffe2/core/common.h" is required by ".pb.h" generated on Windows for DLL build.
We always need to have "#include <caffe2/core/common.h>" before using any proto header.
In this case "caffe2.pb.h" is already included by "context_gpu.h" -> "common_cudnn.h" in the correct order, hence we simply remove a line.
* Enable WERROR in tests
* Also set WERROR=1 for cpp_build in CI
* Enable Werror after the compiler checks
* Remove -DWERROR because its picked up from the env var
* Had to fix some errors in aten/contrib/data
* Allow an uninitialized variable in ReduceOpsKernel.cpp
* Use CUDNN_DATA_UINT8 in cuDNN type string conversion
* Fixes and use target_compile_options
* Fix uninitialized variables in THNN
* Include Python.h earlier in tensor_types.cpp
* Use CUDNN_VERSION 7100 instead of 7000?
* More Python.h includes
* Make switch case in common_subexpression_elimination.cpp exhaustive
* Build with WERROR=0 just to see all the warnings
* Remove some Python includes
* Enable WERROR=1 again
* Bring back switch case default
* Allow `__constant__` values in a ScriptModule to be used as attributes for builtin functions
* Fix bugs in @script loops
1. while loops run shape propagation multiple times until the shapes have converged.
There were two bugs here. (a) First the 'changed' condition was not checking if it actually
changed the output, and instead would mark changed = true if the two inputs were different.
This is incorrect because the output of the block and the input of the block may always have different shapes.
Now it actually checks if it is about to change the output entry that it is writing to.
(b) expand nodes were being inserted into the graph even inside the while loop body. However, if
we iteratively discover that the input shape to one of these expands is actually dynamic, then
it was incorrect to insert the expand in the first place. This changes it so that we only insert expands
after we have converged on the shapes.
2. the way deleteExtraInputs removed loop-carried dependencies was unsafe because it would lookup
Value* elements in the loop body's environment that were previously invalidated when deleteExtraInputs
removes another input to the loop. This changes the way deleteExtraInputs works so that it never has to
read a value out of the loop body's environment to avoid using the invalidated pointers.
* Fix torch.tensor(...) device-type calculation when used with numpy and type inference.
* Fix tensor device type inference as well.
* Better variable type inference: infer cuda-ness only if device is not specified.
Switches the step/direction variable names (steps and directions are flipped
in the current implementation of the two-loop recursion). This change does
not change the numerical output of the program, but should make it easier
to follow.
* Prevent stack overflow on deletion of deep graph
Fixes #5534.
Sometimes one can end up with a very big computation graph of Functions
and Edges. Each std::shared_ptr<Function> contains a list of Edge, and
each Edge contains a std::shared_ptr<Function>. Deleting a
std::shared_ptr<Function> can trigger the recursive deletion of other
std::shared_ptr<Function>'s: this can stack overflow if the graph
is deep enough. Here is an example of such a graph:
shared_ptr<Function> -> Edge -> shared_ptr<Function> -> Edge -> ... -> shared_ptr<Function>
The solution here is to use a custom deleter with each
std::shared_ptr<Function>. The custom deleter keeps track of how many
nested deleters it is in. When this number exceeds the maximum allowed
depth, the Function* to be deleted are accumulated in a per-thread
delete queue and handled by one of the deleters.
Example code that could trigger the overflow (set ``depth`` to something >
100000) is below. I also benchmarked the below code before/after the
changes to see if there are any significant performance differences.
```
import torch

def scope():
    depth = 80000
    x = torch.randn(9, requires_grad=True)
    y = x.clone()
    # build deeply nested computation graph
    for i in range(depth):
        y = y + y * 0.000001

%timeit -n 100 scope()

With changes:
376 ms ± 3.94 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
Without changes:
352 ms ± 6.58 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
With the change, the above code is 6.8% slower.
UPDATE: I did some more benchmarking. It looks like it takes 25% more time to free the computation graph in the case of the straight chain graph: https://gist.github.com/zou3519/93cf84d96ae431356ae7f7c1923ef51a
* WIP
* Add custom deleter to PyFunctions created by THPFunction
* Address some comments; pick new value
* Address some more comments
* Add more complicated test; special case the windows depth constant
* Add big warning about averaging to KLDivLoss documentation #6622
Also: An (independent) change in diagonal docstring tensor
formatting.
* Improve note with example
Thank you Richard Zou!
* use log_softmax
* Use Index rather than Long for IntList, so floating-point types convertible to ints fail the parsing.
Basically, our unpackLong code works with floating-point types that are convertible to ints, but this isn't often what you want (because of truncation).
What you actually want is to convert to an index, which will usually find such issues.
I made this the minimal change I could because:
1) I didn't want to change unpackLong because the existing code calls checkLong before unpackLong, so this should be a non-issue most of the time. And fixing this properly requires calling checkLong again, which will slow everything down.
2) An exception above is with IntList, which only checks that 1) it is a tuple or 2) it is a varargs tuple (i.e. torch.ones(1, 2, 3)).
* Fix bug.
* Don't conflict tensor and IntList bindings.
* Change function to be consistent between python 2 and 3.
* Check Index.
* Move IntList overloads in legacy new functions to below Tensor overloads.
* Implement matmul_out and dot_out.
* Fix autograd by only calling _out variants if we have an out ourselves.
* Disallow mismatched types in dot_out.
* Make sure out variant doesn't have a method.
* Do proper type conversion.
* Enhance diagonal
This patch
- adds Tensor.diagonal to complement torch.diagonal
- implements diagonal natively in ATen
- makes diagonal a view
- implements taking arbitrary diagonals
- implements diagonal backward instead of referring
to the (more limited) diag
* add tests, copy diagonal code to backward for double differentiability
* improve tests and doc comment. Thank you, Adam!
* Mark diagonal as view function in gen_autograd.py, use simple backward.
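A small sketch of the enhanced diagonal described above, showing the view semantics and an offset diagonal (values made up):
```
import torch

x = torch.arange(9.).reshape(3, 3)
d = x.diagonal()            # main diagonal: tensor([0., 4., 8.])
u = x.diagonal(offset=1)    # first super-diagonal: tensor([1., 5.])
d[0] = 100.                 # diagonal is a view, so this writes into x
print(x[0, 0])              # tensor(100.)
```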
* Workaround in onnx to get transposes into init_nets
This adds a pass to ONNX so that it can speculate Transpose
operators so that ONNX's split pass can put them into an init_net
Also fixes a potential bug in onnx peephole where an optimization
across blocks might move a Value and violate scoping.
* Perform shape propagation when embedding a program into a trace.
This ensures the trace still has type information specific to that trace, which will help onnx export succeed in more cases.
* onnx export aten::repeat to Tile
* move repeats to input
* turn repeats to a long tensor constant
* deal with case that len of repeats bigger than number of dims in input
DEPTHWISE_3x3 engine provides an optimized implementation of depthwise 3x3 convolution, e.g. for ShuffleNet, MobileNets
Implementations exist for CPU (generic), ARM CPU, and CUDA GPU.
Originally developed by @ajtulloch
* Refactor standard_gamma and implement CUDA gamma sampling
* Attempt fixes for AT_CUDA_ENABLED changes
* Gamma cuda and cpu forward as ATen native
* implement standard_gamma_grad_cuda
* update native_test.cpp, try to fix windows and various cuda version compiles
* searching a windows fix via CI... use std:: for math
* casting some constants in the calculation, compute at float for half precision
* whitespace fixes
* add acctype to do half->float computation, include HALF in generation, cast locally rather than tensors
* fix cuda8 half compilation
* always use scalar_cast with CUDACC, lock CPU generator, CPU acctype = double. Thank you for your review comments!
* Added ReLU unit to LP pooling, so the gradient does not become NAN if all inputs are zero.
* Added workaround for odd p. Added a bit of doc.
* Make the linter happy.
* Changes incorrect "overlappingIndices" call to correct "maybeOverlappingIndices"
THE PROBLEM
The current overlappingIndices() is meant to detect if a tensor defines multiple valid indices for the same data element. There are two significant issues with this function:
(1) The algorithm it attempts to implement cannot do this.
(2) That algorithm is not implemented correctly.
This call is used by pointwiseApply() and scatter(). If a tensor is readable/writable and detected as overlapped these algorithms will create a non-overlapped copy of it to work on. When tensors are improperly identified as overlapped this causes extra work. If tensors are improperly identified as non-overlapped then this would cause the operations to exhibit unexpected behavior.
For example,
ref = torch.arange(0, 32 * 5).view(4, 8, 5).cuda().double()
p = ref[:,:,::2]
p += 1
Results in a call to pointwiseApply1, which detects p as an overlapped tensor (it is not), causing a call to pointwiseApply2 that copies it into a non-overlapped temporary, and then another call to pointwiseApply2 later that copies it back to the original tensor. If, however, the original tensor is given dimensions of (4, 8, 4), instead, it is correctly detected as non-overlapped and only a single pointwiseApply1 call is made.
DISCUSSION + FIX
The algorithm that overlappingIndices() attempts to implement tests for a sufficient but not necessary condition of a tensor to be non-overlapping. That is, if its algorithm were implemented properly then it would be a conservative check that would ensure all overlapped tensors were copied (as desired), but also that some non-overlapped tensors were copied too.
The algorithm can be thought of as trying to test whether the dimensions can be ordered like "nesting dolls," with each dimension fitting within the next one larger than it. If this is true then the tensor is non-overlapping, but if it's false the tensor may or may not be overlapped. For example, a tensor with dims (2, 3) and strides (4, 3) cannot be "nested," but is non-overlapping. (The tensor looks like [[0, 3, 6], [4, 7, 10]].)
The algorithm is currently implemented improperly, as can be seen in the example above. The tensor p has dimensions [4, 8, 3] and strides [40, 5, 2]. This confuses the current implementation, which thinks the innermost dimension needs a stride of 6, which is incorrect. The first row is [0, 2, 4] and the next row begins with 5. The current implementation also improperly implemented its sorting behavior. (qsort comparators require -1, 0, and 1, not true/false return values.)
Fixing the existing algorithm is straightforward (and what this PR does, see below), but it is important to note that the algorithm never performed as intended, so its name and the documentation around it has been updated, too. A natural question is if it's possible to write an efficient overlappingIndices(), and I believe the answer is "no." Disambiguating overlapping from non-overlapping tensors is equivalent to finding a nonzero solution to a linear diophantine equation with restricted coefficients, that is, an equation of the form x_0s_0 + x_1s_1 ... = 0 where s_X is the stride in dimension X and x_X is an integer from [-size_X + 1, size_X - 1].
Another note is that the CPU does not perform this check. For example, if we run:
a = torch.FloatTensor([[0,1], [10, 11]])
b = torch.FloatTensor([[0,0],[0,0]])
b = b.set_(a.storage(), storage_offset=0, size=a.size(), stride=(1,1))
b += 1
Then b is [[1, 3], [3, 11]] because the operation is applied twice to the second element of the original tensor. This causes no warning.
Since the CPU does not perform a similar check, another question is whether the GPU code should remove its check. While it may seem that writing to overlapping tensors is an error state, running test_cuda.py reveals 171 instances of possibly overlapped tensors being copied by pointwiseApply(). (The prior incorrect version has 176 copies.) Allowing writing to overlapped tensors on the GPU may violate assumptions about memory accesses, too. In fairness, these assumptions may be violated on the CPU already.
Leaving the CPU vs GPU behavior question for the future, this fix corrects the current intended GPU behavior. This means that there will be fewer unnecessary copies and no chance of an overlapped tensor sneaking through on the GPU. The CPU behavior remains unchanged. The fix also adds a test to test_cuda.py to ensure that overlapped tensors on the GPU are written to as expected.
* cleanup
* Fixes Python formatting
* Make cuda 9 behave as cuda 8 wrt half conversions
CUDA 9 is too smart about implicit half conversions; this disables them so that CUDA 8 and CUDA 9 behave in the same way wrt half.
* try fixing windows build
* one more broken conversion
* Statically linking CUDA for Anaconda builds
* typo
* Adding a summary line
* Comments
* Typo fix
* Fix faulty parameter passing
* Removing problem CUDA modules for now
* Fixing unused debugging function
* Turning off static cuda linking until script changes are in
* Disabling mkl
THC had a concept of per-device per-stream scratch space that was
persistent in THCState. This was useful before the caching allocator
because it avoided synchronizations in kernels that needed temporary
scratch space. However, it's not thread-safe since multiple threads can
operate on the same stream: In a two-pass reduction the scratch space
may get clobbered in between the two kernels.
This removes the scratch space and just uses THCudaMalloc and THCudaFree
within the reductions.
I've kept THCState_getCurrentDeviceScratchSpaceSize for now since it's
useful to have the temporary buffer be sized based on the number of SMs.
ATen can be configured to compile without CUDA support by passing
-DNO_CUDA=1 to cmake. However, cmake will look for CuDNN independently
of that flag and may eventually find it. In cases where compilation
without CUDA support was requested on a system with CUDA installed, this
will result in linking errors while building some tests that rely only
on CuDNN being found.
Do not look for CuDNN if -DNO_CUDA=1 was provided in the cmake call
since it does not make sense to compile with CuDNN if CUDA support was
disabled.
* add threshold for ops using omp macro
* modify interface for ops using omp macro
* modify some thresholds
* implement C macros with optional parameters to avoid duplicating definitions for all pointwise operations
* add a parameter of LAB_IMPLEMENT_BASIC_FUNCTION for vectorizing
* modify the comment
* Revert "add a parameter of LAB_IMPLEMENT_BASIC_FUNCTION for vectorizing"
Modify macro LAB_IMPLEMENT_VECTORIZED_FUNCTION to enable optional parameters
This reverts commit 8ef783a0cc67b653c435e64a3beb6866a6b4216d.
Conflicts:
aten/src/TH/generic/THTensorMath.c
* fix build error on windows
* retrigger the test
The long-term fix is to remove the handling-creating pathways and
remove all the modes from PythonOp making it into an op that simply
calls a PyObject. Right now ONNX expects PythonOp to hold a
nn.Function, not a generic callable, so completely removing the legacy
pathway will also require changes to how ONNX symbolics are found.
* [jit][script] Fix a bug combining sizes/unsized tensors
This add an isSubtypeOf method to reflect that sized tensors are a subtype
of Dynamic[Tensors]. It updates the typechecking code to reflect this
relationship.
* Add index_select to shape prop
* Speed up printing of large tensors.
Instead of deciding on the format based on all of the elements of the tensor, decide based on the elements that will actually be printed.
* Fix flake8.
* Add else case.
Sebastian Messmer noticed that these iterators were writeable by
default, which seemed dangerous. Replaced with const iterators.
This doesn't seem to affect any ATen code; seems reasonable enough.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
- ATen repo now has a new top-level, so Travis script has
to be adjusted to (1) be moved to the top-level and (2)
cd into the aten directory before doing anything.
- Unfortunately, this makes the import script even slower,
because I'm banging on the entire index every commit. If
anyone has better suggestions for how to twiddle the index, I'm open to them.
One possibility is to fold the ATen build into the base
.travis.yml but only activate it when a file is missing
(and then filter out that file.)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Track checkpoint performance in scuba
As title.
* [C2/CUDA]: fix cross entropy sigmoid with logits
when adding log_d_trick, I forgot to add it to the cuda impl; this diff fixes
it.
* Back out "[caffe2] Unregister MKL fallbacks for NCHW conversions"
Original commit changeset: 8918dd40205a
Will land after @jongsoo's diff https://phabricator.intern.facebook.com/D7596315 lands
* [Easy][C2] Don't add blob to external outputs from output_record if it's already external output
As desc.
* On Mobile phones, call GlobalInit with no arguments in predictor in case we need to perform initialization
FACEBOOK:
The QPL logger needs the initialization code. In the past, the initialization code was put in the pipeline calling Caffe2. However, those places become obsolete quickly, as the product teams change places to call Caffe2 from time to time. We also need to track which teams use Caffe2 so that we can put the initialization code there.
With this diff, the initialization code is put in the predictor constructor, only enabled for mobile phones. This way, we can always enable QPL logging.
Once we do this, we can check how many times Caffe2 inference is called in production, and which models are more popular in production. This way, we can prioritize our effort supporting those models.
Will clean up the old code calling the init in the product in a separate diff.
* add padding op for sparse length tensor
to pad length-based sparse tensor with padding_value
* Add conv_op with cudaconvnet engine
Add conv_op with cudaconvnet engine
* [numa] Fix simple NUMA copy benchmark
Move XavierFill into init_net and also compute BW
* call roundf (device function) instead of round (host function)
* [caffe2_benchmark][observer] Make caffe2_benchmark use its own observer
1. Add ClearGlobalNetObservers()
2. Make caffe2_benchmark use its own observer and observer_reporter
* [detectron] Use roundf instead of round in the detectron module ops
* allow K larger than number of elements in top k op
one use case is to use this op together with PackSegments for sparse tensors, where the number of elements in each slice is not statically defined.
* add ChannelShuffle DNNLOWP op
* fixup math_cpu.cc break
* Support list and tuple literals: Adds support for [a, b], (a, b) and "a, "
* Allow non-tensors to reach emitBuiltinCall, each SugaredValue::call
is now responsible for checking the types of its inputs.
Add support for calling cat with a tuple to emitBuiltinOp
This PR makes it so that the collect_env.py tests ignore the least significant (most minor)
number of most version strings. It also bumps the version up to 0.5.0a
to fix the CI.
Reopening #6606 with fix for TEST_CUDA import issue on Windows and improvement to how we wait for manager exit in test_manager_unclean_exit. Loop tested on the Windows CI multiple times to make sure this actually fixes the CUDA OOM issue.
* Terminate dataloader workers properly when parent process is SIGKILL'ed
* Wait for worker processes to finish before shutting down manager process
* Add test for checking proper worker exit
* cosmetic change
* Test only if CUDA exists
* Don't call multiprocessing.set_start_method() in Python 2
* import TEST_CUDA only when we are in __main__
* Tune JOIN_TIMEOUT
* handle os.getppid() == 0 case
* Reset to original JOIN_TIMEOUT
* Use WaitForSingleObject() to check parent process status on Windows
* Fix TEST_CUDA import
* clean up
* Check main process only when index_queue.get() times out
* Change index_queues to multiprocessing.Queue
* Move manager checking logic to watchdog class
* Fix bugs in dataloader
* Fix TEST_CUDA import issue
* Don't import TEST_CUDA from common_nn
* Use event to signal manager exit in test
* fix lint
* Add comments
* Add environment collection script
Fixes #6111.
them a script to collect system environment information.
Changes include:
- Refactor out the environment collecting code from utils.bottleneck
- Add script (collect_env.py)
- Cleaned up the issues template so that it suggests using the script
and is more readable.
Testing: added expect tests to go with 4 CI configurations. Whenever one
of these configurations gets updated, the test will fail until the test
also gets updated.
* Expect tests
* Update issue template
* Fix random space
* Minor improvement to issue template; fix expect test
* Skip expect test if BUILD_ENVIRONMENT not found; test fix; split off smoke/expect test
Previously we would see errors like:
variable 'states' previously has type (Tensor, Tensor, Tensor, Tensor, Tensor, Tensor) but is now being assigned to a value of type (Tensor, Tensor, Tensor, Tensor, Tensor, Tensor):
since the default case in the diagnostic printout was "Tensor". This adds a virtual member function to each Type class that returns a human-readable string for better error reporting
* Improve error reporting for tuple type mismatch
* Add better Tensor printout
* Fix performance regression on simple cases of indexing
Dispatches to the old kernels
* Adapt JIT test
The test was expected to fail, but due to the change in the previous diff, it would now dispatch to index_select, which succeeds. I modified the function to go through the advanced indexing codepath
* Only do checks once, properly AutoNoGil, AutoGPU.
* Fix cross device indexing for more than 1 cuda device.
Cross device indexing is attempted from ATen, which doesn't work well because ATen doesn't have AutoGPU, etc.
Instead, before dispatching to ATen we do type conversion on the indices; it would probably be better if we
pushed all this down to ATen, but that will take some work.
* Small cleanup.
Fixes#6759.
Before, `tensor.chunk(0)` would cause a divide by 0.
`tensor.chunk(-1)` would throw an error complaining that "split_size
needs to be positive".
This PR changes it so that the error message makes it clear that
`chunks` has to be greater than 0.
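A quick illustration of the behavior this fixes (the tensor values are made up; the exact wording of the error is the part this PR improves):
```
import torch

x = torch.arange(6)
print(x.chunk(3))      # (tensor([0, 1]), tensor([2, 3]), tensor([4, 5]))
try:
    x.chunk(0)         # now raises instead of dividing by zero
except RuntimeError as e:
    print(e)           # message says `chunks` must be greater than 0
```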
* Add version counter to module, change load_state_dict to use load_local_state_dict which does class specific loading
* Clarifies version number in docs
* fix jit tests
* fix state_dict tests
* typo
* fix ddp
* exclude version numbers from state dict entries
* Fix jit test and empty modules
* address comments
* test for "."
* revert the private version change in state_dict
* make IN case a hard error
* fix not reporting error when unexpected submodule
* address comments
* disallow empty string in name and remove trailing dot
We allow variables defined inside of if statements to be defined after
if statements as long as they will be defined unconditionally. This
supports a larger subset of python programs than we supported before.
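A hedged sketch, assuming the @torch.jit.script decorator, of the pattern this enables: a variable assigned in both branches of an if, then used after it.
```
import torch

@torch.jit.script
def pick(x):
    if bool(x.sum() > 0):
        y = x * 2
    else:
        y = x - 1
    # y is defined unconditionally, so it may be used after the if
    return y

print(pick(torch.ones(3)))
```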
* More factory functions
Changes:
- Added the remaining factory and factory-like functions
- Better argument reuse via string templates
- Link under torch.rst's Creation Ops to the randomized creation ops
* Add double tick around False
* fix flake8
* Fix False
* Clarify comment: hopefully it is clearer now
* Eliminate handle_zero_dim when broadcasting is applied earlier.
This ends up not actually doing anything unless all the broadcasted tensors are scalars,
which ends up with inconsistent behavior in that case only, because the type promotion rules are different.
This is better solved with real type promotion logic.
* Change type of script comparison to long.
* Fix jit tests.
* Fix cpp jit test by being consistent about long-vs-float.
* Consistent float and long.
* Use int64_t rather than long.
Issue: "python3 test_cuda.py" currently results in a failure when using Volta hardware.
The failure is in test_advancedindex, and is caused by two "sub-tests." At line 4651 a series of indices are used to compare PyTorch's and Numpy's indexing behavior. At least two of these indices index the same element of the reference tensor multiple times. These are:
[slice(None), [[2]], [[0, 3], [4, 4]]]
[slice(None), [[0, 1], [1, 0]], [[2, 3], [3, 0]]]
The first index selects the 5th element of the third row twice, and the
second index selects the 4th element of the second row twice.
This causes the test to attempt to update the same index with two distinct values simultaneously. On my machine the Numpy created tensor will always take the "latter" of these two values, while the Volta tensor will always take the "former." (Not to say this behavior is guaranteed by either framework.)
The fix is to remove these two indices from test_torch.py. This causes all tests to pass.
While updating test_torch.py I also noticed that assert_get_eq(tensor, indexer) had a bug where it was referring to "reference" instead of "tensor." This bug had no impact on behavior. The fix is to have this function refer to its input tensor, "tensor," instead. All tests still pass after this fix.
* Sort declarations when generating Python bindings
This helps resolve ambiguities in argument parsing according to
any rules we will need.
For now, this allows us to make scalar operations more conservative
wrt. argument types, but makes them commutative again.
* Fix inconsistencies between mod with tensor and scalar
* Fix a stupid mistake
* Terminate dataloader workers properly when parent process is SIGKILL'ed
* Wait for worker processes to finish before shutting down manager process
* Add test for checking proper worker exit
* cosmetic change
* Test only if CUDA exists
* Don't call multiprocessing.set_start_method() in Python 2
* import TEST_CUDA only when we are in __main__
* Tune JOIN_TIMEOUT
* handle os.getppid() == 0 case
* Reset to original JOIN_TIMEOUT
* Use WaitForSingleObject() to check parent process status on Windows
* Fix TEST_CUDA import
* clean up
* Check main process only when index_queue.get() times out
* Change index_queues to multiprocessing.Queue
* Move manager checking logic to watchdog class
* Fix bugs in dataloader
* Fix TEST_CUDA import issue
* Create FileBaton to synchronize distributed JIT C++ extension builds
* Move FileBaton to its own file
* Autoformat code
* Respect verbose flag in cpp_extension._prepare_ldflags
ARM64 clang from Android NDK doesn't define __ARM_NEON__, which results in a perf regression on some models. I figured that some compilers define __ARM_NEON__ while others define __ARM_NEON. This patch changes all NEON-specific parts in Caffe2 to check both macros.
* Caffe2: Enhance test for CollectAndDistributeOp
This also changes the operator and the test to use stable sort
otherwise the test will fail due to differences between the op
and the test when facing ROIs of the same score.
* Caffe2: Adjust comparator to make std::nth_element and std::sort stable
Revert the removal of std::nth_element and std::sort and adding of
std::stable_sort.
* Add mutex to THC random number generator
* Add test for CUDA RNG multithread
* fix lint
* Rename gen_state to state and remove unnecessary mutex lock
* Remove RNG test from cpp_extensions
* Add CUDA RNG test to libtorch
* Build test_rng only if CUDA exists
* Move test to aten/src/ATen/test/
* Separate ATen build and test, and run ATen test in CI test phase
* Don't test ATen in ASAN build
* Fix bug in ATen scalar_test
* Fix bug in ATen native_test
* Add FIXME to some CUDA tests in scalar_tensor_test
* Valgrind doesn't work well with CUDA, seed the CPU and CUDA RNG separately instead
* Fix LSTM and GRU parameters description
* Fix previous layer time to t-1 as reviewed
* Replace 'the first layer' to 'at time 0' per review suggestion
* start at generic trilinear
* Implement einsum (fixes #1889; a usage sketch follows this list)
This provides a simple implementation of einsum. It is built on
top of the work for computing bilinear (#6110).
It uses a naive left-to-right resolution at the moment.
Autograd is able to differentiate by itself.
The obvious unsupported feature is taking diagonals (einsum('ii->i', (a,))).
* add tests and docs
* fix flake8
* clean diff
* rebase on current master to resolve conflicting String wrapping
* clean up after rebase
* better commentary in einsum and sumproduct_pair
* don't say fixme if it's fixed and rename num_outputs to num_output_dims
* adapt python wrapper to use std::string instead of String to avoid typedef at::String
* typos and some vector to array conversion
* fix accidental python<->python3 change
* really fix bad rebase
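A hedged usage sketch of the new torch.einsum, here spelling a batched matrix multiply with the operands passed as a tuple as in the example above (shapes are arbitrary):
```
import torch

a = torch.randn(3, 4, 5)
b = torch.randn(3, 5, 6)
c = torch.einsum('bij,bjk->bik', (a, b))   # batched matmul via Einstein summation
print(torch.allclose(c, torch.bmm(a, b)))  # True (up to floating point)
```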
* [GanH][Easy]: Add assertion to adaptive weighting layer
0 weight causes numeric instability and exploding ne
* [Easy] Add cast op before computing norm in diagnose options
As LpNorm only takes floats we add a manual casting here.
* Introduce a new caching device allocator
`cudaMalloc` and `cudaFree` calls are slow, and become slower the
more GPUs there are. Essentially, they grab a host-wide (not device-wide) lock
because GPU memory is transparently shared across all GPUs. Normally, this
isn't much of a concern since workloads allocate memory upfront, and reuse it
during later computation.
However, under some computation models (specifically, memory conserving
approaches like checkpoint-and-recompute, see
https://medium.com/@yaroslavvb/fitting-larger-networks-into-memory-583e3c758ff9)
this assumption is no longer true. In these situations, `cudaMalloc` and
`cudaFree` are common and frequent. Furthermore, in data parallel contexts,
these calls happen at nearly the same time from all GPUs worsening lock
contention.
A common solution to this problem is to add a custom allocator. In fact,
nVIDIA provides one out of the box: CUB, which Caffe2 already supports.
Unfortunately, the CUB allocator suffers from very high fragmentation. This is
primarily because it is a "buddy" allocator which neither splits nor merges
free cached blocks. Study
https://github.com/NVlabs/cub/blob/1.8.0/cub/util_allocator.cuh#L357 if you
want to convince yourself.
This diff adapts a caching allocator from the Torch codebase
https://github.com/torch/cutorch/blob/master/lib/THC/THCCachingAllocator.cpp
which does splitting and merging and ends up working really well, at least for
workloads like the checkpoint-and-recompute computation models noted above.
I simplified the implementation a little bit, made it a bit more C++-like. I
also removed a bunch of stream synchronization primitives for this diff. I
plan to add them back in subsequent diffs.
* Report reader progress in fblearner workflows
Integrate with fblearner progress reporting API and add support to report training progress from reader nodes.
If the reader is constructed with batch limits, report based on finished batches vs total batches. The finished batch count may be more than the total batch count because we evaluate whether we should stop processing every time we dequeue a split.
If no limit for the reader, report based on finished splits (Hive files) vs total splits. This is fairly accurate.
* [GanH][Diagnose]: fix plotting
1. ganh diagnose needs to set plot options
2. the modifier's blob name, which is used for the metric field, needs to be fixed before
generating the net
* Automatic update of fbcode/onnx to 985af3f5a0f7e7d29bc0ee6b13047e7ead9c90c8
* Make CompositeReader stops as soon as one reader finishes
Previously, CompositeReader called all readers before stopping. This resulted in a flaky test since the last batch may be read by different threads, resulting in dropped data.
* [dper] make sure loss is not nan
as desc.
* [rosetta2] [mobile-vision] Option to export NHWC order for RoIWarp/RoIAlign
Thanks for finding this @stzpz and @wangyanghan. Looks like NHWC is more
optimized. For OCR though it doesn't yet help since NHWC uses more mem b/w but
will soon become important.
* Intra-op parallel FC operator
Intra-op parallel FC operator
* [C2 Proto] extra info in device option
passing extra information in device option
design doc: https://fb.quip.com/yAiuAXkRXZGx
* Unregister MKL fallbacks for NCHW conversions
* Tracing for more executors
Modified Tracer to work with other executors and add more tracing
* Remove ShiftActivationDevices()
* Check for blob entry iff it is present
When processing the placeholders ops, ignore if the blob is not present in the blob_to_device.
* Internalize use of eigen tensor
Move use of eigen tensor out of the header file so we don't get template partial specialization errors when building other libraries.
* feature importance for transformed features.
* - Fix unused parameter warnings
The changes in this diff comment out unused parameters.
This will allow us to enable -Wunused-parameter as an error.
#accept2ship
* add opencv dependencies to caffe2
The video input op requires additional opencv packages. This is to add them to
cmake so that it can build
* Add clip_by_value option in gradient clipping
Add clip_by_value option in gradient clipping
when the value is bigger than max or smaller than min, do the clip
* std::round compat
* Check for --noprefix option for mpiexec
--noprefix option to mpiexec is not part of the MPI standard.
It is needed in certain configurations when using OpenMPI but not
supported with other MPI implementations such as MPICH and maybe
others. This commit adds a check if the option is supported by
the current mpiexec. Also this commit fixes Issue #4965 and MPI
tests can be enabled in the CI.
Fixes: #4965
* Update run_test.py
* Codemod to update our codebase to 0.4 standard
* Update some of the test scripts
* remove Variable in test_clip_grad_value
* fix _symbolic_override_wrapper_maker
* Scope variables inside the dataloader
This clears up the memory consumed by batches inside the dataloader. It's pretty useful for long-living data loaders.
* Update dataloader.py
Changes:
- Deleted docs for old constructor. Add link to new `torch.tensor` ctor
- Add docs for `torch.tensor`
- Add some info on dtypes to the top of `tensors.rst`.
This adds the ability to trace script functions while preserving their
control flow. When the trace encounters a script function it inlines
the graph of the function into the trace rather than tracing the
function itself.
Introducing two updates.
1. Add param to He initialization scheme in torch.nn.init
Problem solved:
The function calculate_gain can take an argument to specify the type of non-linearity used. However, it wasn't possible to pass this argument directly to the He / Kaiming weight initialization function.
2. Add util to clip gradient value in torch.nn.utils.clip_grad
Problem solved:
DL libraries typically provide users with easy access to functions for clipping the gradients both using the norm and a fixed value. However, the utils clip_grad.py only had a function to clip the gradient norm.
* add param to He initialization scheme in torch.nn.init
* add util to clip gradient value in torch/nn/utils/clip_grad.py
* update doc in torch.nn.utils.clip_grad
* update and add test for torch.nn.utils.clip_grad
* update function signature in torch.nn.utils.clip_grad to match suffix_ convention
* ensure backward compatibility in torch.nn.utils.clip_grad
* remove DeprecationWarning in torch.nn.utils.clip_grad
* extend test and implementation of torch.nn.utils.clip_grad
* update test and implementation torch.nn.utils.clip_grad
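A hedged sketch of the new value-based gradient clipping util (named with the suffix_ convention mentioned above; the model and threshold below are made up):
```
import torch
import torch.nn as nn
from torch.nn.utils import clip_grad_value_

model = nn.Linear(4, 2)
loss = model(torch.randn(8, 4)).sum()
loss.backward()
clip_grad_value_(model.parameters(), clip_value=0.1)   # clamp each grad element to [-0.1, 0.1]
print(max(p.grad.abs().max().item() for p in model.parameters()))  # <= 0.1
```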
* Add device docs; match constructor parameter names with attribute names.
* Use double quotes for strings.
* Update printing.
* Separate device ordinal-only construction into a separate note.
* Use current device.
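A short sketch of the device constructions the docs above cover, including the ordinal-only form:
```
import torch

d1 = torch.device('cuda:0')     # string form
d2 = torch.device('cuda', 0)    # type + ordinal form, same device
d3 = torch.device(0)            # ordinal-only construction, treated as a cuda ordinal
t = torch.zeros(2, device='cpu')
print(d1 == d2, d3, t.device)
```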
* Explicitly define all caffe2 reducer ops by name instead of string concatenating them
Explicitly define all caffe2 reducer ops by name instead of string concatenating them.
* Use recursion to make the equal() function compatible with C++11.
* Trivial change.
* Trivial change.
* Trivial change to force the flaky build system to rebuild.
* Trivial change to force the flaky build system to rebuild.
* Trivial change to force the flaky build system to rebuild.
* Trivial change to force the flaky build system to rebuild.
* Trivial change to force the flaky build system to rebuild.
* Addressed @dzhulgakov's comments.
* Addressed @dzhulgakov's comments.
* Trivial change to force the flaky build system to rebuild.
* Trivial change to force the flaky build system to rebuild.
* Add dtypes (with reasonable defaults) to sum, prod, cumsum, cumprod.
This adds optional dtypes to torch.sum, torch.prod, torch.cumsum, torch.cumprod.
By default, the dtype is torch.float64 for integral types, and the dtype of the input for floating point types.
* Don't use optional<ScalarType>, because the jit can't handle it yet.
Instead, we manually build the overloads. This is fairly painful because of default arguments, but should be easy to pull out once the jit can handle optional<ScalarType>.
* Fix keepdim with out parameters.
* Fix _cudnn_rnn_flatten_weight.
* If dtype is provided to an out function, make sure it matches the dtype of the result.
* Fix typo.
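A minimal sketch of the optional dtype argument on reductions added above (input values are arbitrary):
```
import torch

x = torch.arange(5)                        # integral input
print(x.sum(dtype=torch.float32))          # accumulate and return as float32
print(x.cumsum(0, dtype=torch.float64))    # same idea for cumulative ops
```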
* Update docs for torch.zeros factory method
If this looks good, I'll submit another PR rewriting the other factory
methods in this fashion.
* Address comments
* Better explanation for device default
* Add variable argument back
* s/set/sequence/g
* Remove class from torch.strided
This modifies the registration process so that all script methods
in a ScriptModule are defined at once.
Method gains a `method_creator` callback that gets invoked when the
method is first called to define it if it has not already been defined.
Recursive cycles in this `method_creator` are checked.
This approach was chosen over first creating all the graphs and then
inlining the call sites because it will combine better with type
propagation for non-tensor types like tuples. e.g.
```
a = foo(b)
return bar(*a)
```
Fixes #5748.
Added an unsafe version so embedding isn't slowed.
* Create safe and unsafe versions of sparse_coo_tensor
* rename sparse_coo_tensor_unsafe to _sparse_coo_tensor_unsafe
* refactor
* make helper static inline
* add sparse size check test
* fix lint
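A hedged sketch of the safe sparse_coo_tensor path referenced above; the indices and values are made up:
```
import torch

indices = torch.tensor([[0, 1, 1],
                        [2, 0, 2]])
values = torch.tensor([3., 4., 5.])
s = torch.sparse_coo_tensor(indices, values, (2, 3))   # safe path checks indices against the size
print(s.to_dense())
```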
The current implementation of bilinear uses a matrix multiplication approach. This creates a large intermediate matrix (batch * output dimension * input dimension). Relative to the previous pure python approach, this caused severe performance regression (600ms vs. 18ms for 300x100x200 weights and a batch of 50 on CPU, and also quadratic memory).
The attached change restores the performance using the previous strategy of looping over output features. It implements forward, backward, and double backward as native ATen code.
Credits:
Martin Tutek reported the regression and pinpointed the problem
Adam Paszke patiently answered my questions about ATen
I would not have been able to prepare this without you, thank you!
I referenced the old python implementation, used a python version of the naive implementation, and coded manual functions etc.
The tests have gradgradcheck etc.
* fix memory use of native bilinear
* bilinear double backward
* Move bilinear_double_backward to Functions.cpp
Addresses review comment by Tongzhou Wang. Thank you!
* add WrapDimUtilsMulti.h
* start at generic trilinear
* move to generic trilinear
* catch up on dim_list_to_bitset
* switch bilinear to use _trilinear implement _trilinear_backward
* add comments to Linear.cpp, move _trilinear in yaml
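A small usage sketch of bilinear with the shapes from the benchmark above (the data is random; only the shapes matter):
```
import torch
import torch.nn.functional as F

x1 = torch.randn(50, 100)
x2 = torch.randn(50, 200)
weight = torch.randn(300, 100, 200)      # (out_features, in1_features, in2_features)
bias = torch.randn(300)
out = F.bilinear(x1, x2, weight, bias)   # out[b, o] = x1[b] @ weight[o] @ x2[b] + bias[o]
print(out.shape)                         # torch.Size([50, 300])
```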
* Improve run_test.py to support running individual test classes and methods
Added support in run_test.py for running individual test classes and methods.
The -i/--include option can specify a list of test modules, classes or methods
like this:
python run_test.py -i autograd torch.TestTorch.test_abs \
torch.TestTorch.test_add utils.TestBottleneck
-f, -l and -x behaviour stays the same as before
* Fixed some code formatting
* Multiple fixes according to the reviews in #6344
* Split set_default_tensor_type(dtype) into set_default_dtype(dtype).
* Fix flake8.
The difference between this one and set_default_tensor_type is that it only sets the scalar type: what determines the type + device of a tensor returned from a factory function with defaults is the default tensor type + the current device (if the default tensor type is cuda). This just changes the scalar type of the default tensor type.
We do eventually want to deprecate set_default_tensor_type; it is not clear how to do that in a sensible and backwards compatible way.
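A minimal sketch of the split described above, where set_default_dtype changes only the scalar type of newly created floating-point tensors:
```
import torch

torch.set_default_dtype(torch.float64)
print(torch.empty(2).dtype)              # torch.float64; device still follows the default tensor type
torch.set_default_dtype(torch.float32)   # restore the usual default
```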
* Switch JIT passes to take a graph rather than TracingState
* Add pybind11 binding for ONNX pass from graph
* Fix canonicalize pass
* address comment
* Switch ToONNX to explicitly return new graph
* optimize_graph instead of optimize_trace
* Allow tuples to be re-assigned
This commit improves our support of tuples by making them more first-class.
In particular, it allows tuples to be re-assigned across loops and ifs.
It does this by making them first-class values in the Graph IR, and then
removing the tuples in a LowerTuples pass.
An alternative approach would have added more support for desugaring tuples
in the Environment object as they were emitted. Instead,
the current approach was chosen anticipating a future when tuples are
fully supported (including the interpreter). In that future, the current
code can be completely reused with the LowerTuples pass just becoming
an optimization that removes unneeded tuple allocations.
* More precise digamma
Fixes #6190.
This is a rebase of #3955 with some tweaks for better performance around
poles. The code is ported over from cephes with permission.
By itself, the cephes code returns inf for the poles.
For better performance around the poles with float32, one intermediate
step is always computed with double precision, regardless of dtype.
This step does `PI / tan(PI * input)`. This is necessary because small (1e-6)
rounding errors for the inputs to tan have strong effects on the output
(ie, the derivative of tan is very large at some points).
* Replace usages of finite-differences digamma with newly implemented digamma
* Better behavior near and at poles
* ScalarConvert -> scalar_cast for readability
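A tiny usage sketch of digamma; the inputs are chosen so one value sits close to the pole at -1, where the extra precision described above matters:
```
import torch

x = torch.tensor([0.5, 1.0, 5.0, -0.999])   # last value is near the pole at -1
print(torch.digamma(x))
print(torch.digamma(torch.tensor(1.0)))     # minus the Euler-Mascheroni constant, about -0.5772
```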
* Adding integrated pytorch-caffe2 package
* Updates
* Fixing more substitution
* Fix to pytorch build location
* Bugfixes, progress towards including CUDA libs in package
* Fix to sed call
* Putting off packaing CUDA libs for Caffe2
* Progress towards packaging CUDA libs
* Progress towards packaging CUDA libs
* Changes to CUDA copying
* Turning on CUDA lib packaging
* Correction to env variables passed into meta.yaml
* typo
* Adding more needed variables in build.sh
* Adding some debugging info
* Changing versioning to have dates and be in build string
* Removing version from build string
* Removing packaging CUDA logic for static linking (later)
* Changing version to mirror pytorch
* Removing env variable req in build.sh
* Change to sed to port to mac
Caffe2-NNPACK integration created blobs for precomputed kernel transforms based on the name of the Conv operator.
When Conv operators have the same name (e.g. empty string), the blobs for precomputed transforms get the same name and overwrite each other.
This patch ensures that blobs for all precomputed transforms in the network get a unique name.
* Separate cuda-ness from dtype.
There are no longer torch.cuda.int64, etc; only torch.int64 that correspond to at::ScalarType.
At the python arg parser level, the corresponding ATen type is selected from the combination of (ScalarType, Layout, Device).
There is also currently unused code in here for support ScalarType in native_functions; this will be used for specifying aggregate types
on reduction functions.
* Fix test_autograd.
* Add defaults to randint_like.
* Track is_cuda in py tensor types.
* Fix test_sparse.
* Fix multiprocessing.
* Fix rnn.
* Fix test_nn.
* Fix flake8.
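A quick sketch of the separation described above: the dtype no longer carries cuda-ness; the device does.
```
import torch

t = torch.zeros(3, dtype=torch.int64)              # torch.cuda.int64 no longer exists
print(t.dtype, t.is_cuda)                          # torch.int64 False
if torch.cuda.is_available():
    g = torch.zeros(3, dtype=torch.int64, device='cuda')
    print(g.dtype, g.is_cuda)                      # torch.int64 True -- same dtype, cuda device
```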
* Fixes to the way script handles multiple values, and other minor fixes.
This commit improves our handling of operators that return multiple values.
Builtins are now checked so that they return the right number of values,
and support for TupleValue is extended to all things that can return
multiple values.
This resolves issues where the compiler accepted things like:
a, b = c + c
This would cause the interpreter to crash. Now each operator knows
how many results it will produce and can check it against the number
of requested inputs.
Notes:
* Allow True/False literals in constant expressions
* make handling of keyword constants more consistent to support True/False
* make parsing constants match the way we construct constants from python
* improve the error messages when accessing bad graph attributes.
* switch findTensorOp to return an optional.
* check that attribute types are correct in findTensorOp
* Check the correct number of outputs for builtins
This also changes emitExpr to return a single SugaredValue
Rather than possibly returning multiple values, emitExpr now
always returns a single value, which _might_ be a tuple. This approach
more closely follows python making the code easier to follow.
Checks for returning the right number of values are now located in
the assignment operator, and occur when unpacking the tuple.
We still pass `n_binders` to function calls so that calls into python
know how many values they should return.
* Update ReduceMean
* Add reduce mean to math
* Update cuda flag
* Update Eigen::Tensor ctor
* Remove unused variables
* Skip ReduceTensorGPUTest if no gpus
* Add NOMINMAX for windows
* Fix lpnorm_op in windows
* Add openmp support for Windows
* Remove pthread from dependency list
* Revert "Add openmp support for Windows"
This reverts commit f234c124ba2b47746e197bc185c083737fee6e65.
* Don't link with msvc openmp libs
* Add support to TensorRT
* Removed License header
* Bind input/output by position
* Comments
* More comments
* Add benchmark
* Add warning for performance degradation on large batch
* Address comments
* comments
* added randint function in ATEN yaml as well as Tensorfactories.cpp
* corrected randint
* randint with overloading complete,getting tuple of ints behaviour though
* done randintlike and randint_out
Left: adding docs and tests, and removing the bug on size = (5)
* Removed my error messages, ThRandomTensor will handle all exceptions
* added docs and tests, corrected a mistake
Tested with manual seeds in some test cases as well. Seems fine to me (check documentation though)
* corrected indentation to spaces, and improved sizes argument description
* made documentation argument description shorter
* added whitespace after ',' in torch docs
* addes spaces in documentation
* added more tests (including bounds and overloading features)
* added whitespaces in test_torch
* removed trailing whitespaces
* removed whitespace from a blank line
* removed positive requirement from docs. Added dtype argument and gave eg
* made randint over randn in all files
* changed to data type for dtype in docs for randint
* added autofunction entry for randint in torch.rst
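A hedged sketch of the randint usage added above (shapes and bounds are arbitrary):
```
import torch

a = torch.randint(0, 10, (2, 3))       # integers drawn uniformly from [0, 10)
b = torch.randint(5, (4,))             # low defaults to 0
c = torch.randint_like(a, 3)           # same shape/dtype as a, values in [0, 3)
print(a.shape, b.shape, c.shape)
```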
* Better warnings
* Remove -Wc++14-extensions because gcc does not know it
* Warning fix in input_buffer.cpp
* Remove pedantic for torch/csrc/
* Also use Wextra and Wall for ATen
* Use check_env_flag
* Undo changes in shape_analysis.cpp
* Remove C linkage flag
* fix unit test for sqrt op
From the error logging:
[idx, grad, grad_estimate] are:
[[ 146. 0.5 0.45776367]
[ 147. 0.5 0.45776367]
The gradient == 0.5 is correct, which means the SqrtOp and its gradient are doing the right job. (Because y = sqrt(x), loss = y^2/2 = x/2, and then d(loss)/dx = 1/2 = 0.5; )
The test failed because of numerical problem of grad_estimate (in unit test). It can be because the step_size is small, and float precision is not high (when there are multiple elements in the tensor, we do sum(y^2) to compute loss)
This diff
- increase the step size, and also move the test cases to be further away from 0 (where sqrt(x) is not well defined) to be safe :)
- also clean up, and merge the test case for inplace Vs. non-inplace
Tested with:
`CAFFE2_HYPOTHESIS_PROFILE=debug ai_bt caffe2/caffe2/python/operator_test:elementwise_ops_test -- "test_sqrt"`
* CompositeReader & CompositeReaderBuilder
A new type of reader gluing multiple readers together.
* Back out "Revert D7394363: [GanH]: Log D Trick for Cross Entropy with Sigmoid"
Original commit changeset: 9325a4356dbe
* [dai][WIP] convert params to int8 on ps before sending to trainer
Add float->uint8 conversion in addition to float->fp16 conversion in model_saver.
* [easy] improve unit test for sparse length sum ops
as desc.
#accept2ship
* Update GitHub upstream to 771fcb3455cbfe69c2abcc4cb3bd7ef92d59af24
* move sparse hash unique ops to OOS and add unit tests
- move the SparseHash version to OOS, since 'sparsehash' is already deps of caffe2 OOS: https://fburl.com/arssw4n1
- The 'SparseHash' engine is also being used in OOS, so the SparseHash version shall be in OOS to reduce confusion: https://fburl.com/o5ea7ah2
- fix the CUDA UniqueOp for the case when batch is empty.
- add unit test
* group_norm_op for caffe2
This is the cuda op for Group Normalization (GN): https://arxiv.org/abs/1803.08494
This code implements GN in one op that computes Y=gamma * (X-mu) / sigma + beta and also its gradients. It is expected to have minimal memory consumption (similar to the BN op), without creating new blobs if GN were implemented as several ops (e.g., reshape, norm_mean/std, affine_channel).
* Resubmit D7405233: disappeared in D7464958
OOS publish causes the op missing -- however, test was still there
* [c2] add sparse hash engine for cuda unique op
The SparseHash version of UniqueOp copy input tensor to CPU, and make use of sparse hash map to get unique output, and then copy back to GPU.
* [dper][gpu] enable unit testing gpu trainer for sparse nn
to debug the GPU trainer using mock data in unit test.
make it easier to develop GPU trainer for new models.
* Reuse Gloo context for Synchronize() calls
Previously we were creating (and leaking) the Gloo context on each call to Synchronize(). Now only run the common world op and create the barrier net once, then run the barrier net on each Synchronize() call. Since timeout is associated with the Gloo context, assert that the timeout is fixed instead of trying to handle the complexity of multiple timeouts (and associated contexts).
* [GanH/WGAN][1/n]: add FC param clipping
as titled
* [mobile] minimizing changes between caffe2_benchmark and speed_benchmark
* [GanH]: enable diagnose within model
avoid finding blob names but to directly enable inside the model
* Add `net_transformer_fun` option to DPM
This callback allows for various transformations to be made to the
model after gradient operators have been added. The immediate motivation for
this is to allow transformations such has "checkpoint-and-recompute" which
allow trading off memory for additional compute.
Adding several callbacks like this has made DPM's API less than ideal at this
stage. However, I could not find any reasonable alternative.
* [DT] [33/n] Compile flow task groups
Task groups need to be compiled in order to pickle the object in fblearner. However, I also changed the Job's compile function, as creating a new object is not necessary.
* Initial commit for sparse_normalize vectorization and benchmark
* [GanH]: LB Calibration for JSD
as titled
* Tracing event in async executor
Adding event tracing through TRACE_EVENT macro in async executor
* [Resubmit] D7409751 Reseting book-keeping blobs when the reservoir is reset
D7409751 got lost in D7464958
* Visualizing realtime weights values
We want to visualize the weight values as the optimizer iterates. This diff supports visualizing the weights at an assigned index.
Currently, we assume the blob to be 2 dimensional.
* [GanH][Easy]: Fix Homotopy Weighting
Apparently, there was a bug in the homotopy weight (alpha, beta) update
* [c2] move sparse hash unique op out of oss
so that oss do not need to depend on google hash map.
* Get rid of std::round as it's not supported on Android
* Revert changes on setup.py
* Skip shaky test on Dataio
* fix
* change irfft signal_sizes arg to be the last
* add docs for fft, ifft, rfft, irfft; update doc for stft
* fix typo in window function docs
* improve gradcheck error message
* implement backward of fft, ifft, rfft, irfft
* add grad tests for fft, ifft, rfft, irfft
* fix nits and typos from #6118
* address comments
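A hedged sketch, assuming the signal_ndim-style torch.fft/torch.ifft API these commits document, where complex values are stored in a trailing dimension of size 2:
```
import torch

x = torch.randn(4, 5, 2)               # last dim holds (real, imag) pairs
y = torch.fft(x, signal_ndim=2)        # 2-D complex-to-complex FFT
x_rec = torch.ifft(y, signal_ndim=2)   # inverse transform
print(torch.allclose(x, x_rec, atol=1e-5))
```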
* Autograd container for trading compute for memory
* add a unit test for checkpoint
* address comments
* address review comments
* adding some docs for the checkpoint api
* more comments
* more comments
* repro bug
* Fix a subtle bug/apply some review comments
* Update checkpoint.py
* Run everything in grad mode
* fix flake and chunk=1
* use imperative backward as per discussion
* remove Variable and also add models and test for models
* Add a simple thread local variable to check for autograd grad mode
* remove models and models test after debugging
* address review comments
* address more comments
* address more comments
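For reference, a minimal sketch of how the resulting checkpoint API is typically used (module and shapes are illustrative):
```
import torch
from torch.utils.checkpoint import checkpoint

block = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU())
x = torch.randn(4, 8, requires_grad=True)
y = checkpoint(block, x)   # activations inside `block` are recomputed during backward
y.sum().backward()
```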
Part of #5738. Warns users that they're not viewing the latest stable
release docs.
We should remember to delete this when cutting out 0.4.0 release docs. (we'd just delete the div in pytorch.github.io)
* Unit test for pack_padded tracing
* Move monkeypatching stuff
* Switch symbolic
* Fix stack traces and update test
* Fixup and confirm e2e working
* lint
* Move monkeypatch back to onnx
* Address comments
* remove extraneous import
* Add gradient checking
* lint
* Address comments
* improve test case
* fix fft when any of the input dimensions is not like complex type; add test for ifft+fft
* clarify the comments
* Address comments: add note; add helper function
* use at::nullopt
* add notes on conjugate symmetry; fix complex-to-real cloning condition (should be advanced data layout rather than base_istride)
* add at::sum_intlist and at::prod_intlist
* revert optional<vector> helper due to windows compiler error
* Something that works
* Tuple sugared value
* Works with commenting out input size check
* support string frontend
* Initial starred assignment
* Fix parser
* Fixup tests
* clang-format
* fix rebase error
* lint
* move star assign test to string frontend to make py2 happy
* Py2 fix: parse starargs from Call node
* Address some comments
* Fixup merge
* Remove overloaded unary operators
* Bugfix and test case
* Address a few more comments
* asValues -> asTuple
* Remove unrolledFor stuff
* Fixup getValues
* Pass CallsiteDescriptor struct and have different behavior for different call types
* Address comments and lint
* some type checks
* Address comments
* lint
* Fix mistake
Fixes #6312.
Changed bottleneck's arg parser to use argparse.REMAINDER. This lets
the user specify args as `python -m torch.utils.bottleneck script.py
[args]` (previously, a -- was needed after `bottleneck` and before
`script.py`).
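A minimal sketch of the argparse.REMAINDER pattern described above (the argument names here are illustrative, not bottleneck's actual ones):
```
import argparse

parser = argparse.ArgumentParser(prog='bottleneck')
parser.add_argument('scriptfile', help='path to the script to profile')
parser.add_argument('args', nargs=argparse.REMAINDER,
                    help='arguments passed through to the profiled script')

ns = parser.parse_args(['script.py', '--lr', '0.1', 'positional'])
print(ns.scriptfile, ns.args)   # script.py ['--lr', '0.1', 'positional']
```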
* Check mappings ONNX -> Caffe2 bear the same argument names
When adding an extra arg to an input ONNX op, if it's not supported in Caffe2, the exporter would just silently pass it to NetDef and ignore it in the implementation. That's pretty error-prone. Caffe2 also has an OpSchema description, so we can enforce that all arguments either appear explicitly in the schema or are listed explicitly in Caffe2.
See also https://github.com/caffe2/caffe2/pull/2478
Add test for C2 argument checking
* Some operators do not log arguments, which prevents argument checks.
Invite users to file an issue to fix the schema.
* Change Same as input type deduction to work for ops with multiple outputs
* Change the InferBlobShapesAndTypes definition to take a vector of pointers instead of unique_ptrs. The function doesn't own the objects, so there is no need to pass smart pointers; doing so also prevented calling the function with an existing object, since the caller had to create a unique_ptr, i.e. copy an existing object just to create the pointer.
* Switch the order of std::move<unique_ptr> and unique_ptr.get.
* Add a comma.
Caffe2 started with an option to use NNPACK pre-installed in the system.
Now this option is mostly legacy, as Caffe2 can include NNPACK in its own build on all platforms.
Due to problems when a pre-installed NNPACK is built with different dependencies or compiler options, we decided to remove this option and always build NNPACK with Caffe2.
This change makes Caffe2 always build NNPACK as part of its own build, and updates NNPACK and cpuinfo submodules.
* Add string-style devices to all tensors.
Previously, tensors only had a 'get_device' method, which would throw an exception on a CPU tensor. This made it necessary to add if/else branches to code that
was meant to be device agnostic.
This PR implements the following:
1) Adds a 'device' property to all tensors that returns a string representation of the device for all tensors.
For cpu tensors this is 'cpu'. For cuda tensors this is 'cuda:X', where X is the cuda device ordinal.
2) Adds a DeviceSpec class. This is just a helper class for separating device_type and device_index specification and to allow partial specification.
For example, you can call DeviceSpec('cuda'), DeviceSpec('cuda:0'), DeviceSpec('cuda', 1).
Also has backwards compatibility support for specifying integers, which are treated as cuda devices.
DeviceSpecs have the following properties:
a) device_type: string representation of the device type (i.e. 'cpu' or 'cuda')
b) device_index: integer for the device index (None if not specified)
c) cuda_device_index: for backwards compatibility; behaves roughly like `get_device` did previously. I.e. if a function previously took integers for cuda devices,
it can now take DeviceSpecs (or strings), and can maintain the old functionality by calling `old_index = DeviceSpec(old).cuda_device_index`.
3) tensor methods and torch. functions that took integer devices can now take integers, strings, or DeviceSpecs. For example:
torch.randn((2,3), dtype=torch.cuda.float32, device='cuda:1')
TODO in future PRs:
A) Split out cuda from dtype so you don't need to overspecify cuda-ness
B) We currently only support strings/DeviceSpecs in tensor methods and torch. functions. We should have equivalents torch.cuda.device(...), torch.cuda.device_of, etc.
at the torch. level that work on strings/DeviceSpecs
* Add deviceInt64 to python arg parser.
* device_str.
* Remove device_str.
* remove device prefix from attributes.
* Use const char * instead of string.
* Move autogpu index out of Device.
* comment on is_default.
* Rename torch.DeviceSpec to torch.device.
* comment.
* Fix tests.
* Fix flake8.
* Fix sparse_coo_tensor parameter name.
* Improve error message.
* Remove device_ prefix from C++ device object.
* Allocate static strings.
* Return not implemented from rich compare.
* Move torch::Device to THPDevice.
* Remove cuda index.
* Py_RETURN_NOTIMPLEMENTED doesn't exist in python2.
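A small sketch of the device API as it ended up after the rename to torch.device (values are illustrative):
```
import torch

d = torch.device('cuda', 1)          # device type plus index
print(d.type, d.index)               # cuda 1
t = torch.zeros(2, 3, device='cpu')
print(t.device)                      # device(type='cpu')
```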
* Revert "Add __constants__ to Script modules (#6092)"
This reverts commit 5ab30eedf33c670514685838423371f9a5df80f3.
* Revert "[ready] Implement log2 and log10 in PyTorch (#6272)"
This reverts commit 0aa35780bfade6bf9c428f1ae45426caa8a7df93.
* Revert "Use reshape({-1}) (#6281)"
This reverts commit 8ae67a444506a838e648aa60f9eb6a4da22c9b06.
* Revert "Move instruction set specific code to anonymous namespace (#6314)"
This reverts commit 6953c1b77efe2d0764ca9ba7dbf7c9284d68a80c.
* Revert "[auto] Update onnx to 54be8fa - Use cmake3 if it's available (#718) 54be8fad1e"
This reverts commit d33ec12d1e3f4739e10cacf1436764bc54ff89a3.
* Revert "default build with MKL for desktop (#6266)"
This reverts commit 5dcf7078c689f7055ca6837e67ca834cc70d6497.
* Revert "Increase # of runs for CPU perf test, and increase margin of error (#6302)"
This reverts commit 9d1a660670d55590cdab5509bb81c26e8bb3d26a.
Like `__slots__` the `__constants__` property changes the set/getattr behavior of a script module for the keys listed so they behave as constants.
This enables script methods to use them in ways that are otherwise not allowed.
* Python numbers/bools can be inlined as constants in script code.
* List of numbers can be iterated over using for loops
* nn.ModuleLists can be used in for loops as well, unrolling their content.
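A minimal sketch of the `__constants__` behavior, shown with the present-day torch.jit.script entry point rather than the ScriptModule subclassing of this change (names are illustrative):
```
import torch
import torch.nn as nn

class Scale(nn.Module):
    __constants__ = ['factor']        # `factor` is inlined as a constant in script code

    def __init__(self, factor):
        super().__init__()
        self.factor = factor

    def forward(self, x):
        return x * self.factor

m = torch.jit.script(Scale(2.0))
print(m(torch.ones(3)))               # tensor([2., 2., 2.])
```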
* Implemented log2 and log10
* Re-add incorrectly removed files
* Fix minor bugs
* Fix log1p docs
* Add a try-except for python2 math module in log2 test
* Revert changes made to aten/doc/*
* Fix docstring errors
* Fix windows build
The vec256 and SIMD kernels are compiled multiple times with different
headers. It's important that these functions have internal linkage so
that kernels for different architectures don't get combined during
linking. It's sufficient to label free functions "static", but class methods
must be placed in an unnamed namespace to have internal linkage (since static
means something different in the context of classes).
This fixes a bug in which the implementations of Reduction::reduce_all
for different instruction sets were getting combined during linking.
* Remove ATen's copy of FindCUDA
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Minor bugfix for updated FindCUDA.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Use cl.exe as the host compiler even when clcache.exe is set.
Upstream merge request at https://gitlab.kitware.com/cmake/cmake/merge_requests/1933
H/t peterjc123 who contributed the original version of this patch.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Include CMakeInitializeConfigs polyfill from ATen.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Tweak the regex so it actually works on Windows.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* remove patch
* check that cuda dev environment is also present before running cpp_extension cuda tests
* add OSError to list of exceptions when c++filt is not found
When no explicit hidden state is provided, a default is created by
constructing a new Variable filled with zeros. This gets traced as a
Constant operator, which hardcodes the batch size.
To fix this, we remove such constant operators in an 'optimization'
pass. We could have also fixed it by causing the code to not generate
a Constant in the first place, but this is the least invasive fix from
the perspective of the pure pytorch codebase.
* Update FindCUDA to cmake master as of 561238bb6f07a5ab31293928bd98f6f8911d8bc1
NB: I DID have to apply one local patch; it's the `include_guard` change. Should
be obvious next time you do an update.
Relevant commits:
commit 23119366e9d4e56e13c1fdec9dbff5e8f8c55ee5
Author: Edward Z. Yang <ezyang@fb.com>
Date: Wed Mar 28 11:33:56 2018 -0400
FindCUDA: Make nvcc configurable via CUDA_NVCC_EXECUTABLE env var
This is useful if, for example, you want ccache to be used
for nvcc. With the current behavior, cmake always picks up
/usr/local/cuda/bin/nvcc, even if there is a ccache nvcc
stub in the PATH. Allowing for CUDA_NVCC_EXECUTABLE lets
us work around the problem.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
commit e743fc8e9137692232f0220ac901f5a15cbd62cf
Author: Henry Fredrick Schreiner <henry.fredrick.schreiner@cern.ch>
Date: Thu Mar 15 15:30:50 2018 +0100
FindCUDA/select_compute_arch: Add support for CUDA as a language
Even though this is an internal module, we can still prepare it to
be used in another public-facing module outside of `FindCUDA`.
Issue: #16586
commit 193082a3c803a6418f0f1b5976dc34a91cf30805
Author: luz.paz <luzpaz@users.noreply.github.com>
Date: Thu Feb 8 06:27:21 2018 -0500
MAINT: Misc. typos
Found via `codespell -q 3 -I ../cmake-whitelist.txt`.
commit 9f74aaeb7d6649241c4a478410e87d092c462960
Author: Brad King <brad.king@kitware.com>
Date: Tue Jan 30 08:18:11 2018 -0500
FindCUDA: Fix regression in per-config flags
Changes in commit 48f7e2d300 (Unhardcode the CMAKE_CONFIGURATION_TYPES
values, 2017-11-27) accidentally left `CUDA_configuration_types`
undefined, but this is used in a few places to handle per-config flags.
Restore it.
Fixes: #17671
commit d91b2d9158cbe5d65bfcc8f7512503d7f226ad91
Author: luz.paz <luzpaz@users.noreply.github.com>
Date: Wed Jan 10 12:34:14 2018 -0500
MAINT: Misc. typos
Found via `codespell`
commit d08f3f551fa94b13a1d43338eaed68bcecb95cff
Merge: 1be22978e 1f4d7a071
Author: Brad King <brad.king@kitware.com>
Date: Wed Jan 10 15:34:57 2018 +0000
Merge topic 'unhardcode-configuration-types'
1f4d7a07 Help: Add references and backticks in LINK_FLAGS prop_tgt
48f7e2d3 Unhardcode the CMAKE_CONFIGURATION_TYPES values
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !1345
commit 5fbfa18fadf945963687cd95627c1bc62b68948a
Merge: bc88329e5 ff41a4b81
Author: Brad King <brad.king@kitware.com>
Date: Tue Jan 9 14:26:35 2018 +0000
Merge topic 'FindCUDA-deduplicate-c+std-host-flags'
ff41a4b8 FindCUDA: de-duplicates C++11 flag when propagating host flags.
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !1628
commit bc88329e5ba7b1a14538f23f4fa223ac8d6d5895
Merge: 89d127463 fab1b432e
Author: Brad King <brad.king@kitware.com>
Date: Tue Jan 9 14:26:16 2018 +0000
Merge topic 'msvc2017-findcuda'
fab1b432 FindCUDA: Update to properly find MSVC 2017 compiler tools
Acked-by: Kitware Robot <kwrobot@kitware.com>
Acked-by: Robert Maynard <robert.maynard@kitware.com>
Merge-request: !1631
commit 48f7e2d30000dc57c31d3e3ab81077950704a587
Author: Beren Minor <beren.minor+git@gmail.com>
Date: Mon Nov 27 19:22:11 2017 +0100
Unhardcode the CMAKE_CONFIGURATION_TYPES values
This removes duplicated code for per-config variable initialization by
providing a `cmake_initialize_per_config_variable(<PREFIX> <DOCSTRING>)`
function.
This function initializes a `<PREFIX>` cache variable from `<PREFIX>_INIT`
and unless the `CMAKE_NOT_USING_CONFIG_FLAGS` variable is defined, does
the same with `<PREFIX>_<CONFIG>` from `<PREFIX>_<CONFIG>_INIT` for every
`<CONFIG>` in `CMAKE_CONFIGURATION_TYPES` for multi-config generators or
`CMAKE_BUILD_TYPE` for single-config generators.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Polyfill CMakeInitializeConfigs
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Tweak condition for when to use bundled FindCUDA support.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Comment out include_guard.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add max_values and argmax convenience functions to ATen
* Add documentation for torch.argmax/argmin and skip max_values
* Add tests for argmax/argmin
* Dont default the dim argument
* Use dim=0 in test_torch.py for argmax tests
* Implement argmin() and argmax() without dim
* Call .contiguous() before .view(-1)
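A quick illustration of the resulting Python-level functions (input is illustrative):
```
import torch

x = torch.randn(3, 4)
print(torch.argmax(x))          # index into the flattened tensor
print(torch.argmax(x, dim=1))   # one index per row
print(torch.argmin(x, dim=0))   # one index per column
```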
* Add a CODEOWNERS file
* This will let us require review from owners of aten/ and torch/ while giving wider access (for now) to caffe2/
* This will be adjusted as we work on shared components.
* update OWNERS to cover more pytorch bits
If the source and result tensors are empty, arr_in and arr_out may be
null (and size will be 0). This previously called memcpy(null, null, 0),
which is UB according to
http://en.cppreference.com/w/cpp/string/byte/memcpy.
Note that either one of these changes would be sufficient.
(Detected by UBSan)
cpuinfo_initialize() prints an error message to the console/log when run
on an unsupported CPU/platform. Even though the code will work fine, this is a
confusing error message that shouldn't be shown to users who run
PyTorch on architectures other than those supported by cpuinfo.
* Manually bump onnx submodule to current latest
* skip _equal_ tests
* Revert "skip _equal_ tests"
This reverts commit 72db49ebc16c9f98ed12add293a8f41e7d509bf3.
* bump to include a fix
* bump
This changes type(tensor) to return `torch.Tensor` instead of
`torch.autograd.Variable`.
This requires a few implementation changes:
- torch.Tensor is now a regular Python class instead of a
pseudo-factory like torch.FloatTensor/torch.DoubleTensor
- torch.autograd.Variable is just a shell with a __new__ function.
Since no instances are constructed it doesn't have any methods.
- Adds torch.get_default_dtype() since torch.Tensor.dtype returns
<attribute 'dtype' of 'torch._C._TensorBase' objects>
Fixes #6222
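A small sketch of the observable change:
```
import torch

t = torch.tensor([1.0, 2.0])
print(type(t))                    # <class 'torch.Tensor'>, no longer Variable
print(torch.get_default_dtype())  # torch.float32 unless changed
```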
We don't need to make sure gradInput is contiguous because it's always
passed in as an empty tensor (see CUDAFloatType.cpp after it gets
codegen-ed). This was increasing the reference on gradInput and leaking
it.
I'm not sure if there's a good way to test this. I put together a script
that
1) Prints out when a tensor is allocated and deallocated
2) Checks allocations vs deallocations after running a python script
And verified that each allocation matches each deallocation.
We had a bug in the Buck build of PyTorch due to symbols from _C
being present in two shared libraries that were both loaded at
runtime. This caused global variables to be initialized twice and
destructed twice on exit. The second destruction often caused
segfaults on exit.
This attempts to detect that sort of situation early on. If
Module.cpp is compiled twice, the symbol
pytorch_duplicate_guard()::initialized will be shared. The second
initialization will print an error message and abort.
This compares the torch function against the reference math function
on a relatively small set of inputs, including integers, extremes
of some common functions, zero, a few numbers from randn, and a few
numbers near 1e6.
The idea here is not to be completely exhaustive, but rather to quickly
expose the most common bugs. For exhaustive checks, we would have to evaluate
torch functions against all ~4e9 possible float32 values.
We compare the torch function evaluated on contiguous
and non-contiguous inputs and on large vs. small tensors.
Also:
- Make torch.allclose work with nan and +/-inf
- Add torch.isclose (like numpy.isclose)
- Add torch.testing.assert_allclose (like
numpy.testing.assert_allclose)
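For example, a minimal sketch of the comparison helpers described above:
```
import torch

a = torch.tensor([1.0, float('nan'), float('inf')])
b = torch.tensor([1.0 + 1e-9, float('nan'), float('inf')])
print(torch.isclose(a, b, equal_nan=True))   # elementwise: tensor([True, True, True])
print(torch.allclose(a, b, equal_nan=True))  # single bool: True
```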
After discussion with @colesbury it turns out that avx_mathfun.h is imprecise and cannot be trusted blindly.
Turns on /fp:strict in Windows to disable replacement of trig functions with imprecise vectorized implementation.
Fixes #5719
Previously, the following would error out with an "Invalid file
descriptor" error:
```
import torch
import torch.multiprocessing as mp
q = mp.Queue()
t = torch.tensor([])
q.put(t)
```
on some OSes. The problem was that because one cannot mmap data of size
0, and that an empty tensor has a storage of size 0, the file descriptor
for the storage (referencing shared memory) was not being set. The
multiprocessing sharing code then calls DupFD on that uninitialized file
descriptor, leading to an error.
This PR special cases sharing an empty tensor on the CPU. CUDA does not
have this problem.
Unit tests for both cpu and cuda empty tensors
* [easy] allow empty tensor in cuda relu op
The diff has not enabled the unit test for empty tensors, because the MKL version of ReluOp needs extra work to support them.
* Make blob norm plotting work with distributed trainer when the old framework is used
* Introduce torch.layout and split layout from dtypes.
Tensors (and tensor types) now have a 'layout' attribute that returns either 'torch.strided' or 'torch.sparse_coo'.
Previously, dtypes were 1-to-1 with ATen types/PyTensorTypes; the impetus behind this decision was to make things easy in the common case
(i.e. specifying a type in a factory function). But this doesn't really follow for sparsity, which isn't a common case.
It also doesn't properly represent the concept of a dtype, which in numpy is a proper scalar type (i.e. roughly the type returned from indexing the
last dimension of an n-d array). But this should be the same whether or not the tensor is represented via strides, sparsity, etc.
This is accomplished by:
1) having the dtype of tensor return the (device-type, scalar-type) combination, i.e. torch.cuda.float32, so both
torch.cuda.FloatTensor and torch.cuda.sparse.FloatTensor have the same dtype
2) Adding a layout parameter to python functions, where the combination of (dtype, layout) maps to an ATen type that is used for dispatch.
* Formatting, make init throw python_error.
* Fix cuda not enabled error message.
* Fix test.
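A small sketch of the layout attribute as exposed in the Python API (tensors are illustrative):
```
import torch

dense = torch.zeros(2, 3)
sparse = torch.zeros(2, 3, layout=torch.sparse_coo)
print(dense.layout)    # torch.strided
print(sparse.layout)   # torch.sparse_coo
```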
* Change cpp_extensions.py to make it work on Windows
* Fix linting
* Show python paths
* Debug
* Debug 1
* set PYTHONPATH
* Add ATen into library
* expose essential libs and functions, and copy _C.lib
* Specify dir in header
* Update check_abi for MSVC
* Activate cl environment to compile cpp extensions
* change version string
* Redirect stderr to stdout
* Add monkey patch for windows
* Remove unnecessary self
* Fix various issues
* Append necessary flags
* add /MD flag to cuda
* Install ninja
* Use THP_API instead of THP_CLASS
* Beautify the paths
* Revert "Use THP_API instead of THP_CLASS"
This reverts commit dd7e74c44db48e4c5f85bb8e3c698ff9de71ba2d.
* Use THP_API instead of THP_CLASS(new)
This PR enables users to print extra information about their subclassed nn.Module.
Now I simply insert the user-defined string at the end of the module name, which should be discussed in this PR.
Before this PR, users had to redefine __repr__ and copy & paste the source code from Module.
* Add support for extra information on Module
* Rewrite the repr method of Module
* Fix flake8
* Change the __repr__ to get_extra_repr in Linear
* Fix extra new-line for empty line
* Add test for __repr__ method
* Fix bug of block string indent
* Add indent for multi-line repr test.
* Address review comments
* Update tutorial for creating nn.Module
* Fix flake8, add extra_repr of bilinear
* Refactor DropoutNd
* Change to extra_repr in some Modules
* Fix flake8
* Refactor padding modules
* Refactor pooling module
* Fix typo
* Change to extra_repr
* Fix bug for GroupNorm
* Fix bug for LayerNorm
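A minimal sketch of the resulting extra_repr hook (the module name and fields are illustrative):
```
import torch.nn as nn

class MyLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features

    def extra_repr(self):
        # this string is appended inside the module's printed representation
        return 'in_features={}, out_features={}'.format(self.in_features, self.out_features)

print(MyLinear(3, 4))   # MyLinear(in_features=3, out_features=4)
```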
This avoids promotion from python float to torch.Tensor for AffineTransform. This appears to be needed so that constraint registration works across CPU and all GPUs.
Previous discussion at 3a25db73c8 (r176361909)
Background:
There are three basic types of objects in torch.distributions:
- Distributions are flyweight objects constructed from tensor or float args. They always promote float args to tensors.
- Transforms are longer-lived objects (sometimes cached; some are static globals). They can take float arguments. This PR makes AffineTransform avoid promoting float args to tensors.
- Constraints are long-lived objects. They can take either float or tensor arguments. They do not promote floats to tensors. These are relatively symbolic and are not much more than partially evaluated comparisons, e.g. constraints.positive is basically a symbolic version of lambda x: x > 0 that can be stored in a ConstraintRegistry table.
The Problem:
Sometimes we want to apply transform_to(constraints.positive) to a torch.cuda.FloatTensor. This is fine since
transform_to(constraints.positive)(x)
= ExpTransform()(x)
= x.exp()
which works with any tensor type.
Other times we want to apply transform_to(constraints.greater_than(1.5)) to a torch.cuda.FloatTensor. This is problematic before this PR since
transform_to(constraints.greater_than(1.5))(x)
= ComposeTransform([ExpTransform(), AffineTransform(1.5, 1)])(x)
= AffineTransform(1.5, 1)(x.exp())
= t.loc + t.scale * x.exp() # where t = AffineTransform(1.5, 1)
Before this PR, AffineTransform would promote t.loc and t.scale to tensors. This promotion can happen as early as library load time for some transforms, e.g. transform_to(constraints.unit_interval). Therefore before this PR, the second example would error at t.scale * x.exp() because t.scale is a [default] torch.FloatTensor whereas x.exp() is a torch.cuda.FloatTensor.
Proposed solution:
This PR merely adds support for python floats as the .loc and .scale parameters of AffineTransform. This should suffice for most purposes since only AffineTransform and a handful of parameter-free transforms are ever stored in the global transform_to and biject_to registries.
Alternative solutions include:
- allowing promotion from torch.FloatTensor to all other tensor types, e.g. torch.cuda.FloatTensor.
- adding a handful of specific parameter-free transforms like NegateTransform() in lieu of AffineTransform(0, -1).
Tested: added a regression test
* Support python floats in AffineTransform
* Update docstrings
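A small sketch of the pattern described above, on CPU for simplicity (values are illustrative):
```
import torch
from torch.distributions import constraints, transform_to
from torch.distributions.transforms import AffineTransform

x = torch.randn(3)
t = transform_to(constraints.greater_than(1.5))   # ComposeTransform of Exp and Affine(1.5, 1)
print(t(x))                                       # all entries > 1.5

a = AffineTransform(loc=1.5, scale=1.0)           # float loc/scale stay floats, no promotion
print(a(x.exp()))
```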
Since we added cpuinfo as a vendored dependency, this created a problem
with our NNPACK integration, because NNPACK also depends on cpuinfo,
as per #6068. This is particularly difficult to resolve because we
depend on a fairly recent version of cpuinfo, which we generally cannot
assume users have installed (it is submoduled.) So, it would seem that
to fix this properly, NNPACK would have to be vendored and built against
the correct cpuinfo.
However, discussion with Christian Puhrsch and Marat Dukhan suggests
that the benefit of carrying on with NNPACK integration is not all that
great, because mkldnn has since come out with a CPU convolution implementation
that performs better than NNPACK. NNPACK's x86 implementation is not
really maintained, and its ARM support is not really relevant to PyTorch.
So rather than go through all the rigamarole of vendoring NNPACK, better
to just delete it. If you need good perf for CPU convolutions, please
make sure you build against mkldnn.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Script functions can now have no return statements, empty
return statements, or return one or more values.
Additionally fix the lexer to always emit TK_NEWLINE before
TK_DEDENT, which simplifies the parser.
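A minimal sketch of what the script frontend now accepts (functions are illustrative):
```
import torch

@torch.jit.script
def swap(x, y):
    return y, x                  # script functions may return multiple values...

@torch.jit.script
def bump_(x):
    x.add_(1.0)                  # ...or have no return statement at all

a, b = swap(torch.zeros(1), torch.ones(1))
bump_(a)
print(a, b)
```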
```
[6/179] Building NVCC (Device) object
src/ATen/CMakeFiles/ATen.dir/native/cuda/ATen_generated_SparseMM.cu.o
/home/rzou/pytorch/aten/src/ATen/native/cuda/SparseMM.cu(9): warning:
statement is unreachable
/home/rzou/pytorch/aten/src/ATen/native/cuda/SparseMM.cu(9): warning:
statement is unreachable
```
Warning was caused by unnecessary return statement.
This reverts commit d63266ccbc0c1390c58c2a71ae0b562fdec2fbc0
@bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
This reverts commit 05bd9bec10fad5ff9dc40be88836fd7274d50ce9
@bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files
Providing a Python API to fetch Int8 tensors.
data, scale, zero_point = workspace.FetchInt8Blob(blob_name)
now returns a tuple if the blob contains an Int8TensorCPU:
'data' = int8 data array
'scale' = fake quantization scale
'zero_point' = fake quantization offset
Although FetchBlob shares its back-end implementation with FetchInt8Blob, we raise an
error to prevent unexpected behavior of the same method.
The changes in this diff comment out unused parameters. All changes are automated using clang-tidy.
This will allow us to enable `-Wunused-parameter` as error.
#accept2ship
Getting the CUDA device property struct with cudaGetDeviceProperties is expensive. THC caches CUDA device properties, available via THCState_getDeviceProperties, which is exposed via at::globalContext().getDeviceProperties(device) and, in Python, via torch.cuda.get_device_properties. This PR changes the two methods that previously called cudaGetDeviceProperties to directly use torch.cuda.get_device_properties in Python.
Also fixes ATen compile error when it can't find CUDA.
Fixes #4908. Using the script from that issue, we get roughly an 18x speed-up.
[ssnl@ ~] python dev.py # master
0.2826697587966919
0.00034999847412109375
0.0003493785858154297
0.000356292724609375
0.00036025047302246094
0.0003629922866821289
0.00036084651947021484
0.00035686492919921874
0.00036056041717529296
0.0003606319427490234
[ssnl@ ~] python dev.py # this PR
0.27275662422180175
2.1147727966308594e-05
1.9598007202148438e-05
1.94549560546875e-05
1.9359588623046876e-05
1.938343048095703e-05
2.0074844360351563e-05
1.952648162841797e-05
1.9311904907226562e-05
1.938343048095703e-05
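A minimal sketch of the cached property lookup used here, guarded so it only runs where CUDA is available:
```
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)   # cached, avoids repeated cudaGetDeviceProperties calls
    print(props.name, props.total_memory, props.multi_processor_count)
```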
Allows you to export an ONNX model as:
Protobuf file (this is what we have now)
Uncompressed zip archive
Compressed zip archive
Directory
* Experimental support for different ONNX export types
* Remove a copy
* Add comment
* Add test cases
* lint
* fix bug
* address comments
Small PR to allow use of RelWithDebInfo mode in CMake as per request from @ebetica, to make debugging in optimized binaries easier (i.e. don't have to suffer major decrease in performance when using DEBUG mode, but can still debug properly, not like in RELEASE mode).
From what I can see using RelWithDebInfo means -O2 -g -DNDEBUG while Release means -O3.
normal (release):
$ python setup.py build develop
$ grep -e' -fexceptions ' aten/build/build.ninja
FLAGS = -DUSE_AVX2 -msse3 -DUSE_SSE3 --std=c++11 -Wall -Wno-unknown-pragmas -Wno-vla -fexceptions -fopenmp -O3
This PR allows use of the REL_WITH_DEB_INFO environment variable:
$ REL_WITH_DEB_INFO=1 python setup.py build develop
$ grep -e' -fexceptions ' aten/build/build.ninja
FLAGS = -DUSE_AVX2 -DUSE_SSE3 --std=c++11 -Wall -Wno-unknown-pragmas -Wno-vla -fexceptions -O2 -g -DNDEBUG
* Add REL_WITH_DEB_INFO mode
* Fix batch file syntax
* Rename setup.py to setup_caffe2.py
* Also move VERSION_NUMBER under caffe2/ directory.
* Our setup*.py file needs to be at the root level.
* Add requirements.txt
Perf numbers:
https://gist.github.com/colesbury/9e28dd7b0f27b0b019f68adbd4bd4b88
I've changed the dispatch stub so that it doesn't require every kernel
to be compiled for every instruction set. Kernel implementations are
stored in the stub's table with the REGISTER_DISPATCH macro.
I've also moved vec256 to it's own folder and split up the
specializations before they get too unwieldy.
Change UnaryOpsKernel to use new DisaptchStub
- Prefer signed integers. Mixing signed and unsigned integers is a
pain and ATen mostly uses signed integers (int64_t).
- Use inline lambda instead of struct for UnaryOps
- Rename partial load overload "load_partial"
This is in preparation for splitting out sparsity (layout) from dtypes; it's complex to maintain these
and tensor.new(...) is a legacy API in any case.
Fixes #5554
Adds an error message for when NLLLoss is passed an input and target
whose batch sizes don't match. Ideally this check should live in ATen
but since there is NLLLoss logic in python the check is there right now.
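For example, mismatched batch sizes now produce a clear error (shapes are illustrative):
```
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)            # batch of 4
target = torch.tensor([1, 2, 3])       # batch of 3
try:
    F.nll_loss(F.log_softmax(logits, dim=1), target)
except ValueError as e:
    print(e)                           # mentions the mismatched batch sizes
```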
Before, using an unknown binary operator like `@`:
```
import torch
@torch.jit.script
def mm(x, y):
return x @ y
x = torch.randn(4, 3)
y = torch.randn(3, 2)
mm(x, y)
```
resulted in [this not-so-readable trace](https://gist.github.com/zou3519/052b8998108c4bc0fe0e7c85c6f5758e).
Now, it tells the user that the problem is an unknown binary operator:
```
NotSupportedError: unsupported binary operator: MatMult
@torch.jit.script
def mm(x, y):
return x @ y
~~~ <--- HERE
```
* Continuation of https://github.com/caffe2/caffe2/pull/2306 and based on Yangqing's PR at https://github.com/caffe2/caffe2/pull/2326
* Put caffe2_protos as static library and link it whole to libcaffe2.so
* For protobuf::libprotobuf, only link it to libcaffe2_protos (and hence libcaffe2.so), but not any downstream library. This avoids manipulating protobuf objects across dll boundaries.
* After the above, during linking one will receive complaint that fixed_address_empty_string is not found. This is because we compiled protobuf with hidden visibility, and the fact that the generated caffe2.pb.h has an inline function that invokes the inline function in protobuf GetEmptyStringAlreadyInited()
* Added sed-like commands to replace the generated header to use caffe2::GetEmptyStringAlreadyInited() instead. And, in proto_utils.cc, implement a function that essentially routes the function call to protobuf's internal one. The reason this works is that, caffe2::G... is visible globally, and libcaffe2.so is able to see the real protobuf one. This ensures that we are always calling protobuf functions that are inside libcaffe2.so.
While keeping compatibility, enable TensorDataset to take any number of tensors.
* Enable TensorDataset to get any number of tensors
* Update dataset.py
Fix syntax error on python 2.7
* Add several test for tensordataset
* Fix whitespaces
* Simplify args
* Update dataset.py
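A quick illustration of the generalized TensorDataset (tensors are illustrative):
```
import torch
from torch.utils.data import TensorDataset

x = torch.randn(10, 3)
y = torch.randint(0, 2, (10,))
z = torch.randn(10, 5)
ds = TensorDataset(x, y, z)    # any number of tensors with matching first dimension
print(len(ds), ds[0])          # 10 and a 3-tuple of samples
```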
* Block set from param_group['params']
Passing a set might cause `list(params)` to come out in a random order. In that case, in `load_state_dict()`, `id_map` would not be matched correctly.
* Update Error Message
* Add Warning on Optimizer Docs
* Update optimizer.py
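A minimal sketch of the recommended usage after this change (model and hyperparameters are illustrative):
```
import torch

model = torch.nn.Linear(3, 1)
# pass parameters as an ordered collection (list/generator), never as a set,
# so that load_state_dict() can match parameters deterministically
opt = torch.optim.SGD(list(model.parameters()), lr=0.1)
```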
According to the code in _torch/nn/functional.py:1399_
(```if target.size()[1:] != input.size()[2:]:```),
if the size of input is (N, C, d_1, d_2, ..., d_K), the size of target should be (N, d_1, d_2, ..., d_K).
* allow calls to non-script methods, allow python non-script attributes in methods
* add test to make sure submodules are not reassigned
* Test that we can change python attributes
- gloo, pybind11, nanopb and nccl now live in third_party.
- ATen builds in aten/build rather than torch/lib/build/aten
- A bit of faffing about in the scripts was necessary, because they used to assume that everything lived in the same directory. Now you are expected to cd into the correct directory before calling one of the build functions. The actual builder script lives in tools
- Lint now just unconditionally ignores third_party, rather than enumerating folders explicitly
Ignore backward step when there is no loss function;
For some customized model, we can encode the update directly in forward step and there is no backward step;
Added a Caffe2 math sum operator that takes integers (only int32).
Changed SumFloatIter to SumGenericIter so that it handles more than one type.
Added a sumElementInt operator.
Change the positive modulo computation to use fewer modulo operations. This should
run ~2x faster (for the modulo part alone). In addition, we should later switch to
computing the modulo via the reciprocal.
This code introduces a new class for exporting decoder step (ensemble) models trained with fbtranslate pytorch to Caffe2 models via ONNX, for the purpose of use in "component beam search" being developed concurrently in C++ by @juancarabina.
Codemoding imports from libfb.py of the format "from libfb import X". This is part of a larger codemod to remove the mapping from libfb/py to libfb, in the interest of enabling static typechecking in fbcode.
This is required to support placeholder/decorator ops which do not have an operator schema. Note that the change is made in such a way that it is a no-op if placeholder ops are not used.
Changes:
1. Since the placeholder ops always run on CPU, added a utility to infer placeholder ops blob devices.
2. Placeholder op's input/output blobs should be on CPU as well. This change takes care of dealing with output blobs - i.e. use blobs on CPU.
3. Added a Unit test - test_inject_copy_placeholder_ops
This diff is added to support the ProfileObserver in order to differentiate operators in the stepnet properly. Since copy() is only used in the context of RNNs, the name has been changed to reflect that.
* Add numpy.array-like type inference to torch.tensor.
* Temporary fix for int/double types.
* Treat python floats as the default (scalar) dtype.
* Also make 0-length sequences the default scalar type and add more tests.
* Add type inference to sparse_coo_tensor.
* Fix sparse test.
* Remove allow_variables.
* Check numpy platform bits.
* Address review comments.
* Make suggested changes to constraints.
* More checking windows builds.
* Fix test for windows.
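A quick illustration of the resulting type inference (inputs are illustrative):
```
import torch

print(torch.tensor([1, 2, 3]).dtype)    # torch.int64: integers stay integral
print(torch.tensor([1.0, 2.0]).dtype)   # Python floats use the default (float) dtype
print(torch.tensor([]).dtype)           # empty sequences also use the default dtype
```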
* Exported AtomicIterOp count
* Exported AtomicIterOp count
* Exported AtomicIterOp count
* Exported AtomicIterOp count
* Exported AtomicIterOp count
* Exported AtomicIterOp count
* Exported AtomicIterOp count
* Add axis to top_k_op. (#2416)
* Revert update on top_k_op
* Add axis to top_k_op
Add axis to top_k_op
* [auto] Update onnx to a8e4648 - Adjust link flags when built in Windows Debug mode (#647)
a8e4648a7d
* [auto] Update onnx to f4acf28 - Remove allowconsumed enforceconsumed from op schema. (#617)
f4acf281ef
* Exported AtomicIterOp count
* Exported AtomicIterOp count
* Exported AtomicIterOp count
* Exported AtomicIterOp count
* Exported AtomicIterOp count
* Exported AtomicIterOp count
* Initialize cpuinfo in the thread pool
The thread pool called cpuinfo_get_processors_count() without initializing cpuinfo. Only by luck did this not make Caffe2 single-threaded: the threadpool is initialized after NNPACK, and NNPACK initializes cpuinfo itself.
This commit also updates cpuinfo to a version that aborts with a fatal error if it is used uninitialized.
* Updated Python Op and Image Pre-Processing Pipeline tutorials && Added CIFAR-10 Part 1 tutorial (#2286)
* Updated Basics tutorial: (1) Added Python 3 support with __future__ statements; (2) Various grammatical/typo fixes and minor refactoring of Markdown
* Added Python 3 support and made minor typo fixes
* Added Python 3 support with future imports, refactored and corrected errors in Markdown, added comments
* Added Python 3 support with future imports, Added use of caffe_translator.py to translate downloaded .caffemodel file to .pb files
* Upgrades to Image Pre-Processing Pipeline tutorial
* Updated Python Op tutorial
* removed markdown with empty links
* Added Part 1 of an end-to-end CIFAR-10 tutorial
* Updated MNIST Dataset and Databases tutorial with python3 support and markdown fixes
* Tweaks to markup, less training iterations
* changed permissions of CIFAR10_Part1; typo corrections in Image_Pre-Processing_Pipeline
* Typo corrections in Multi-GPU Training tutorial
* sync Python_Op py_gen with the IPython notebook
* nit typo correction
* [auto] Update onnx to 5cb999d - Minor cleanups to shape inference (#653)
5cb999ddc1
* [auto] Update onnx to ecac1c1 - Merge Rel 1.1.0 branch into master (#657)
ecac1c1624
* Strip down onnx to only pb definitions in mobile build (#2426)
* Exported AtomicIterOp count
* Exported AtomicIterOp count
* Exported AtomicIterOp count
* Exported AtomicIterOp count
* Exported AtomicIterOp count
* Exported AtomicIterOp count
* Exported AtomicIterOp count
* Exported AtomicIterOp count
* Exported AtomicIterOp count
The thread pool called cpuinfo_get_processors_count() without initializing cpuinfo. Only by luck did this not make Caffe2 single-threaded: the threadpool is initialized after NNPACK, and NNPACK initializes cpuinfo itself.
This commit also updates cpuinfo to a version that aborts with a fatal error if it is used uninitialized.
* Deprecate ctx.saved_variables via python warning.
Advises replacing saved_variables with saved_tensors.
Also replaces all instances of ctx.saved_variables with ctx.saved_tensors in the
codebase.
Test by running:
```
import torch
from torch.autograd import Function
class MyFunction(Function):
@staticmethod
def forward(ctx, tensor1, tensor2):
ctx.save_for_backward(tensor1, tensor2)
return tensor1 + tensor2
@staticmethod
def backward(ctx, grad_output):
var1, var2 = ctx.saved_variables
return (grad_output, grad_output)
x = torch.randn((3, 3), requires_grad=True)
y = torch.randn((3, 3), requires_grad=True)
model = MyFunction()
model.apply(x, y).sum().backward()
```
and assert the warning shows up.
* Address comments
* Add deprecation test for saved_variables
* Changes in bilinear upsampling
* Add align_corners option to upsampling module & functional when using linearly interpolating modes
When align_corners=True, it uses the old original upsampling scheme, which gives visually better results,
but doesn't properly align input and output pixels, and thus causes the output to vary depending on the input size.
This PR adds the align_corners option and changes the default behavior to align_corners=False, with a
proper warning if this option is not specified when using nn.Upsample or nn.functional.upsample, to let
users be aware of this new change.
Adds tests in test_nn.py for spatial invariance when align_corners=False, and usual module tests for
align_corners=False.
* remove redundant checks and unnecessary variables; fix the cast
* fix negative indices
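A small sketch of the new flag, shown with F.interpolate, the current name of the nn.functional.upsample routine described above (shapes are illustrative):
```
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 4, 4)
# passing align_corners explicitly silences the warning and pins the behavior
old = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=True)
new = F.interpolate(x, scale_factor=2, mode='bilinear', align_corners=False)
```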
* Store perf numbers in S3
Previously the perf numbers are stored in https://github.com/yf225/perf-tests/tree/cpu, but we couldn't figure out a way to push the perf numbers only from master builds. This PR moves the perf number storage to S3, which allows us to have finer control over when to push the new numbers.
This is in replacement of #5844 - storing numbers in RDS has its own problems with schema migration and backward compatibility, and using a NoSQL database might be an overkill at this point.
* Fixed issues
This PR addresses issue #5024
* Expose Conv2dBackward in python
* Separate interface for exposing gardients of operators
* Revert old changes
* Add tests
* Add conv1d gradients. Refactor tests for grad convolutions
* Refactor names and change examples
* Remove Varibale from tests for conv backward
Added an ind_worker_queue parameter to data.DataLoader. It makes preprocessing deterministic.
DataLoader in multiprocessing mode may cause non-deterministic results. Even if the random seed is frozen, each subprocess may get tasks in an unstable order, caused by differing I/O times while data loads. If you use augmentation while loading data, this makes results unreproducible. See https://discuss.pytorch.org/t/deterministic-non-deterministic-results-with-pytorch/9087
To fix this issue I have added an individual queue for each worker. In this case each worker gets tasks in a stable order, so each subprocess produces stable results.
To reproduce the issue you may change ind_worker_queue to False and run the script several times.
Code to reproduce issue is in the corresponding PR.
* TestIndividualWorkerQueue added to DataLoader tests
* Review fixes
* "Simplify" code by removing itertools
* Rebase conflicts fix
* Review fixes
* Fixed shutdown behavior
* Removed ind_worker_queue flag.
* Rebase on master
* Disable tests that use DataLoader with multiple workers (#5322)
PR introduces AVX2 optimization for sigmoid on floats. Issue #4929. The internal benchmark shows ~10x speedup.
Added an AVX2-vectorized sigmoid using the 8-way vectorized exp (exp256_ps) in avx_mathfun.h.
Implemented vector dispatch for sigmoid. Since the sigmoid function is defined for floats and doubles only, for now a preprocessor #ifdef initializes the sigmoid dispatch only for float and double.
Vector functions in THVector.h were not being called for all of the basic float/double functions. Changed the LAB_IMPLEMENT_BASIC_FUNCTION define in THTensorMath.c to use the THVector_(NAME) implementations when the inputs are contiguous. Functions that do not have vectorized SIMD implementations will use the same default function from THMath.h.
* add AVX2 implementation for sigmoid function
* Fix bug in AVX2 code for sigmoid
* Add new macro for custom vectorized functions
* Implement torch.util.bottleneck
This is a tool that is intended to be used as initial exploratory
debugging of bottlenecks in user scripts. Run it with
python -m torch.utils.bottleneck /path/to/source/script.py
* Refactor and address comments
* Fix tests
* Allow passing of args to the profiled script
* Replace Variable
* Implement range for loop in script
* Fix handling of boolean constants
* Use WithInsertPoint
* Allow dynamic max trip count
* fix symbols
* Fix argument order
* fix test
* Add insert{Input,Output} APIs and use them
* Factor out condition stuff
* clang-format
* Address remaining comments
* Fix tests
* Implement script in AST frontend
* Support legacy empty tensor behavior in cat
Continuing from #5837:
Fixes #5332.
Currently, the following behavior happens with torch.cat:
```
import torch
x = torch.randn(4, 3, 32, 32)
empty = torch.Tensor([])
res1 = torch.cat([x, empty], dim=1)
res2 = torch.cat([empty, x], dim=1)
```
However, at some point in the past, res1 and res2 were equal. This PR
supports the legacy behavior of ignoring empty tensors when
concatenating a list of tensors, until we have empty tensors that can
have arbitrary shape, at which point we'll stop supporting this
behavior.
* Address comments
* Moved torch headers copy to build_deps
PR #5706 initially moved headers under build_ext to fix bdist_wheel and
build develop. This broke install and #5755 moved them back to install
which broke bdist_wheel and build develop. Looks like build_ext is called
from install after it already tried to copy the headers to the python install
dir and the headers were not installed correctly. Using build_deps works
correctly with setup.py install, bdist_wheel, and build develop.
* Comment about the auto-generated files
Added comment that the current solution will not include auto-generated
files which may be a problem if somebody needs to use them
- All of the scripts are based off of the idea that they should be as
simple as possible, and all the heavy lifting done in the construction
of the Docker file. The scripts are really simple now. A bigger
philosophical discussion can be found in .jenkins/README.md
- build-asan.sh is split out of build.sh, as ASAN builds are a bit
specialized and it's inappropriate to run many of the other builds
as part of them.
- We now build and run with mkl/mkl-include on the CPU only builds
- We now report sccache and ccache stats at the end of all builds.
- run_test.py flushes stdout/stderr before making a subprocess call,
which should solve our interleaving problems.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Fixes #5943
For the following code:
```
import torch
u = torch.zeros((3, 3), requires_grad=True)
v = u.permute(-1, -2) # (1, 0) here is fine
v.sum().backward()
```
during the backward pass, a std::vector is constructed
as an "inverse" of the permutation. To do this, all the dims
are indexed into the vector.
The problem with that is that the negative dims were being indexed
into the std::vector, causing undefined behavior. This PR wraps
those negative dims so they're handled correctly.
* Revert update on top_k_op
* Add axis to top_k_op
* Remove do { ... } while (false)
* Revert top_k op to upstream
* Add argmin and argmax ops
Add argmin and argmax ops
* Revert top_k_test to upstream
* Add argmin and argmax ops
Add argmin and argmax ops
* Have ScriptModule inherit from Module
This is accomplished by created replacement _parameters, _buffers,
and _modules which implement the OrderedDict APIs but which
actually get/set their members inside script::Module
* Merge TracedModule with ScriptModule
* Move logic of attribute handling into Python bindings rather than
make script::Module handle it. This was redundant with nn.Module,
which already handles attribute.
* Make TracedModule a subclass of ScriptModule
* Move handling of attribute kind logic into bindings.
* Allow ScriptModule to contain non-script module submodules.
* Revert "Use -DCMAKE_BUILD_TYPE=Release for local build by default"
This reverts commit 035c62081f6420405b9f1380cc5d21b4c6ae78f6.
* Revert "Export number of iterations of AtomicIterOp (#2338)"
This reverts commit 91b7a0cb48c6b079e2ca8fd5c26819a003937d76.
* add reduce=True arg to MarginRankingLoss
* make default margin arg match for legacy
* remove accidentally added test
* fix test
* fix native_functions.yaml alphabetical order
Fixes #5887.
Now it shows:
-- MKL library found
-- Found a library with BLAS API (mkl).
CMake Error at CMakeLists.txt:389 (MESSAGE):
MKL header files not found. If using conda, please run `conda install
mkl-include`. Otherwise, please make sure that CMake will search the
directory containing the header files, e.g., by setting CMAKE_INCLUDE_PATH.
-- Configuring incomplete, errors occurred!
See also "/home/ssnl/sftp/pytorch/torch/lib/build/aten/CMakeFiles/CMakeOutput.log".
See also "/home/ssnl/sftp/pytorch/torch/lib/build/aten/CMakeFiles/CMakeError.log".
* Fix integer overflow in remainder
* Fix remainder operator in CUDA
* Add tests for remainder integer overflow
* Add has_different_sign static function
1. Support calculating the average LpNorm in the LpNorm operator by adding one more boolean argument, i.e., LpNorm(average=true) = LpNorm(x) / size of x
2. Integrate the average option into the visualization framework
Changes:
=======
1. Added device inference functions for Concat and Split Ops.
2. Added a unit test to validate the change. See, test_device_inference_function in core_test.py
3. Fixed some formatting.
Instead of using hard-coded rules or relying on gpu_strategy to mark full-sync data parallel ops, we need some generic rules that are applicable to both the single-machine and distributed settings.
Make it easier to plug in intermediate steps between preprocessing & trainer by maintaining a stable schema.
I also fixed enqueue() so that we can pass in the same blob in multiple locations without causing data corruption.
The way `splits()` is currently used is so convoluted. It's impossible to compose ReaderBuilder. I'm working on a composite reader so this is a prerequisite for it.
The idea is that the ReaderBuilder should maintain the states it needs to create a reader. Any setup is done through the new `setup()` method. Currently, `setup()` should only be called once, but, if needed, it should be safe to call it multiple times.
Add one more input module, preproc everstore, for IN1k. It uses the same datasets as the sherlock everstore input reader, then uses the DataPreproc operator to distribute the image preprocessing to machines other than the trainer. This should relieve some of the compute burden on the trainers.
@override-unit-failures
(Note: this ignores all push blocking failures!)
LOG(INFO) can be stripped out at compile time or disabled at run time,
but there are hardly any use cases where we want to call TEST_Benchmark
but don't want to see the result. Additionally, on Android, LOG(INFO)
writes to logcat, which is OK for errors/warnings but inconvenient
for benchmarking results, as on new phones logcat spews logs like crazy.
Not sure if this is a backwards compatibility issue.
```
Python 2.7.9 (default, Apr 2 2015, 15:35:35)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests.get as urlopen
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named get
>>> from requests import get as urlopen
>>>
```
* Revert "Port ATen and JIT C++ tests to Catch2 (#5788)"
This reverts commit 6f80023c29e0fb55f46a32c4931bc5d4ba749846.
* Revert "Fix error message for cat-ing zero-dim tensors (#5819)"
This reverts commit cf2e1760490d369e93017b9425279b235c10772d.
* Revert "Softmax symbolic should account for negative dim (#5846)"
This reverts commit ba64724aeea8ad5d4b50cd1154fca5a011618333.
* Revert "[fft][1 of 3] build system and helpers to support cuFFT and MKL (#5855)"
This reverts commit 22ef8e5654c45d1f5404e3add6ad19678c0b80a9.
* Revert "Don't modify requires_grad when running DataParallel in no_grad mode (#5880)"
This reverts commit d11b7fbd1c49ed7bd84c89d286e2763e6ba55f51.
* Revert "fix some methods not showing up in doc (#5882)"
This reverts commit 24fca0efb289a069929639783d1c050b79e591c0.
* Revert "ReduceOps cleanup and set_num_threads (#5723)"
This reverts commit 84400d5531500e1a3fbcfe8a3f2865f982405861.
* Revert "introduce shape_as_tensor and reshape_from_variable_shape (#5824)"
This reverts commit f446b82e70ca0aa42fffa58469c28b6bce51d021.
* Revert "Enable resetting of batchnorm running moments and cumulative ("simple") moving average (#5766)"
This reverts commit 99b1f6cfad85a4856550cc1e787afd7ff9e6c6aa.
* Add CollectAndDistributeFpnRpnProposalsOp for FPN support
* Adds a C++ operator equivalent to the Python op in Detectron
* Once some additional GenerateProposalsOp changes are made this will
let us support Detectron FPN models with straight Caffe2 C++ ops
* RetinaNet and segmentation models require additional work
* Remove some uses of conservativeResize
* Add notes about training and inputs/outputs to operator documentation
This PR addresses #5648. In particular, following the discussion at #5648:
- it adds Catch as a submodule (https://github.com/catchorg/Catch2) in torch/aten/utils
- it ports all ATen tests to Catch
- it ports torch/csrc/jit/test_jit.cpp to Catch (libtorch only, Python build is unaffected)
This is the first of three PRs that #5537 will be split into.
This PR adds mkl headers to included files, and provides helper functions for MKL fft and cuFFT.
In particular, on POSIX, headers are using mkl-include from conda, and on Windows, it is from a new file @yf225 and I made and uploaded to s3.
* add mkl-include to required packages
* include MKL headers; add AT_MKL_ENABLED flag; add a method to query MKL availability
* Add MKL and CUFFT helpers
Previously, running DataParallel in no_grad mode would change the
requires_grad property of the network's parameters to False. The issue
is that Broadcast returns aliases of the inputs for the source device.
In no_grad mode, it would detach these inputs in-place.
Fixes #5851
* Changes without centos changes
* Changes for protobuf 3.5 and gcc 4.8
* Changing 3.4.1 back to 3.5.1
* Preventing installing two versions of setuptools
* Fixing setuptools bug
* support n-d inputs in bilinear and move to aten
* support n-d inputs in bilinear and move to aten
* add asserts to bilinear inputs
* address comments
* cast int64_t in asserts
* implement TripletMarginLoss as a native function
* implement TripletMarginLoss as native function
* fix compile error
* address comments
* address comments
* Add keepdim arg to pairwise distance
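A quick illustration of the functional forms touched here (shapes are illustrative):
```
import torch
import torch.nn.functional as F

anchor = torch.randn(4, 8)
positive = torch.randn(4, 8)
negative = torch.randn(4, 8)
loss = F.triplet_margin_loss(anchor, positive, negative, margin=1.0)
dist = F.pairwise_distance(anchor, positive, keepdim=True)   # keep the reduced dim
print(loss.item(), dist.shape)                               # scalar loss, torch.Size([4, 1])
```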
* Add torch.sparse_coo_tensor factory.
Notes:
1) I didn't add Tensor.new_sparse_coo_tensor; it didn't seem particularly useful, but it's easy to add
2) This doesn't do the type inference, i.e. torch.sparse_coo_tensor(indices=LongTensor, values=IntTensor)
will return a sparse tensor corresponding to the default type rather than a sparse IntTensor. We can add
type inference later when we add it to other factories.
* Fix merge.
* Use type_conversion function from python_variable_methods.
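A small sketch of the new factory (indices and values are illustrative):
```
import torch

i = torch.tensor([[0, 1, 1],
                  [2, 0, 2]])                 # 2 x nnz indices
v = torch.tensor([3.0, 4.0, 5.0])             # nnz values
s = torch.sparse_coo_tensor(i, v, (2, 3))     # COO sparse tensor of shape (2, 3)
print(s.to_dense())
```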
* Namespaced symbols
- Our interned strings now have structure, "ns::symname" rather than just
"symname" before. We support efficient namespace testing for uniques
by encoding the namespace in one byte in the Symbol internal representation.
See torch/csrc/jit/interned_strings.h for a more in-depth implementation
discussion.
- All uses of ksymbol are now attr::symbol (or some appropriate namespace).
The valid namespaces are prim, attr, onnx and aten.
- Symbol is bound in Python as a qualified string "attr::symbol", EXCEPT for the
attribute setting/getting API, whose symbols must always be attr
symbols; they get special cased to assume strings are passed.
There's a little bit of naughtiness in the implementation, maybe you know
how to solve it.
- However, the g.op() convenience function assumes that you're generating
ONNX operators, unless you explicitly qualify.
- All ATen operators and nodes have built-in interned strings generated
for them, so you should never have to write a string literal ever again.
The tracing code is adjusted to use it.
- ONNX exporter now properly tests to see that all operators are in
onnx namespace before accepting the export. This is way more
robust than the previous exporter, which would be willing to
export capitalized operators which were not actually ONNX operators.
- A slight organizational change for symbolic.py; this module now ONLY
contains aten operators. In particular, the exporter for Constant
has moved into utils.py (along with Undefined, from the C++ side),
since primitive ops get "special treatment."
- The un-inplacing logic in recording is more robust, so that we don't
delete a trailing underscore from __and__. This never affected us
before because we didn't have any tests for it.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* 1. Add logdet and slogdet in ATen side
2. Previously, det could return a result with an incorrect sign for symmetric
matrices. This was caused by a wrong assumption I had about SVD (that U=V^T when
the input is symmetric). This fixes it.
3. Moreover, after fixing 2 now QR is always needed for det forward. So I moved
SVD to backward call. Since this is a specific variant of SVD, it is named as
_svd_with_positive_UV_det, with derivative.yaml entry being svd_backward.
4. Updated/added backward functions for det, logdet and slogdet, which uses
_svd_with_positive_UV_det and svd_backward inside.
5. Optimized svd_backward:
a. Avoid unnecessary kernels when only sigma has gradient (this is the usual
case, and also true with *det backward functions).
b. Fix SVD double backward by avoiding a nan.
* 1. Add/update grad checks for det, logdet, and slogdet.
2. Fix an incorrect check for dim_args_idx in test_autograd.py
3. Add option to only test a subset of output values, specified by
test_output_indices, for cases like slogdet where only the
second output is differentiable.
4. Add better doc for the test generating list.
* Add/improve output tests for det, logdet and slogdet
Add a scaling to random matrices so closeness checks are more robust
* Remove unnecessary Variable wrappers in some test files
* Add logdet slogdet docs
* Improve an err msg in THTensorLapack.c
* add inverse-based backward for invertible matrices
use svd only for non-invertible case, so don't need the special variant anymore
* use LU rather than QR
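A quick illustration of the resulting functions (the input matrix is illustrative):
```
import torch

a = torch.randn(3, 3)
a = a @ a.t() + 3 * torch.eye(3)        # make the matrix well conditioned
print(torch.logdet(a))                  # log of the determinant
sign, logabsdet = torch.slogdet(a)      # sign and log|det|, stable for very large/small det
print(sign, logabsdet)
```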
#5481 was reverted due to a strange test bug. This PR attempts to fix that.
This diff adds vectorization to ATen. It uses Intel intrinsics to build a general vec256 class that represents types of 256-bit width. These can then be treated like regular variables. Using those, it implements torch.sum() for the contiguous case. It uses Intel TBB for multithreading, which allows work stealing, and chunks the reduction operations based on an experimentally chosen value (_THRESHOLD). It uses cpuinfo to pick the right code depending on the host's capabilities.
The kernels are implemented under native/cpu. Each .cpp file is compiled with -avx, -avx2 and no additional flags. A macro is used to append AVX, AVX2 or NONE to the function name. The header then needs to define the functions three times, one for each capability. This could be improved by either changing the cmake file a bit or possibly generating source code using a Python script etc.
For the non-contiguous case this defaults to the current implementation within TH. For CUDA it entirely defaults to the implementation within THC.
There probably needs to be a bit of a debate around the design decisions here, the additional dependencies, parallelization strategy, clarity, etc. The numerical results also diverge from numpy with larger tensors, which is expected since we're summing, for example, 8 numbers and then adding the result to the running sum, instead of each number one by one. But there might be something to be said about accumulating into a double for floats or the degree of divergence, the behavior with respect to CUDA, etc.
I wrote a [small Python script]( https://github.com/cpuhrsch/benchmark/blob/sumall/benchmarks/sum_bench.py) to compare the results with numpy numerically as well as on timing. I ran this script to create timings both on master and this branch.
Here is the command for 1 core
`OMP_NUM_THREAD=1 taskset -c 0 python sum_bench.py --enable_numpy 200`
Here is the command for all cores
`python sum_bench.py --enable_numpy 200`
Here are the results of each:
[Master, 1 core](https://paste.fedoraproject.org/paste/Nho9JzHpPVK9av8a6mByjQ)
[This branch, 1 core](https://paste.fedoraproject.org/paste/6xLHkYvcVJx9z~5MoHxN4w)
[Master, all cores](https://paste.fedoraproject.org/paste/5l3V1d5zGqvJcMXIUteMRw)
[This branch, all cores](https://paste.fedoraproject.org/paste/J4RuDU-0Drz0aZwtphQwEA)
To test, the command is
`python sum_bench.py --test 200`
[This branch, test results](https://paste.fedoraproject.org/paste/kTEoUC~oWgXA6XWMAfNfNw)
For this test we look at the average absolute value of the differences. This does not take into account the relative magnitude of the numbers. The numbers are sampled from a standard normal distribution.
In terms of performance this diff should bring PyTorch on par with Numpy and usually exceed it by 1.5 to 2x.
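To make the numerical-divergence point above concrete, here is a small NumPy-only sketch (not part of the diff) showing how chunked accumulation reorders floating-point additions relative to a strictly sequential sum:
```
import numpy as np

# Illustrative only: summing in 8-wide chunks (as a vectorized kernel does)
# reorders the floating-point additions, so the result can differ slightly
# from a strictly sequential sum.
x = np.random.randn(2 ** 16).astype(np.float32)

seq = np.float32(0.0)
for v in x:
    seq = np.float32(seq + v)           # one element at a time

partial = x.reshape(-1, 8).sum(axis=0)  # 8 running partial sums
chunked = partial.sum()                 # combine the partial sums at the end

print(abs(float(seq) - float(chunked)))  # small, generally nonzero difference
```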
Fixes #5611.
THCTensor_(baddbmm) assumes that newContiguous will always return a new tensor (this is a bad assumption). At the end of the function, tensors are freed if tensor_new != tensor_old. As a result, some tensors aren't freed if they were initially contiguous and newContiguous is called on them.
Test Plan
code reading
run the following (from the #5611 bug report) and assert that the memory doesn't leak anymore
import subprocess
import torch
from torch.autograd import Variable

# This is from https://discuss.pytorch.org/t/access-gpu-memory-usage-in-pytorch/3192/4
def get_gpu_memory_map():
    """Get the current gpu usage.

    Returns
    -------
    usage: dict
        Keys are device ids as integers.
        Values are memory usage as integers in MB.
    """
    result = subprocess.check_output(
        [
            'nvidia-smi', '--query-gpu=memory.used',
            '--format=csv,nounits,noheader'
        ], encoding='utf-8')
    # Convert lines into a dictionary
    gpu_memory = [int(x) for x in result.strip().split('\n')]
    gpu_memory_map = dict(zip(range(len(gpu_memory)), gpu_memory))
    return gpu_memory_map

l, m, n = 1, 9, 1
w = torch.nn.Parameter(torch.Tensor(1024, 2, l, m).cuda())
for i in range(10000):
    a = Variable(torch.Tensor(1024, 2, m, n).cuda())
    torch.matmul(w, a).permute(0, 3, 1, 2).mean().backward()
    if i % 100 == 0:
        gpu_mem = get_gpu_memory_map()
        print("GPU: {:.2f} MB".format(gpu_mem[0]))  # nvidia-smi reports usage in MiB
* Simplify run_test.py and don't use shell=True
* Fix non-shell output for check_output and always print to stderr
* Use shlex.split instead of str.split
* s/log/print_to_stderr
* with_init -> with_init_file
* Remove bufsize argument
* Fixing conda
* Adding hypothesis and onnx to conda builds
* Updates but still not working
* Adding required changes to conda_full
* Updates
* Moving to more general build_anaconda script
* Adding check for gcc version
* Adding general ways to add/remove packages from meta.yaml?
* Changes for specific packages to build on gcc 5.4
* Fix with glog spec
* Requiring numpy >1.12 for Python 3 to satisfy the opencv dependency
* Adding pydot to required testing packages
* Adding script to read conda versions for gcc ABI
* Trying to fix segfault by installing in env instead
* conda activate -> source activate
* Trying adding back leveldb
* Setting locale for ONNX + conda-search changed its format
* read_conda_versions handles libprotobuf
* Conda script updates
* Adding a protobuf-working test
* Removing changes to proto defs b/c they will require internal changes in a separate diff
* Fix useless opset_import in onnx
* Set the default ir version in make_model
* Use the target_opset_version in Caffe2Frontend
* remove make_model from helper in caffe2.python.onnx
Notes:
1) I didn't add Tensor.new_sparse_coo_tensor; it didn't seem particularly useful, but it's easy to add
2) This doesn't do the type inference, i.e. torch.sparse_coo_tensor(indices=LongTensor, values=IntTensor)
will return a sparse tensor corresponding to the default type rather than a sparse IntTensor. We can add
type inference later when we add it to other factories.
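For illustration, a minimal usage sketch of the factory named in these notes (using default-typed values, per note 2):
```
import torch

# Minimal sketch: a 2x3 sparse COO tensor with three nonzero entries.
indices = torch.LongTensor([[0, 1, 1],
                            [2, 0, 2]])
values = torch.FloatTensor([3.0, 4.0, 5.0])
s = torch.sparse_coo_tensor(indices, values, torch.Size([2, 3]))
print(s.to_dense())
```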
I need this because run_test is going to need to read other
options than just verbose when I implement JUnit XML dumping.
(JUnit XML dumping cannot be implemented solely by frobbing
--python because the XML file to dump to must vary based on the
test name.)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Revert "ATen ReduceOps (#5481)"
This reverts commit 310c3735b9eb97f30cee743b773e5bb054989edc.
* Revert "Check that new cpuinfo and tbb submodules exist (#5714)"
This reverts commit 1a23c9901dbfee295bf5b3dad36e4d3ee7e86366.
The save_mean and save_std are undefined if training is false.
Previously, we unpacked them even though we did not use them in the
computation.
We also don't need to re-pack the mean/variance variables.
* Reduce Sum and Reduce Mean
* Handle reductions with empty 'axes'
* Merge codebase and simplify tensor reduction logic
* Restructure code and add comments.
* Fix parameter to scale
* Fix parameter to scale
* Fix some minor errors in existing docs.
* Fix Convolution and Pooling docs in torch.nn.functional
* Cleaned up torch.nn.functional docs
* Address @SsnL 's comments
* Add multiplication sign missing in docs
* Fix more typos, and clear some warnings
* Change infinity symbol in LPPool2d
* Revert some changes in torch.nn.functional
* Few more minor changes
Previously, methods like int() and long() would fail tracing because they eventually dispatch down to toType, which takes a Type as a parameter. We don't (currently) support tracing ops with Type inputs[0], so this PR adds specializations for the ATen scalar types and dispatches to those directly. These specialized ops can be traced into the IR without needing a Type argument.
A more long-term solution would be to add support for Types in the IR.
* Traceable dispatch for Variable cast methods
* Add ONNX symbolics
* Fix test
* Fix cross-backend copy issue
* Prepend underscores to cast identifiers
* Metaprogram symbolics
* clang-format
* stupid lint
* Add comments for all code fragments
* Implement torch.reshape and Tensor.reshape
This implements reshape which has similar semantics to numpy.reshape. It
will return a view of the source tensor if possible. Otherwise, it
returns a copy.
* Remove in-place reshape_ that was an alias for resize_
* Update documentation
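A minimal sketch of the view-or-copy semantics of torch.reshape described above (illustrative, not from the diff):
```
import torch

x = torch.arange(6)

# When the requested shape is compatible with the input's strides,
# reshape returns a view that shares storage with the input.
y = x.reshape(2, 3)
y[0, 0] = 100
print(x[0])        # reflects the write made through the view

# A non-contiguous input (here, a transpose) cannot be viewed, so a copy is returned.
z = torch.arange(6).reshape(2, 3).t().reshape(-1)
```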
This includes various fixes required to export the NMT decoder to ONNX
* Add missing ONNX symbolics and fix fusible expand logic
* Update comments and use of at::optional
* Use _unimplemented
* [GanH]: two_task_discriminator
as titled
and adding label smooth
* [Dper2] Simplified UI options needed for blob magnitude visualization
* [GanH]: fix tags
as titled
* Added type and shape inference for GatherRange operator
This helps with type / shape inference when using this operator in layers.
Also just a nice to have in general.
* Demonstrate Caffe2 exception handling with StoreHandlerTimeoutError in Python
We'd like to catch and recover from certain Caffe2 net exceptions. Use this diff to demonstrate a pattern of registering a pybind exception mapping and catching in Python using caffe2::StoreHandlerTimeoutException.
* Bind Gloo IoException to IoError in Python
Allow peer failure handling and recovery using an exception based mechanism. This diff registers gloo::IoException with pybind.
* [GanH]: add label smoothing to softmax with loss
as titled
* [C2] Enable LARS in Adagrad and hook it to DPER
* [DPER] Don't pass LayerModelHelper in create_trainer_nodes
Since we're planning to get rid of it eventually and I want to get access to
NetDef only interface ASAP - I'm looking towards removing all references to
LMH, where we don't really need them.
* fix bugs in LambdaRankNdcgOp
The loss and gradient in LambdaRankNdcgOp were incorrect: the loss should be the negative log of the probabilities instead of the log.
* Restrict thread pool on iOS to only big cores
Historically, iPhones exposed only one type of cores, and Caffe2 thread pool used all of them.
However, iPhone 8/iPhone X exposes 2 big + 4 LITTLE cores. As our thread pool doesn't support work stealing or other forms of load balancing, fast cores end up waiting for the slow ones, and it may be better to restrict execution to only 2 fast cores, like we do on Android.
* Remove SparseLength Sum/WeightedSum/Mean operators with fp16 engine
Remove SparseLength Sum/WeightedSum/Mean operators with fp16 engine
* make clang happy and get fewer warnings
make clang happy and get fewer warnings
* [Personalization] Support add_output_schema() in layer_model_helper
Problem:
Currently the output_schema of sparse_nn can only be set once. https://fburl.com/efth5zer.
Solution:
For flexibility, we want to add fields to output_schema incrementally.
Plan:
Wrap the change of `model._output_schema` into a new function `add_output_schema()` for adding additional output_schema.
Callsite:
The add_output_schema() should be called instead at https://fburl.com/efth5zer
Reference:
The newly added `add_output_schema()` will be similar to `add_loss()` in https://fburl.com/t2ii8njh
This diff adds vectorization to ATen. It uses Intel intrinsics to build a general Vec256 class that represents types of 256-bit width. These can then be treated like regular variables. Using those, it implements torch.sum() for the contiguous case. It uses Intel TBB for multithreading, which allows work stealing, and chunks the reduction operations based on an experimentally chosen value (_THRESHOLD). It uses cpuinfo to pick the right code depending on the host's capabilities.
The kernels are implemented under native/cpu. Each .cpp file is compiled with -avx, -avx2 and no additional flags. A macro is used to append AVX, AVX2 or NONE to the function name. The header then needs to define the functions three times, one for each capability. This could be improved by either changing the cmake file a bit or possibly generating source code using a Python script, etc.
For the non-contiguous case this defaults to the current implementation within TH. For CUDA it entirely defaults to the implementation within THC.
There probably needs to be a bit of a debate around the design decisions here, the additional dependencies, parallelization strategy, clarity, etc. The numerical results also diverge from numpy with larger tensors, which is expected since we're summing, for example, 8 numbers and then adding the result to the running sum, instead of each number one by one. But there might be something to be said about accumulating into a double for floats or the degree of divergence, the behavior with respect to CUDA, etc.
I wrote a [small Python script]( https://github.com/cpuhrsch/benchmark/blob/sumall/benchmarks/sum_bench.py) to compare the results with numpy numerically as well as on timing. I ran this script to create timings both on master and this branch.
Here is the command for 1 core
`OMP_NUM_THREAD=1 taskset -c 0 python sum_bench.py --enable_numpy 200`
Here is the command for all cores
`python sum_bench.py --enable_numpy 200`
Here are the results of each:
[Master, 1 core](https://paste.fedoraproject.org/paste/Nho9JzHpPVK9av8a6mByjQ)
[This branch, 1 core](https://paste.fedoraproject.org/paste/6xLHkYvcVJx9z~5MoHxN4w)
[Master, all cores](https://paste.fedoraproject.org/paste/5l3V1d5zGqvJcMXIUteMRw)
[This branch, all cores](https://paste.fedoraproject.org/paste/J4RuDU-0Drz0aZwtphQwEA)
To test, the command is
`python sum_bench.py --test 200`
[This branch, test results](https://paste.fedoraproject.org/paste/kTEoUC~oWgXA6XWMAfNfNw)
For this test we look at the average absolute value of the differences. This does not take into account the relative magnitude of the numbers. The numbers are sampled from a standard normal distribution.
In terms of performance this diff should bring PyTorch on par with Numpy and usually exceed it by 1.5 to 2x.
* Delete ""_sym literal form.
Two reasons:
1. It's unnecessary now; all of the uses of the literal form would
be better directly referring to the interned string (esp. since
now we are autogenerating symbols.)
2. When I add namespacing, there will be no convenient way to specify
the desired namespace with just _sym. If we add it back, we would
need distinct suffixes for each different type. Easiest to delete
it while we don't need it.
Add script::Module C++ class to represent script modules
switch AST -> IR conversion to work on Modules/Methods rather than raw graphs
function-only AST -> IR conversion is just a simplified case where there is
only one module with a single method and no parameters.
introduce SugaredValue in compiler.h to represent values in scope in a script
function that are not first-class and that get desugared. This is used to
represent the module's self parameter, as well as python function calls,
and method calls on tensor
provide a Python ScriptModule that provides a nice API on top of script::Module
allowing for the definition of script modules with methods, parameters,
and submodules
Not in this PR but intended for the future:
ScriptModule actually subclasses nn.Module, with most methods implemented
Unification of tracedmodule and script module functionality into one container class.
Detailed changelog:
* Switch compiler over to using Module, but don't
use them yet.
* Remove intermediate attribute encoding in compiler
* Create SugaredValue object to handle resolution
of compiled module.
* switch to_ir to modules, implement Select
* hacky python wrappers
* Private ScriptModule
* Add `define` to script module
* Attributes use TK_LIST_LITERAL
this anticipates adding a real list literal expression to the language.
* Add a metaclass to make sure script stubs are registered
* Add a test
* Doc createResolutionCallback
* Docs and minor editing
* Address PR comments
* Document
* Fix unicode issue
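As a rough usage sketch of the Python-side ScriptModule described above; the exact class/decorator names (torch.jit.ScriptModule, torch.jit.script_method) follow the later public API and are assumptions relative to this changelog:
```
import torch

# Hypothetical sketch: a script module with a parameter and a scripted method.
class MyModule(torch.jit.ScriptModule):
    def __init__(self):
        super(MyModule, self).__init__()
        self.weight = torch.nn.Parameter(torch.rand(3, 3))

    @torch.jit.script_method
    def forward(self, x):
        return torch.mm(x, self.weight)

m = MyModule()
print(m(torch.rand(2, 3)))
```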
The header files needed for the C++ extensions are copied to
torch/lib/include only by `install`. In the case of `bdist_wheel` or `build develop`,
for example, the files are not copied and the cpp_extensions test fails:
```
Running test_cpp_extensions.py ...
running install
running build
running build_ext
/home/moni/src/ibm/AI/pytorch/torch/utils/cpp_extension.py:79: UserWarning:
Your compiler (g++) may be ABI-incompatible with PyTorch.
Please use a compiler that is ABI-compatible with GCC 4.9 and above.
See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html.
warnings.warn(ABI_INCOMPATIBILITY_WARNING.format(compiler))
building 'torch_test_cpp_extension' extension
creating build
creating build/temp.linux-x86_64-3.6
gcc -pthread -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/moni/src/ibm/AI/pytorch/torch/lib/include -I/home/moni/src/ibm/AI/pytorch/torch/lib/include/TH -I/home/moni/src/ibm/AI/pytorch/torch/lib/include/THC -I/home/moni/miniconda3/envs/pytorch/include/python3.6m -c extension.cpp -o build/temp.linux-x86_64-3.6/extension.o -g -DTORCH_EXTENSION_NAME=torch_test_cpp_extension -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
extension.cpp:1:25: fatal error: torch/torch.h: No such file or directory
#include <torch/torch.h>
^
compilation terminated.
error: command 'gcc' failed with exit status 1
```
* Make use of new BUILD_ENVIRONMENT variable when possible.
Eliminate CI provided environment variables. At the moment, our build scripts depend on a few environment variables which are specified by the CI system and passed down to the build. Based on the build scripts, these environment variables are JOB_NAME, PYTHON_VERSION and GCC_VERSION; variables that depend solely on the image being built and the invoked script.
a. Proposal: A recent rewrite of the pytorch-dockerfiles has embedded a new environment variable, BUILD_ENVIRONMENT, which is automatically set when you run the Docker image. This environment variable subsumes JOB_NAME (this variable doesn't specify if you are “building” or “testing”, but this can easily be inferred from the script that is being invoked.) Make use of this environment variable to compute the other variables.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* syntaxfix
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* bugfix
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add efficient isVariable test to ATen.
This is done as a field on Type so that we can define a
non-virtual, inlinable function. The added ASSERTs probably
affect runtime performance; we may need to toggle them off
in non-DEBUG builds.
Fixes #4814.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Rebase and brush up
* is_variable -> is_variable_or_undefined
* Fix arange floating point error
* fix test
* add type cast when calculating arange size
* fix nit
* update test
* use doubles instead of floats to calculate size
* requested changes
* PyObject* <--> at::Tensor no longer unwraps variables; instead we expect end users to always work with variable types, and we will only unwrap the variables when we optimize.
* Add torch::CPU, torch::CUDA and torch::getType
* at::CPU -> torch::CPU in extensions
* Update jenkins build script using the same flag as used in benchmarking
* Add a recently added flag
* Remove BUILD_OBSERVERS flag since it is no longer used
* Add torch.empty, torch.full and new_* size-based Tensor factory methods.
This adds torch.full, torch.empty equivalents of np.full, np.empty.
In addition, this adds size-based Tensor factory methods new_empty, new_ones, new_full, new_zeros,
which is meant to complete the separation of the legacy "new" method into data-based and size-based
functions.
This also fixes an issue in sparse zeros_like when the dtype didn't match the argument dtype.
* Get rid of unnecessary zero in sparse tensor zeros_like.
* Fix test if only 1 cuda device.
* Support native namespace functions with type dispatch.
Use 'ones' as an example. Note this is a "halfway" solution; i.e. the call chain is:
at::ones(shape, dtype) -> dtype.ones(shape, dtype) -> CPUFloatType.ones(shape, dtype) -> at::native::ones(shape, dtype)
The "nicer" solution would probably be something like:
at::ones(shape, dtype) -> dtype.ones(shape) -> CPUFloatType.ones(shape) -> at::native::ones(shape, this)
* Fix type inference.
* Fix test install.
* Fix extensions.
* Put dtype argument at the beginning.
* Fix extension.cpp.
* Fix rnn.
* Move zeros in the same manner.
* Fix cuda.
* Change randn.
* Change rand.
* Change randperm.
* Fix aten contrib.
* Resize in randperm_out.
* Implement eye.
* Fix sparse zeros.
* linspace, logspace.
* arange.
* range.
* Remove type dispatch from gen_python_functions.
* Properly generate maybe_init_cuda for type dispatch functions not named type.
* Don't duplicate dtype, this parameters for native type dispatched functions.
* Call VariableType factory methods from the base type so it gets version number 0.
* Address review comments.
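For illustration, a minimal usage sketch of the torch.empty/torch.full and new_* factories added above:
```
import torch

a = torch.empty(2, 3)               # allocated but uninitialized values
b = torch.full((2, 3), 7.0)         # every element set to 7.0

x = torch.ones(2, 2)
c = x.new_zeros(4, 5)               # same dtype/device as x, new shape
d = x.new_full((1, 3), 0.5)
e = x.new_empty(3)
```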
* fix comment on the location of scale and bias (offset) in each fused rowwise 8bit
* Update fused_rowwise_8bit_conversion_ops.cc
* Update lengths_reducer_fused_8bit_rowwise_ops.cc
* Update lengths_reducer_fused_8bit_rowwise_ops.cc
* CPU int-types pow()
* CUDA int-type pow()
* Cleanup + fix deleted line
* Tests for integer-types pow
* Fix build
* Fix windows tests
* Make _test_int_pow static
This improves backwards compatibility with 0.3. It adds support for
the out kwarg for the deprecated overloads that have optional
positional alpha/beta/scale arguments.
The addcmul(self, value, tensor1, tensor2, out=self) syntax is used by
gpytorch.
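An illustrative sketch of the deprecated overload this keeps working (a positional scalar value plus out=, as used by gpytorch):
```
import torch

x = torch.randn(4)
t1 = torch.randn(4)
t2 = torch.randn(4)

# Deprecated 0.3-style overload: positional `value` plus out=self,
# i.e. x <- x + 0.5 * t1 * t2 written back into x.
torch.addcmul(x, 0.5, t1, t2, out=x)
```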
previously, it was being implicitly imported via the import of
torch.onnx
this is no longer the case, and is a hacky thing to depend on anyway,
so import it explicitly
ExportProxy was a mechanism to reuse the code that supported exporting
autograd Functions to support overriding arbitrary python
functions. However, it had some serious downsides
- only works on some functions (all args must be Variable)
- complicated
- bad error messages in some cases
Instead, just expose enough functionality to python to perform the
necessary logic explicitly.
* add end to end test for DistributedDataParallel
* address comments
* skip subgroup tests when less than 3 processes
* set process number based on available gpus
* add single gpu;cleanup WORLD_SIZE
* fix comments
* implement CosineEmbeddingLoss as a native function and add reduce=True arg to it
* fix flake8
* address comments
* add reference function to tests
* fix flake8
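A small illustrative sketch of the new reduce flag on the native function described above:
```
import torch
import torch.nn.functional as F

x1 = torch.randn(5, 10)
x2 = torch.randn(5, 10)
y = torch.ones(5)        # +1 for similar pairs, -1 for dissimilar pairs

loss = F.cosine_embedding_loss(x1, x2, y)                      # reduced (default)
per_sample = F.cosine_embedding_loss(x1, x2, y, reduce=False)  # one loss per pair
```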
This PR adds the possibility to build the C++ parts of autograd and jit, with no dependency on Python.
The goal is to allow taking a PyTorch IR representation (a tree s-expr) and running it with provided inputs.
Prerequisite: build PyTorch so that codegen runs once.
Instructions:
cd tools/cpp_build
bash build_all.sh
This will build libtorchjit and torchjit_test in tools/cpp_build/build/torchjit-build. The latter basically runs the code in test_jit.cpp for now.
While writing the PR, it turned out that a few of Python.h includes were redundant. They were removed here (PyTorch tests still pass on my machine, we'll see CI).
* Introduce Python-free builds of autograd and jit
* Remove NO_PYTHON ifdef in functions/special
* Improve documentation
1. Add formula for erf, erfinv
2. Make exp, expm1 similar to log, log1p
3. Symbol change in ge, le, ne, isnan
* Fix minor nit in the docstring
* More doc improvements
1. Added some formulae
2. Complete scanning till "Other Operations" in Tensor docs
* Add more changes
1. Modify all torch.Tensor wherever required
* Fix Conv docs
1. Fix minor nits in the references for LAPACK routines
* Improve Pooling docs
1. Fix lint error
* Improve docs for RNN, Normalization and Padding
1. Fix flake8 error for pooling
* Final fixes for torch.nn.* docs.
1. Improve Loss Function documentation
2. Improve Vision Layers documentation
* Fix lint error
* Improve docstrings in torch.nn.init
* Fix lint error
* Fix minor error in torch.nn.init.sparse
* Fix Activation and Utils Docs
1. Fix Math Errors
2. Add explicit clean to Makefile in docs to prevent running graph generation script
while cleaning
3. Fix utils docs
* Make PYCMD a Makefile argument, clear up prints in the build_activation_images.py
* Fix batch norm doc error
* [C2] Don't crash kernel in case of invalid shapes for ConcatOp
Enforce correctness of the shapes for input tensors so we won't access invalid index.
* [Caffe2] Add analytical performance counters to Dynolog
Initial diff for counting analytical flops and memory writes for C2 operators.
* BBoxTransform op: Handle RoIs from multiple images per batch
BBoxTransform op used during typical Faster-RCNN inference operates only on
RoIs from a single image (no batching). Adding support to handle that with an
optional output blob containing the batch splits (i.e., the number of RoIs
belonging to each item in the batch). The code is perfectly backward compatible
and shouldn't break any existing models.
* [mkl] Make MKL-DNN cooperate with memongered nets
C2's MKL-DNN implementation caches input dims and reuses intermediate and
output buffers across net runs, which prevents memonger from being used. This
may not always be useful since input dims may vary widely in many cases and
we'll end up reallocating anyway. Added an option to force reallocation when
memonger is used.
* [oncall] fix batch gather ops for empty input
still need to bisect for the breaking change, but this shall fix the case for empty input.
The error logging looks like: https://interncache-ftw.fbcdn.net/t49.3276-7/23938497_293562711176943_6500112636590424064_n.txt?_nc_log=1
@[557759185:raychen] can you help subscribe the oncall from the ads side? This may affect the Sigrid online trainer.
* optimize BatchOneHotOp
We want to iterate in row-major as opposed to column-major for better
locality.
* Supported exporting model with int blobs.
Supported exporting model with int blobs. Needed by condensenet.
* BoxWithNMSLimit op: Handle boxes from multiple images per batch
Similar to D7135360. Added support for multiple images per batch in the op.
Takes an optional additional input "batch_splits" as output by BBoxTransform
op, and returns new batch_splits after applying NMS and filtering. Otherwise,
backward compatibility is maintained.
Questions/possible future works:
How to template-ize to extend support beyond LongTensor?
How to check if autograd works (and if not, how to add explicit gradient)?
CUDA support?
Testing command:
`DEBUG=1 NO_CUDA=1 MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py build && DEBUG=1 NO_CUDA=1 MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py develop && python3 test/test_torch.py`
Partially fixes #2031
* Initial commit for unique op
* Working unique with test
* Make inverse indices shape conform to input
* flake8 whitespace removal
* address review comment nits
* Expose fn and add docs. Explicitly declare no gradients
* Trial generic dispatch implementation
* Add tests for generics
* flake8 whitespace
* Add basic CUDA error throwing and templateize set
* Explicit contiguous and AT_DISPATCH_ALL_TYPES return
* Remove extraneous numpy conversion
* Refactor out .data calls
* Refactored to variable return length API with wrapper fn as opposed to returning a 0-length tensor, per off-line reviewer comments
* Remove A
* Don't use hidden torch._unique() in test
* Fix documentation
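For illustration, a brief usage sketch of the exposed op; the return_inverse flag follows the public signature and is an assumption beyond what the bullets above state:
```
import torch

x = torch.LongTensor([1, 3, 2, 3, 1])

values = torch.unique(x)                                 # unique values
values, inverse = torch.unique(x, return_inverse=True)
# `inverse` has the same shape as `x`; inverse[i] is the index of x[i] in `values`.
print(values, inverse)
```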
Summary:
Executing loop's body in a separate workspace, using WorkspaceStack to
support saving and reusing of workspaces
Test Plan:
python caffe2/python/operator_test/onnx_while_test.py
Reviewers: caffe2-review, jamesreed
We'll want to reuse this logic for Int8 Reshape, but currently the code assumes
Input(0) and Output(0) are TensorCPUs, which may not be the case for a
subclass.
CMake 3.2 is required to properly track dependencies in projects imported as ExternalProject_Add (BUILD_BYPRODUCTS parameter).
Users on Ubuntu 14.04 LTS would need to install and use cmake3 package for configurations. Users of other popular distributions generally have a recent enough CMake package.
This op is used for gradient clipping to take care of exploding / vanishing gradients.
If original_norm is larger than the threshold,
then each element of the tensor is scaled by threshold / original_norm.
Adding NUMA awareness through numa_node_id in DeviceOption. Blobs of operators
with numa_node_id are allocated on corr. memory banks, using CPU pools with
NUMA affinity set to run operators.
With python3, np.int defaults to int64. This diff should fix it. I don't know if a test already exists for this function; however, the following ASR test was breaking when I switched to py3:
```
buck test caffe2/caffe2/fb/speech/asr_training/:tensor_parser_test
```
After D6953547 some of the blobs were no longer impacted by uint8 quantization,
but they would still generate operators expecting uint8 inputs and thus fail.
This diff adds a temporary hack to avoid doing this quantization when the layer
is not quantized.
Will fix it properly by switching to Net rewriting instead.
There is a bug in ConvOp. The SetDeviceTensor function only copies data to the tensor when the sizes of the two are different. In the 3d convolution case for video models, img_shape_device_ (NCTWH) is modified only for the first processed example; for the following examples it won't get updated, because img_shape_device_.size() == img_shape.size(). However, it should get updated for each example, because T changes across videos. The same applies to col_buffer_shape_device_.
In this diff, if any dimension of img_shape_device_ differs from img_shape, img_shape_device_ gets updated.
- Remove some uses of mega-header THP.h
- Use HANDLE_TH_ERRORS in functions that may throw
- Move NumPy includes to common header
- Delete unused allocator
* WIP: Fix Out of Memory failure in test TensorTest.Tensor64BitDimension
* WIP: update warning message and wrap resize inside TensorTest.Tensor64BitDimension
* WIP: only catch exception which is related to out of memory
* WIP: add return in the out of memory exception
Hopefully this fixes the following assertion failure:
/var/lib/jenkins/workspace/aten/src/ATen/test/native_test.cpp:102: test:
Assertion `d5.matmul(d1).allclose(d5.view({24, 2, 3}).bmm(d1.view({1, 3,
1}).expand({24, 3, 1})).view({3, 2, 4, 2}))` failed.
(this error seems to only occur on ASAN tests...)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Use pre-computed offset increments to avoid int division inside kernels.
- OffsetInfo and OffsetIterator pre-computes the necessary coordinate
change along each dimension, so that each successive offset can be
computed using only addition/subtraction/comparisons.
- Added IntDivider which supports "magic division" for uint32_t, thus
eliminating integer divisions altogether for offset calculation, as
long as indices fit in 32 bits.
- In code paths with statically determined dimensions (Dims=1 or 2),
kernel arguments now contain only the necessary data (instead of
MAX_CUTORCH_DIMS of everything).
- Fixed index overflow errors: for tensors with >= 2G elements, we used
to have incorrect results or an infinite loop inside the kernel.
TODO: The following pattern is broken for tensors with >= 2G elements.
It will result in overflow, even if IndexType is uint64_t. Need
to search and replace them.
> for (IndexType linearIndex = blockIdx.x * blockDim.x + threadIdx.x;
> linearIndex < totalElements;
> linearIndex += gridDim.x * blockDim.x) {
* Update CMakeLists.txt
* Removed OffsetIterator, and kept only the fast integer division logic.
- Also changed canUse32BitIndexMath so that the max index for 32-bit
math is INT32_MAX, instead of UINT32_MAX. It also simplifies the
division operation.
* Merged OffsetInfo into THCTensorInfo.cuh.
* Scope MultiRNN blobs with name as well as layers
Also don't double scope MultiRNN in case of multiple layers.
* Scope input projection of first layer with name
We don't scope it with layers because the projection is done
outside of the layer.
* Avoid scoping input blob in MemongerTest.test_rnn
* Rectify input_blob in prepare_input
Revert change in memonger_test because rectifying input will solve the problem.
Summary: Fix documentation for WeightedSumReducerDef to be more general since it applies to both Sparse and Dense ops
* First attempt on sqrt op
* Adding the Sqrt op along with the test cases
* Made changes per @Yangqing's questions re: tensor format and used hypothesis to generate input tensor
* Check if node output matches in shape propagation
* Fix list attributes and view shape propagation
* fix inferred shapes for view
* Fix shape inference for integrally typed tensors
* Fixes for concat in control flow
* Fix print
* Fix a bug in gen_jit_dispatch.py
The `fromLast` function is confusing to understand since `fromLast(stack, 0)`
was actually invalid whereas `fromLast(stack, 1)` was the last element.
This created off-by-one bugs in gen_jit_dispatch for some operators.
This changes it to `peek(stack, i, N)` which treats the last `N`
elements of the stack as a list, and extracts element `i` of that list.
This usage reflects how `fromLast` was actually being used in the code.
`peekSlice(stack, i, len, N)` similarly treats the last N elements
as a list but extracts a slice. This enables us to get rid of
drop calls and simplify the dispatch logic.
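To make the indexing convention concrete, here is a small Python analogue of the described helpers (the real implementations are C++; this is just a sketch):
```
# peek(stack, i, N): treat the last N entries as a list and return element i of it.
def peek(stack, i, N):
    return stack[len(stack) - N + i]

# peekSlice(stack, i, len, N): same view of the last N entries, but return a slice.
def peek_slice(stack, i, length, N):
    start = len(stack) - N + i
    return stack[start:start + length]

stack = ['a', 'b', 'c', 'd']
assert peek(stack, 0, 2) == 'c'                  # first of the last two entries
assert peek(stack, 1, 2) == 'd'                  # the very last entry
assert peek_slice(stack, 0, 2, 3) == ['b', 'c']  # slice of the last three entries
```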
* Add TARGETS for ATenOp (hackily)
This is the best way I could figure out to hook up custom_rule. See https://fb.prod.facebook.com/groups/fbcode/permalink/1810939952287945/ for more details on why it's tricky.
As for the fix with SparseTensor - it seems to be a bug in ATen declarations introduced recently.
* cmake fixes
* Port cuDNN RNN dropout state initialization to ATen and make Python code use it.
Fixes #5138.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Variable/Tensor bugfix
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
This also starts generating dispatch code for __and__ and similar
variants. I was too lazy to see if we have committed the '__and__ is
not inplace' mistake in other places.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
The nn.* counterpart of #5443. Mostly removed Variable wrappers. Also added doc for nn.RReLU.
Notice that torch.randn(*, requires_grad=True) isn't documented until #5462 is done.
* Add dtype to torch.Tensor, torch.FloatTensor, etc.
* Support passing dtypes to set_default_tensor_type.
* Check dtype exception.
* Correctly handle new type initialization order.
* Move handling of torch.Storage alias to C++.
* Delete function that erroneously reappeared.
This PR enables the following tests on Windows again:
CUDA HalfTensor tests in test_torch.py and test_nn.py
test_Conv2d_deterministic_cudnn in test_nn.py
test_*Tensor_qr_big in test_cuda.py
The issues are no longer reproducible, possibly because of an upgrade to the display driver.
* Reenable CUDA HalfTensor tests on Windows
* Reenable test_Conv2d_deterministic_cudnn on Windows
* Reenable test_*Tensor_qr_big on Windows
* Adding openmpi to all conda builds
* Typo and turning off quiet
* Removing openmpi from non_cuda conda build
* Actually openmpi is already in the images
Simplifies type dispatch to a consistent use of macros (not macros here and functions there), and
adds the dispatch header to ATen/ATen.h so that users (e.g. writing extensions) can dispatch too.
* Refactor and simplify ATen dispatch
* cuda/Dispatch.h -> cuda/Dispatch.cuh
* Change dispatch strategy for half
* Use __VA_ARGS__ and get rid of parantheses
* Remove rogue UnderlyingType.h
* Fix TensorCompare.cu and add comment
* Include CUDATensorMethods in TensorCompare.cu
* to_cuda_type -> cuda::type and move AccumulateType out of native
* Add Python function calls to script
* Script compiler gains a `Resolver` object that runs when it does not understand a function call. This decouples the python resolution from the conversion to IR.
* Add source information to IR nodes
SourceRange information from the script is now propagated to IR nodes.
This information is only used in two places now: the interpreter
wraps errors that occur when an instruction executes, and shape
propagation now reports errors on the line where it fails:
Traceback (most recent call last):
File "test/test_jit.py", line 1655, in test_script_error
bar(Variable(torch.rand(10), requires_grad=True), Variable(torch.rand(9), requires_grad=True))
RuntimeError:
The size of tensor a (10) must match the size of tensor b (9) at non-singleton dimension 0:
@torch.jit.script
def bar(c, b):
return c / b
~~~~~ <--- HERE
In the future, shape propagation should really not report any size
errors and instead just not propagate shapes and let the actual
execution fail. However, this is hard to accomplish while we still
depend on running the op to do shape propagation.
In pytorch, after pad_packed_sequence, the "extra" elements (after the
ends of the sequences) are reset. In the equivalent Caffe2 graph
exported via ONNX, they contained some leftover values, which caused
tests to fail. Probably no one depends on these values, but just in
case, set them to zero to mimic pytorch semantics.
* Revert "Fix wrong argument name (#5366)"
This reverts commit cc9d3b265d7e688865fde055ee3a2f9b77b5714a.
* Fix wrong argument naming
* Revert "Wrap torch::cuda::lazy_init with WITH_CUDA flag"
This reverts commit a8fa37f8fac5aef09eb7fe54d84de6126618c262.
* Revert "Solves the linking error related to lazy_init for MSVC"
This reverts commit 63913a102f274865a76e7c40ffdf6b40c277d5ff.
* better solution for the linking error related to lazy_init for MSVC
* Naming changes
* Namespace changes and further comment
* Rebasing onto current master
* Remove code that is useless
* Fix linting
* Remove rebasing bugs
* Handle legacy pad in Caffe2==>ONNX converter, also remove fake initializer
* Address the comments, 1) have filtering fake initializer before ssa rewrite, 2) polish the legacy padding handling logic
* Add test cases to cover the code just added
* Nit
* Add support for device python arguments with constructors.
* Fix flake8.
* Simplify device handling.
* Dont use torch._C._VariableFunctions.
* Handle default values for functions that have tensor args (e.g. ones_like).
* Support dtypes in legacy new constructors.
* Add comment about why we don't have dtype for sparse (indices, values).
* separate legacy tensor ctor vs new (new includes dtypes).
* Use TypeError.
* Check if CXX compiler supports all the needed functions
This commit improves the code for PR #5230 according to
@ezyang's comments. Instead of checking ubuntu/gcc versions, it
checks support for the needed functions in the C++ compiler
using CHECK_CXX_SOURCE_COMPILES.
Fixes: #5229
* cmake target - work in progress
* wip cmake public targets
* Add missing INTERFACE keyword
* Add cuda public dependencies
* Add dependency for test targets
- Remove USE_ARM64 option because it doesn't do what is expected
- Disable ARM ComputeLibrary for non-ARM/ARM64 builds
- Remove analysis of CMake options from scripts/build_android.sh
- Add user-specified CMake options at the end of command line to allow overriding defaults
- Update README for ARM ComputeLibrary integration and do not require to disable NNPACK for ARM64 build with ARM ComputeLibrary
This deletes most of the dead Tensor code paths, including the TensorMethods cwrap and generic/Tensor.cpp.
This also moves the THNN.cwrap/.cpp generation to generate_code which can use ninja if installed.
Support shape propagation with control-flow
* This allows us to enable optimization in the GraphExecutor for most
script tests.
* Changes Type to always be present (non-null) on a Value, removing `hasType()`
and `typeOption()`. A new type kind 'DynamicType' now represents when
a specific type has not been determined.
* If/Loop nodes propagate shapes/types in the simple cases where types of
outputs do not change depending on where control flows. In other
cases, we propagate DynamicType to indicate we do not know what
the shape will be.
* Remove the `cond` input to the body of Loop to simplify handling in
interpreter and shape propagation.
* Bugfix for zero-dim contiguousStridesOf
* torch.jit.trace annotation now creates a GraphExecutor
The other torch.jit.trace, which was used for testing purposes and for onnx to get the trace graph, is now called torch.jit.get_trace_graph.
* @script annotation, and compilation unit for strings
Added functionality to GatherRangesToDenseOp such that it supports an optional input KEY, and will sort DATA according to KEY for each example per feature.
* Update doc of batch size requirements for DP
Fix #5039
* Delete the recommendation for batch size
There's no significant speed difference between divisible and indivisible batch size.
* [C2] Implement Layer-wise Adaptive Rate Scaling (LARS)
* [C2] Implement Layer-wise Adaptive Rate Scaling (LARS)
* add unit test for Lars
* set default value for lars to be None
* remove lars for subclasses of SgdOptimizer
* [cmake] Move nccl to modern cmake, and avoid using EXTERNAL_DEPENDENCIES
* [cmake] Move nnpack to modern cmake and avoid using EXTERNAL_DEPENDENCIES.
* [cmake] Move ATen to modern cmake and avoid using EXTERNAL_DEPENDENCIES.
* Move cpufeatures to modern cmake, and avoid using EXTERNAL_DEPENDENCIES
* Finally remove EXTERNAL_DEPENDENCIES.
* Maratyszcza's comments
* Pin libnccl2 to version 2.1.2
Version 2.1.4 exports C++ symbols that it shouldn't, which causes a
mismatch between raised exceptions and expected exceptions.
Pin this to 2.1.2 until this is solved and NVIDIA releases a new version.
* Fix for 9.1
* Actually pin 2.1.4 for 9.1
Additionally:
- add support for calling functions that are not methods in the Python frontend
- add an end-to-end test for the Python frontend
- add a capture_stdout helper for checking that `print` actually works
* [WIP] moving conda scripts to separate build+test
* [WIP] Splitting conda-builds into build and test phases
* Migrating build_local to call build_anaconda
* Tidying up a regex
This replaces the torch.Tensor constructors with factories that produce
Variables. Similarly, functions on the torch module (e.g. torch.randn)
now return Variables.
To keep the PR to a reasonable size, I've left most of the unused tensor
code. Subsequent PRs will remove the dead code, clean-up calls to
torch.autograd.Variable, and rename Variable to Tensor everywhere.
There are some breaking changes because Variable and Tensors had
slightly different semantics. There's a list of those changes here:
https://github.com/pytorch/pytorch/wiki/Breaking-Changes-from-Variable-and-Tensor-merge
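A short sketch of what the change means for user code:
```
import torch

# torch.randn now returns a Variable directly; no explicit wrapping is needed
# before using autograd.
x = torch.randn(3, requires_grad=True)
y = (x * x).sum()
y.backward()
print(x.grad)
```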
* Add python typing module as build dependency
* Change output_declarations to be a NamedTuple
* Add mypy configuration files
mypy-files.txt includes a list of all files that should be type checked
with mypy. Run mypy with `mypy @mypy-files.txt`.
mypy.ini includes mypy options. Unfortunately this can't be merged with
mypy-files.txt.
Update .travis.yml so that one doesn't have to specify what files to
type check inside it.
* Add RuntimeError on missing `typing` module
Alerts users to the new build dependency.
* Handle copying empty sparse tensors to/from CPU, GPU.
This is likely not a robust fix because it special cases the case where both the indices and values are empty
rather than handling each one separately. But this is currently blocking a change introducing devices to constructors.
* Guard sizes being NULL.
* Revert "Fix wrong argument name (#5366)"
This reverts commit cc9d3b265d7e688865fde055ee3a2f9b77b5714a.
* Solves the linking error related to lazy_init for MSVC
* Fix wrong argument naming
* Wrap torch::cuda::lazy_init with WITH_CUDA flag
* Also pass torch includes to nvcc build
* Export ATen/cuda headers with install
* Refactor flags common to C++ and CUDA
* Improve tests for C++/CUDA extensions
* Export .cuh files under THC
* Refactor and clean cpp_extension.py slightly
* Include ATen in cuda extension test
* Clarifying comment in cuda_extension.cu
* Replace cuda_extension.cu with cuda_extension_kernel.cu in setup.py
* Copy compile args in C++ extension and add second kernel
* Conditionally add -std=c++11 to cuda_flags
* Also export cuDNN headers
* Add comment about deepcopy
* Use stacks in the interpreter/aten_dispatch
Rather than have separate input/output lists,
the interpreter now works using a single stack.
Operators in the interpreter push/pop from the stack.
This allows ownership of tensors to transfer directly to an operator,
and an operator can drop the reference to a tensor as soon as it is
no longer needed. This is important for the GraphExecutor op,
which recursively runs the interpreter.
Once autograd is updated to pass variables to Function by value,
we will be able to ensure that we release ownership as soon as possible.
This commit also switches the interpreter to use a fake
tensor 'ContainerTensor' rather than at::Retainable to hold non-tensor
data in the interpreter. This allows us to use std::vector<at::Tensor>
for all registers, which is significantly less confusing than the
OwnedRetainables struct it was replacing.
* Add If and Loop to interpreter
* Preprocess loop to calculate where references to tensor should be dropped
* Add control instructions JumpZ/JumpNZ/Jump
* Switch from explicitly having stage structs to having a single list
of instructions with Store/Load instructions to take values off the
initial stack
* Make the interpreter tests executable rather than use expect files
* add a flag to interpreter code so that constants are variables
if the interpreter is running on variables.
* Add tensor_as to its own file
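A toy Python analogue of the single-stack calling convention described above (the real interpreter is C++; this only illustrates the push/pop discipline):
```
# Each op pops its inputs off the shared stack and pushes its outputs,
# so the stack's references are dropped as soon as the inputs are consumed.
def run(instructions, stack):
    for op, num_inputs in instructions:
        inputs = stack[-num_inputs:]
        del stack[-num_inputs:]          # remove the consumed entries from the stack
        stack.extend(op(*inputs))
    return stack

add = lambda a, b: [a + b]
neg = lambda a: [-a]
print(run([(add, 2), (neg, 1)], [3, 4]))   # [-7]
```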
This will make it easier to bring online new CI configurations
without temporarily breaking the CI, since you can mark it
as disabled in PyTorch HEAD first and then bring the job online.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* at::maybe_data_ptr and Check.h => TensorUtils.h
* THNN support for optional BN running_*
* ATen support for optional BN running_*
* Python nn.* support for optional BN running_*; Improve IN and BN doc
* Add tests for IN and BN new option
* Layer Norm
* Fix LRN doc
* functional interface for LN and IN
* Layer norm tests
* fix BN double backward returning undefined tensors
* fix jit test using wrong dim inputs for BN
* add/improve BN, IN and LN GPU tests with half type
* Update docs to be consistent with Conv notation
Fix onnx
Clarified onnx symbolic wrapper
* fix typo
* Address comments
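For illustration, a short usage sketch; the module and flag names (nn.LayerNorm, track_running_stats) follow the released nn API and are assumptions relative to this changelog:
```
import torch
import torch.nn as nn

# Layer Norm as a module, and Batch Norm configured not to track running statistics.
ln = nn.LayerNorm(10)
bn = nn.BatchNorm1d(10, track_running_stats=False)

x = torch.randn(4, 10)
print(ln(x).shape, bn(x).shape)
```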
The old pow operator has been deleted in math_ops.cc, math_ops.cu and math_ops.h, while the new operator supporting scalar and tensor exponents has been added in pow_op.cc, pow_op.h and elementwise_op.cu.
* Various dtype improvements.
1) Add dtypes to the new data-based constructors: Variable.new_tensor and torch.autograd.variable.
2) In the python signatures, use Type instead of Dtype to match the C++ signatures; the error messages still print as dtype.
3) Handle / add a better error message when a dtype is used when ATen was not compiled with that type (e.g. cuda types).
4) Move cuda_lazy_init to its own file.
A later commit will add support to the legacy constructors as well.
* Move implementation of lazy_init to cpp.
* Fix parsed_arg size.
* Improve Function interface
* Undo tracer changes
* Fix bug in VariableType.set_history
* Rename function_counter and sequence_number to sequence_nr
* Clarify Function documentation
* Replace swap_next_edges with next_edges() getter
* Bring back set_gradient_edge
* Simplify special.cpp
* add_gradient_edge -> create_gradient_edge
* Add mutable getters for pre/post hooks
* Use make_variable with Edge
* Remove remove_gradient_edge in favor of detach_
* Fix documentation and remove create_gradient_edge friend method
* Canonicalize some includes
* Reduce dataset size for word_language_model; increase NUM_RUNS for all GPU tests
* Test check_cpu_governor option
* Update perf test numbers for CPU and GPU
* Fix public protobuf interface - wip
* Try turn on custom protobuf in mac jenkins.
* Adding back auto-fallback protobuf option
* Address typos pointed out by reviewers
* Remove OpenGL code from benchmark
* Make it possible to print a plot in the ipython notebook
* Create the blob if the blob is not specified in the init net
* Do not use gf library for MKL. Even after I install the entire MKL library it is still not found. After removing it, the MKL code can still run
* Support more backends in Caffe2 Benchmark
* Revert "Do not use gf library for MKL. Even after I install the entire MKL library it is still not found. After removing it, the MKL code can still run"
This reverts commit 981b6693a94cbf63ad78d51bd806c7a0d7a5a2d3.
* Build caffe2_benchmark using shared or static library depending on the flag
* Document env vars and properly propagate MAX_JOBS down.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Apply CFLAGS and LDFLAGS environment variables to cmake builds.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Test that running the built program works; fixes #5151.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* CMake CR.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
This commit updates python-peachpy submodule to bring in the fix.
In #1543 @samarjeet reported that importing caffe2 from Python fails on his system with the error "CRITICAL:root:Cannot load caffe2.python. Error: libcaffe2.so: cannot enable executable stack as shared object requires: Invalid argument". I investigated and found that this is caused by libcaffe2.so being marked as requiring an executable stack, which itself was caused by assembly (PeachPy) files in NNPACK not specifying whether they need an executable stack (by default, the linker assumes execstack is needed). I patched PeachPy to add a ".note.GNU-stack" section to generated ELF files, which makes the linker mark libcaffe2.so as NOT needing an executable stack. See Maratyszcza/PeachPy#89 for details.
Adds another package to Anaconda.org with a "-full" suffix which includes more libraries by default. This also installs NCCL 2.1 onto the CI Ubuntu docker images to accomplish this.
* Add numpy-style dtypes to Variable factories.
1) Add numpy-style dtypes corresponding to torch tensor types. These are:
torch.float16, torch.float32, torch.float64, torch.uint8, torch.int8, torch.int16, torch.int32, torch.int64
as well as torch.cuda, torch.sparse, and torch.cuda.sparse equivalents.
2) Adds "legacy" names for the above dtypes that correspond more closely to existing tensor names. These are:
torch.half, torch.float, torch.double, torch.short, torch.int, torch.long.
torch.byte and torch.char don't exist because they either don't match numpy semantics or differ on different architectures.
3) Adds a "dtype" parameter to Variable factories (e.g. zeros, ones) that allows the user to specify the type without changing the default tensor type.
4) Adds a "dtype" getter to Variables that return the canonical dtype from 1)
This PR is missing the following useful features that should be added in the future:
A) We only add the "dtype" parameter to auto-generated factories; hand-written factories like in tensor_new.cpp don't support this yet.
B) We don't allow type conversions to use dtypes; that should be added to type(param) or a new function.
C) We don't yet have a "device" parameter for these factories; right now, they will only create Variables on the default device.
* backend_to_string can be private.
* Define python binding argument indexes in a more simple way.
* add all_declared_types, still need to hook it up to THPDType.
* Fix all_declared_types for missing types (it's Sparse + Half).
* Ensure cuda dtypes are created even if compiled with NO_CUDA=1.
* Fix case where dtype is provided but dispatch is via namespace.
This happens in ones_like, empty_like, randn_like.
There is some question if we should do:
1) at::ones_like(tensor).toType(dtype)
2) at::ones_like(tensor.toType(dtype))
I did the former because this matches with the numpy documentation, i.e.:
"Overrides the data type of the result." and it's easier to implement.
Note that the above causes an extra copy, either of the input or output.
Here's a better implementation:
1) Make zeros_like, ones_like native functions that take an optional type (named dtype?).
2) Match the type argument with the dtype, so we don't have two different parameters.
3) Call at::zeros_like(input, type) -> at::native::zeros_like(input, type) -> type.zeros(input.sizes())
* Don't return from maybe_initialize_cuda.
* Don't leak DType name.
* Address cpp review comments.
* Share code between sparse and non-sparse test_dtypes.
* Rewrite _like functions as native function with explicit type parameter.
* Use type 'Type' instead of 'dtype' for consistency.
* Address review comments.
* Handle arg_idx when there is requires_grad but no dtype in python_binding_arguments.
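A brief usage sketch of the dtype objects and the dtype= factory argument described above:
```
import torch

x = torch.zeros(2, 3, dtype=torch.float64)   # numpy-style name
print(x.dtype)                               # torch.float64

y = torch.ones(4, dtype=torch.int16)         # legacy alias: torch.short
print(torch.int16 is torch.short)            # True
```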
This adds at::_unsafe_view and uses it in matmul. The _unsafe_view
function is identical to view except that the output is not treated
like a view by the automatic differentiation code. This avoids in-place
modifications triggering the more expensive CopySlices/AsStridedBackward
behavior.
The _unsafe_view function is only safe to use on temporaries that will
be immediately discarded and that do not alias other tensors. Otherwise,
in-place modifications may trigger incorrect gradients. The function is
not exposed to Python.
See #5169
* Fix asan buffer overflow in autograd saved_variable.cpp
* Fix asan global buffer overflow in any_variable_requires_grad
* Revert change in any_variable_requires_grad
* Fixes UB when using legacy python functions and mark_non_differentiable
If an output of a python Function is marked as non_differentiable,
autograd won't save a gradfn for that output. During the backward
pass, this translates to an undefined tensor being passed to the
backward of the Function. The legacy python Function path checks
if *any* of the inputs to backward requires_grad.
This requires_grad check uses Variable::get(), which casts the
undefined tensor to a VariableImpl and then accesses the _requires_grad
member. This is UB because the undefined tensor is NOT a VariableImpl.
The fix here is to add a check for if the variable/tensor is defined
in the legacy python Function code path.
* s/and/&&/
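For illustration, a sketch of the pattern involved; it is written in the modern static-method Function style (the bug above was in the legacy Function path) and the class name is hypothetical:
```
import torch
from torch.autograd import Function

# One output is marked non-differentiable, so its gradient arrives as
# None/undefined in backward and must simply be ignored there.
class DoubleWithFlag(Function):
    @staticmethod
    def forward(ctx, x):
        flag = (x > 0).long()
        ctx.mark_non_differentiable(flag)
        return x * 2, flag

    @staticmethod
    def backward(ctx, grad_out, grad_flag):
        # grad_flag is not a usable gradient; only grad_out matters here.
        return grad_out * 2

x = torch.randn(3, requires_grad=True)
y, flag = DoubleWithFlag.apply(x)
y.sum().backward()
print(x.grad)    # all twos
```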
Summary: as title. This is similar to the Python pprint utility for nested json data structures. It can be useful for checking schemas during debugging.
Reviewed By: kittipatv
Differential Revision: D6710767
fbshipit-source-id: e450aa5477fa1ad4f93c4573f8108a2f49956da8
Summary: We are going to enable the `-Werror=unused-parameter` flag, and I need to manually fix some files so the rest of this process can be automated with a tool called clang-tidy.
Reviewed By: yfeldblum
Differential Revision: D7012203
fbshipit-source-id: 585e9e89d916dca8894308438d0c985cb1e1b07a
Summary: The original implementation averaged the momentum across the embedding dimensions, which doesn't make any sense. This meant all the embedding dimensions received the same update, becoming a very memory-expensive one-dimensional embedding.
Differential Revision: D7003135
fbshipit-source-id: ed54e3427bc13895a4e949e96b4b17f6ebfb6d53
Summary:
Fixes an annoying warning when building for Android with tests enabled.
Closes https://github.com/caffe2/caffe2/pull/1970
Reviewed By: pietern
Differential Revision: D7011817
Pulled By: Maratyszcza
fbshipit-source-id: 06162d5c5b12ed939581ce9a8498fbed3eb2c47b
Summary: Fix logic in operator's event synchronization: Record might be called after async CPU op calls SetFinished
Reviewed By: azzolini
Differential Revision: D7003277
fbshipit-source-id: 4d77d6619c6403e71ba45fbaaf78e939982452b6
Summary:
In some cases we were doing quantization even when we should not. This diff
prevents this from happening.
Reviewed By: rayleichen
Differential Revision: D6953547
fbshipit-source-id: 7c65baaf969e5e1bddb68ca8182f4f3b43f2431d
* Add a FAQ, for now just 'out of memory' advice.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Updates based on comments.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* minor copyedit
Summary: EQ op should work on bool type.
Reviewed By: ender-wieczorek
Differential Revision: D6992905
fbshipit-source-id: 9a08c8b840963c9817405c7602a7f67dc6a6caab
The torch.max() and torch.min() CUDA kernels should be initialized with (min_value, 0) and
(max_value, 0), respectively, where the second number is a default index
value. However, they were being initialized with (max, 1) and (min, 1)
instead, probably a remnant from the Lua torch days.
This caused bugs in torch.max() and torch.min() when the input is at the
extreme values, and the max value (or min value) occurs at index 0. For example,
import torch
x = torch.ByteTensor([[0]])
x.cuda().max(dim=0) # returns (0, 1) but the expected result is (0, 0)
Summary: We are going to enable the `-Werror=unused-parameter` flag, and I need to manually fix some files so the rest of this process can be automated with a tool called clang-tidy.
Reviewed By: yfeldblum
Differential Revision: D7001946
fbshipit-source-id: 680d812c98703ec57a9eb952a69c6316e7415be8
Summary:
There is a typo in setup.py which causes an incomplete install. This fixes it.
Closes https://github.com/caffe2/caffe2/pull/1968
Reviewed By: bddppq
Differential Revision: D7000517
Pulled By: yinghai
fbshipit-source-id: c89e32bc5a4a77571f6ab6569297a6b6a1d1f2fc
* Fix LaTex rendering in CosineAnnealingLR
Backslashes were interpreted by Python as escapes in the string, so \frac
turned into frac, which is not a valid LaTex command.
This could be fixed with double backslashes, but the easiest solution is to
just use a raw (r) docstring.
* Fix sphinx warnings for LRN doc headings
* Move LRN docstring from __init__ to class level
The docstring was not rendered by sphinx at
http://pytorch.org/docs/master/nn.html#torch.nn.LocalResponseNorm
because it was in the constructor.
* Remove superfluous backticks from LRN formula
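A minimal sketch of the raw-docstring fix described above (placeholder class; only the r prefix and the math directive matter):
```
class CosineAnnealingLR(object):
    r"""Raw string (note the ``r`` prefix) so ``\eta`` and ``\frac`` reach Sphinx intact:

    .. math::
        \eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})
                 \left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right)
    """
```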
Summary:
Without this enforce it's too easy to export a model overriding its params in
the predictor.
Reviewed By: rayleichen
Differential Revision: D6984506
fbshipit-source-id: 9bbf375758686c6ad12ad071723f255363e98ae6
Summary: We are going to enable the `-Werror=unused-parameter` flag, and I need to manually fix some files so the rest of this process can be automated with a tool called clang-tidy.
Reviewed By: yfeldblum
Differential Revision: D6928263
fbshipit-source-id: 38ce3597b9968a2c0dba3ab21be5ee1c84a13e41
Summary:
Our CMake files have some issues when using Ninja as the generator to build with CUDA.
Closes https://github.com/caffe2/caffe2/pull/1962
Differential Revision: D6992456
Pulled By: bddppq
fbshipit-source-id: 7aa328b16e7edfddfee33495352bfcf8cd8ce9f3
* Check GCC version on Ubuntu
GCC 5 in Ubuntu 17.10 and newer doesn't define the macro _GLIBCXX_USE_C99,
which causes std::to_string, std::isnan, std::isinf (and more) functions
not to be defined either. This fix checks whether GCC 5 is used on Ubuntu 17.10
or later and shows an error message describing the problem.
Fixes #5229
Summary:
After we removed the android-cmake submodule and switched to android.cmake.toolchain from the Android NDK, the code that builds the cpufeatures dependency is no longer valid. This commit fixes it.
Closes https://github.com/caffe2/caffe2/pull/1957
Differential Revision: D6990082
Pulled By: Maratyszcza
fbshipit-source-id: ccbe8190e30e097474a2876ed4c0b263bcb117ef
Summary:
This reverts commit 30f614beea6f859fee25ce4f85573142885dde45
bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
cause_a_sev_many_files
Differential Revision: D6893040
Original commit changeset: 30f614beea6f
fbshipit-source-id: 5e98a24699088283f864efe31234874bdacbe3c3
* Allow zero-dim tensors to be bound to at::Scalar
This relaxes THPUtils_unpackLong and THPUtils_unpackDouble to allow
values convertible to PyLong and PyFloat objects. This includes NumPy
scalars and zero-dim tensors (Variables).
This is important to maintain backwards compatibility in the Tensor
constructors once scalars are enabled and Variable and Tensor are
merged (a rough sketch follows this list).
* Add comment and unpack PyInt as int64_t
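A rough sketch of the kind of call this relaxes (the specific call site below is illustrative, not taken from the test suite):

```python
import numpy as np
import torch

n = np.int64(2)        # a NumPy scalar, not a Python int
x = torch.randn(5)

# Arguments unpacked via THPUtils_unpackLong now accept anything convertible
# to a Python int, e.g. NumPy scalars (and, once Variable and Tensor are
# merged, zero-dim tensors).
y = x.narrow(0, 0, n)
```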
Summary:
onnx-caffe2 requires some more Python packages in order to run its tests.
Closes https://github.com/caffe2/caffe2/pull/1956
Reviewed By: bddppq
Differential Revision: D6985654
Pulled By: yinghai
fbshipit-source-id: 06d4ec95729b09cdd1bc7e096ecf6680124070cd
* hard exit when test output contains warning or error
* update perf test links
* update base machine description
* update z value range
* update cpu perf test numbers
* store perf test numbers in S3 instead, for easier updating
* update mini_sequence_labeler perf test link
* fix lint
* store perf test numbers in repo
* update link to mini_sequence_labeler test
Summary: The old pow operator has been deleted in math_ops.cc, math_ops.cu and math_ops.h, while the new operator supporting scalar and tensor exponents has been added in pow_op.cc, pow_op.h and elementwise_op.cu.
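As a rough illustration of the two exponent forms the new operator supports (a sketch only; the blob names and values are made up, and the scalar argument is assumed to be named `exponent`):

```python
from caffe2.python import core, workspace
import numpy as np

workspace.FeedBlob("X", np.array([1.0, 2.0, 3.0], dtype=np.float32))

# Scalar exponent passed as an operator argument.
workspace.RunOperatorOnce(
    core.CreateOperator("Pow", ["X"], ["Y_scalar"], exponent=2.0))

# Tensor exponent passed as a second input.
workspace.FeedBlob("E", np.array([0.0, 1.0, 2.0], dtype=np.float32))
workspace.RunOperatorOnce(
    core.CreateOperator("Pow", ["X", "E"], ["Y_tensor"]))

print(workspace.FetchBlob("Y_scalar"), workspace.FetchBlob("Y_tensor"))
```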
Reviewed By: houseroad
Differential Revision: D6893040
fbshipit-source-id: 30f614beea6f859fee25ce4f85573142885dde45
Summary:
Add a function that returns true if the model contains a loss and returns
false if it doesn't.
Reviewed By: kittipatv
Differential Revision: D6982444
fbshipit-source-id: 1f63b7a1eaa3077841a0ad5d8d854b471d0aa84c
This PR adds support for convenient CUDA integration in our C++ extension mechanism. This mainly involved figuring out how to get setuptools to use nvcc for CUDA files and the regular C++ compiler for C++ files. I've added a mixed C++/CUDA test case which works great.
I've also added CUDAExtension and CppExtension functions that construct a setuptools.Extension with "usually the right" arguments, which reduces the boilerplate required to write an extension even more. This is especially useful for CUDA, where library_dir (CUDA_HOME/lib64) and libraries (cudart) have to be specified as well.
The next step is to enable this with our "JIT" mechanism.
NOTE: I've had to write a small find_cuda_home function to find the CUDA install directory. This logic is kind of a duplicate of tools/setup_helpers/cuda.py, but that's not available in the shipped PyTorch distribution. The function is also fairly short. Let me know if it's fine to duplicate this logic.
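Roughly what the reduced boilerplate looks like with the new helpers (a sketch; the package, module, and source file names are hypothetical):

```python
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension, CUDAExtension

setup(
    name='my_extension',  # hypothetical package name
    ext_modules=[
        # Plain C++ extension: built with the regular C++ compiler.
        CppExtension('my_cpp_ops', ['my_cpp_ops.cpp']),
        # Mixed C++/CUDA extension: .cu files go through nvcc, the rest through
        # the C++ compiler, and CUDA_HOME/lib64 and cudart are wired up.
        CUDAExtension('my_cuda_ops', ['my_cuda_ops.cpp', 'my_cuda_kernels.cu']),
    ],
    cmdclass={'build_ext': BuildExtension},
)
```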
* CUDA support for C++ extensions with setuptools
* Remove printf in CUDA test kernel
* Remove -arch flag in test/cpp_extensions/setup.py
* Put wrap_compile into BuildExtension
* Add guesses for CUDA_HOME directory
* export PATH to CUDA location in test.sh
* On Python2, sys.platform has the linux version number
* Fix mul with dense + sparse
* Add missing hspmm and smm
Also make repeat only a function (not a method) to match Tensor
behavior.
These were discovered by running test_torch.py and test_sparse.py after
merging Variable and Tensor
Summary: Sometimes we need to add some extra schema later
Reviewed By: sunnieshang
Differential Revision: D6951849
fbshipit-source-id: 564eb88f9250eae24869fd10ba3426e00a18af33
Summary:
We don't care about a particular system Python when building Anaconda images.
Rebasing later to remove the sccache change once it is merged (#1952).
Closes https://github.com/caffe2/caffe2/pull/1953
Differential Revision: D6978409
Pulled By: pietern
fbshipit-source-id: 39762602cdd35eefd485a014011b53e3ee2e830d
Summary:
Work in progress to start using sccache
Closes https://github.com/caffe2/caffe2/pull/1949
Differential Revision: D6978772
Pulled By: pietern
fbshipit-source-id: 721462d8e3470736472263337c628b287cd1a901
Summary:
Modify detect_components to take a list of valid node_name prefixes instead of values. Users can set node_name to e.g. `'sparse_component:0'`, `'sparse_component:1'`, etc.
and pass `'sparse_component:'` as a valid prefix. Also add `Tags.SPARSE_COMPONENT` in addition to `Tags.SPARSE_SHARDED` and `Tags.SPARSE_DONT_SHARD` and update all calls to
`detect_device_components`.
Reviewed By: azzolini
Differential Revision: D6952599
fbshipit-source-id: e1b1e6b146a6bd053b295690016044fd5990c893
- Create a new common.sh to put common bash stanzas in
- Create a new enabled-configs.txt file, which you can use
to selectively disable tests when running CI
- Specify exited user land via trap, which means early successful
exit will correctly print the end sigil.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary:
Previous behavior may fail to resolve the correct library name. A rework of https://github.com/caffe2/caffe2/pull/1935 as it was messed up in the rebase...
Closes https://github.com/caffe2/caffe2/pull/1950
Reviewed By: bddppq
Differential Revision: D6974530
Pulled By: yinghai
fbshipit-source-id: 924b653e8ac0b68c46341edfd3eb05d9cc0155f2
* Improve Variable interface
* Address comments from @apaszke and @colesbury
* string ::operator= is not noexcept
* Remove ir.h from tracer_state.h to improve build times
* Make Variable a struct and pack SavedVariable fields
* Implement as_variable_ref
* grad_fn_ptr() -> grad_fn_unsafe()
* Reduce hackiness of set_type hack
* Include variable.h and edge.h in tracer_state.h because it uses them
* class Variable -> struct Variable because Windows can't even
* Make Variable::output_nr uint32_t instead of int
* Add comment about tracing state
* Replaced more static_cast<Variable&> and improve docs
* Remove SavedVariable destructor and construct members in init list
* Clarify docs for Variable
* Variable::set_version -> set_version_counter
Summary:
Change log
- Support rectangular cropping, where the height and width of the clip crop can be set separately. This is useful when most video resolutions are non-square, such as 240p, 360p and 480p, where width is significantly larger than height.
- Comparisons of training on ucf101 between using 112x112 croppings and using 112x144 cropping.
- https://fburl.com/i0rw6y1k
- Support 14 multi-cropping per video clip at testing stage to improve classification accuracy. Take left-top, central-top, right-top, left-bottom, central-bottom, right-bottom and central-central croppings as well as their mirrorings. In total, 14 croppings.
- Comparisons on the same model trained on UCF-101. Use 1 clip per video
- RGB. f41014306, w/o Vs f41014868, w/ multi-cropping: `0.64099 Vs 0.65796`
- OF. f41014889, w/o Vs f41014913, w/ multi-cropping: `0.65796 Vs 0.67624`
- Support color jittering and color lighting on RGB data for training data augmentation.
- Comparisons of training on ucf101 from scratch with and without color jittering and lighting:
- https://fburl.com/k69zatul
Reviewed By: HengCV
Differential Revision: D6962620
fbshipit-source-id: 9b43478945874142727fea351ee04417218e6606
Summary:
In Caffe2 Benchmark, if a blob is not specified in the init net but only in the predict net (e.g. an input), the blob cannot be retrieved from the workspace. In some cases this results in errors.
Create the blob before using it if it doesn't exist.
Closes https://github.com/caffe2/caffe2/pull/1948
Reviewed By: orionr
Differential Revision: D6970316
Pulled By: sf-wind
fbshipit-source-id: 3e317403de0b5cf7568c7bda69a0ebe9d59d4a1f
This better maintains backwards compatibility when Tensors and Variables
are merged. For example:
>>> loss = var.sum().data[0]
Currently, `var.sum().data` is 1-dim, so indexing with [0] works. Once scalars are
enabled and Variable and Tensor are merged it will be zero-dim. This
change allows that expression to continue working (with a warning). In
the future, the canonical way to compute that expression will be:
>>> loss = float(var.sum())
Or an equivalent alternative:
>>> loss = var.sum().item()
Also fixes a few error cases.
Prior to this change, test_autograd.py used type checks that
differentiate between Tensor and Variable to determine if an argument
needs requires_grad=True. This logic breaks when Tensor and Variable are
merged.
This changes the logic for method_tests so that (a minimal sketch follows the list):
- non_differentiable(..) marks an argument as not requiring grad
- floating point tensors have requires_grad=True
- integral tensors have requires_grad=False
- Variables are disallowed (unless they're wrapped in
non_differentiable)
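A minimal sketch (not the actual test-suite code, and assuming the pre-merge distinction between Tensor and Variable) of how an argument could be prepared under these rules:

```python
import torch
from torch.autograd import Variable

class non_differentiable(object):
    """Marker for arguments that must not require grad."""
    def __init__(self, tensor):
        self.tensor = tensor

FLOAT_TENSOR_TYPES = (torch.FloatTensor, torch.DoubleTensor, torch.HalfTensor)

def prepare_arg(arg):
    # Wrapped arguments never require grad.
    if isinstance(arg, non_differentiable):
        return Variable(arg.tensor, requires_grad=False)
    # Floating point tensors require grad; integral tensors do not.
    if torch.is_tensor(arg):
        return Variable(arg, requires_grad=isinstance(arg, FLOAT_TENSOR_TYPES))
    # Bare Variables are disallowed so the rules above stay unambiguous.
    if isinstance(arg, Variable):
        raise TypeError("pass Tensors, or wrap Variables in non_differentiable")
    return arg
```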
Summary:
Because networkx 2.1 moved bellman_ford, and scikit-image will install the most recent networkx by default.
Closes https://github.com/caffe2/caffe2/pull/1944
Reviewed By: pietern
Differential Revision: D6966299
Pulled By: pjh5
fbshipit-source-id: 71ad387cb4a2b22cde3b87e6665977da6b4c428e
Summary: Copying model_id from metaNetDef_->modelInfo in PredictorContainer for dper models. Since these model_id's are strings of <model_id>_<snapshot_id>, changed them to strings in net_observer
Reviewed By: salexspb
Differential Revision: D6752448
fbshipit-source-id: 93c91950b44c012e57240aaf909bc961449cfd7c
Summary: This fixes issues around building on a devserver.
Reviewed By: pjh5
Differential Revision: D6953242
fbshipit-source-id: 59b4d3f846971a8b5eb9c1d802a8bacef3fad696
Summary: Step 1 of 3 in adding support for multidevice batch normalization on GPUs. Implements ChannelStatsOp for the GPU. Next steps are to port the backprop stats op and tie things together in DPM.
Reviewed By: rbgirshick
Differential Revision: D6953411
fbshipit-source-id: cd50e53d66ea84fe66021c08b978b28290d9f347
Summary: MKLMemory is not really a tensor, but we can make shape info collection work.
Reviewed By: stephenyan1231
Differential Revision: D6947770
fbshipit-source-id: 04303ea309a8a9c1ac4c5401c43934d1abb6a7c4
Summary: The interface is not used anywhere AFAICT; cleaning up to make it less confusing.
Reviewed By: kuttas
Differential Revision: D6867040
fbshipit-source-id: 3e8a77df76ef09c6864c308561825777b326f76c
Summary:
enum34 dependency of PeachPy conflicts with built-in enum package on Python >= 3.6
This commit brings in NNPACK change to avoid using enum34 on Python >= 3.4
Closes https://github.com/caffe2/caffe2/pull/1925
Differential Revision: D6951906
Pulled By: Maratyszcza
fbshipit-source-id: a698d8bbbc7b7b0c1b0b532c2c9d74fe0d2ae266
* add reduce=True arg to HingeEmbeddingLoss (a usage sketch follows this list)
* pass arg to super constructor in HingeEmbeddingLoss
* make HingeEmbeddingLoss reference fn work on legacy
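A small usage sketch of the new argument (the inputs are made up):

```python
import torch
import torch.nn as nn
from torch.autograd import Variable

x = Variable(torch.randn(4, 5))
y = Variable(torch.Tensor(4, 5).random_(0, 2) * 2 - 1)   # targets in {-1, 1}

loss_mean = nn.HingeEmbeddingLoss()(x, y)              # reduce=True (default): averaged loss
loss_full = nn.HingeEmbeddingLoss(reduce=False)(x, y)  # per-element losses, same shape as x
```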
* Fix test_distributions when WITH_SCALARS.
* Use SCALAR_SHAPE in test, use self.scale in AffineTransform.
* Handle device correctly for scalars.
* Fix one hot categorical.
* Fix relaxed categorical.
* Add a new_tensor instance method to Variable that takes only data.
This is to work around the legacy problems of new, where e.g.
new(5) will give you an unfilled tensor rather than a scalar.
* Fix cuda scalar code path.
* Remove double return.
* Work around lack of WITH_SCALARS.
* Use tensor_new.
* Add a new_tensor instance method to Variable that takes only data.
This is to work around the legacy problems of new, where e.g.
new(5) will give you an unfilled tensor rather than a scalar.
* Remove double return.
* Fix cuda scalar code path.
* Work around lack of WITH_SCALARS.
Summary:
To build with tests and benchmarks
`./scripts/build_android.sh -G Ninja -DBUILD_TEST=ON -DUSE_NNAPI=ON`
To run unit test
`adb push build_android/bin/nnapi_test data/local/tmp`
`adb shell "cd data/local/tmp &&./nnapi_test`
To run benchmark
`adb push build_android/bin/nnapi_benchmark data/local/tmp`
`adb shell "cd data/local/tmp &&./nnapi_benchmark`
Tested on Google PIxel 2 XL with android 8.1
Closes https://github.com/caffe2/caffe2/pull/1918
Reviewed By: Maratyszcza
Differential Revision: D6944604
Pulled By: hlu1
fbshipit-source-id: 462f010117ae4628b23bef506c41397de3817ad4
Summary:
Include six, enum34, and PeachPy as Caffe2 submodules, and use the versions from submodules instead of downloading them during configuration time
Closes https://github.com/caffe2/caffe2/pull/1917
Reviewed By: orionr
Differential Revision: D6938735
Pulled By: Maratyszcza
fbshipit-source-id: 841a6c47a1cd003a19f48f6c256aa4d9eb2cc6e4
Summary: CompleteInTimeOrDie was added to detect deadlocks and proactively exit. In addition, call os.abort() to generate a core dump so that the error is actionable.
Reviewed By: bmaurer
Differential Revision: D6938343
fbshipit-source-id: 8bd36da4f4bb1195bd3398f25d133a6ebf1c66ad
Summary:
It appears that my initial implementation was not really working when one
starts doing nesting. This diff fixes that by replacing itertools with
something that is easy to reason about.
Reviewed By: idning
Differential Revision: D6933763
fbshipit-source-id: f7a1de996d878a41bac2b2acd9d87a7c4b416778
Follow up to #4744
This is another code-path in which storages may be null, which is not
allowed in PyTorch. The Python tensor bindings handle this in pynew, but
the ATen bindings do not.
This is caught by test_torch.py when Tensor and Variable are merged.
Summary:
Original commit changeset: d0c1c7681605
Reverting because this commit broke the OSS build.
Reviewed By: bddppq
Differential Revision: D6935666
fbshipit-source-id: 955cfeb6d5a4ed265b2e099094cfb5bfe960ff95
C++ argument evaluation order is undefined and leads to different
results on different platforms. This commit fixes build_lstm_body to
do the calculation slightly differently.
Fixes #5055
Summary:
Include six, enum34, and PeachPy as Caffe2 submodules, and use the versions from submodules instead of downloading them during configuration time
Closes https://github.com/caffe2/caffe2/pull/1901
Differential Revision: D6930731
Pulled By: Maratyszcza
fbshipit-source-id: d0c1c7681605d957de6f51bd24fbb25afc0f282f
Summary:
There is a long-standing scoping problem which was introduced in the original Python wrappers early in H1. Basically, each RNNCell implementation has to manually scope the outputs of each of its operators. If somebody forgets, there can be weird bugs with layers etc.
The approach is the following: the user has to explicitly specify the current scope when using apply_over_sequence and similar functions if the function is going to be called several times (e.g. for stacking layers). This way we use Caffe2's native scoping approach instead of inventing an extra API people have to use (i.e. passing the scope name as an argument to the RNNCell constructor).
Closes https://github.com/caffe2/caffe2/pull/1681
Differential Revision: D6777536
Pulled By: salexspb
fbshipit-source-id: 73d860b8d4857589e04bdea5a6fcd3080d68427c
Summary: Integrate the Android NNAPI into Caffe2. Supported ops include averagepool, maxpool, conv, relu, and softmax.
Reviewed By: Maratyszcza
Differential Revision: D6560366
fbshipit-source-id: 2879a99c01acb050e711d9d7d5bde022ef95888d
This was accidentally lost while addressing review comments on
https://github.com/pytorch/pytorch/pull/4695
pack_padded_sequence may be called either with a list or with a
Variable. If called with a list we convert to Variable internally.
I added to test_nn to test the new codepath. The bug was also caught
by the onnx-fb-universe tests (which rely on passing in Variable).
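A minimal illustration of the two call paths described above (shapes and lengths are made up):

```python
import torch
from torch.autograd import Variable
from torch.nn.utils.rnn import pack_padded_sequence

x = Variable(torch.randn(3, 2, 4))   # (seq_len, batch, feature)

# Lengths as a plain Python list: converted to a Variable internally.
packed_from_list = pack_padded_sequence(x, [3, 2])

# Lengths passed in directly as a Variable.
packed_from_var = pack_padded_sequence(x, Variable(torch.LongTensor([3, 2])))
```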
torch.mm(sparse, dense) -> dense works for tensors. This PR makes it work for variables as well.
I renamed mm to _mm in Declarations.cwrap and wrote a native mm function that wraps _mm for the dense case and addmm for the sparse case.
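For example (a small sketch with made-up values), this now works when both inputs are Variables:

```python
import torch
from torch.autograd import Variable

i = torch.LongTensor([[0, 1, 1],
                      [2, 0, 2]])
v = torch.FloatTensor([3, 4, 5])
sparse = Variable(torch.sparse.FloatTensor(i, v, torch.Size([2, 3])))
dense = Variable(torch.randn(3, 4))

out = torch.mm(sparse, dense)   # dense 2x4 result
```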
The test_cuda.py setup purports to test half tensors, but actually just
re-tests FloatTensors because the keys in type_map were str instead of
type. Testing HalfTensors is more complicated, requiring changes to
precision and excluding some unimplemented methods.
We should fully test half CUDA tensors. This change just deletes the
duplicate tests of FloatTensor.
Summary: Set of RL improvements: Fix error in quantile computation. Handle missing values in sparse_to_dense. Replace page_size with minibatch size.
Differential Revision: D6888977
fbshipit-source-id: bb84477866c64da5ff57d6c25df1c8d3b799e437
* PackedSequence: store batch_sizes as tensor
rather than converting to a list of python integers. This maintains
the invariant that module's inputs/outputs are collections of
Variables.
In particular, this causes the JIT to no longer choke when flattening
and unflattening arguments.
* Handle sequence lengths correctly when exporting RNNs to ONNX
- when uniform sequence lengths are provided, correctly omit the
argument when constructing the ONNX graph, so as to not fix the
graph to the batch size.
- handle PackedSequences by floating them through the graph and
eliminating them in an optimization pass. ONNX does not have packed
sequences, but operates on a representation equivalent to
PaddedSequence, so we hide the representation-switching from ONNX
- as a preliminary step towards handling PackedSequences, not directly
tied to ONNX export, change batch_sizes from being an argument to
the RNN operators into being an argument to the forward() function
of those RNN operators. This more closely models the reality that
batch_sizes are effectively part of the input sequences.
Summary:
This was forgotten in #1854.
cc Yangqing
Closes https://github.com/caffe2/caffe2/pull/1880
Differential Revision: D6919916
Pulled By: Yangqing
fbshipit-source-id: 1a8dbae604677bc3c3d23b4e55bd09bb87c24cfd
* Add criterion scalar tests.
This exposed an issue in MarginRankingLoss with scalars, but the cleanest way to fix is to wait
until forward runs on Variables (so we don't have to wait for the backward to check if something
is a scalar).
* Fix flake8.
* Add error message for margin_ranking_loss with scalars.
We perform this check in the generic/SparseTensor.cpp (the Python binding),
but the ATen bindings don't use that code path
Fixes test_broadcast_coalesced with sparse tensors
Summary: We should not be trying to instantiate this op on GPU at this point
Reviewed By: pietern
Differential Revision: D6915576
fbshipit-source-id: 6bdbc93ad12fc67e3001fce1b506fe2895d7b0ba
The Tensor and Variable classes are being merged.
autograd.Function.forward is now called on Variables, but with "no-grad"
mode (torch.no_grad()) enabled.
One benefit is that we no longer have to explicitly track shared
storages.
Variable.item() converts one-element tensors to standard Python numbers.
This operates like float(var) or int(var) depending on
the data type of the Variable.
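For example (a trivial sketch):

```python
import torch
from torch.autograd import Variable

f = Variable(torch.FloatTensor([1.5]))
n = Variable(torch.LongTensor([7]))

f.item()   # 1.5, a Python float (same as float(f))
n.item()   # 7, a Python int (same as int(n))
```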
Because nvcc does not know that in/out pointers do not alias each other,
if we assign a value to *out and then use *in again, the kernel has to
emit a write to *out and then another read from *in.
(Affected kernels become marginally faster after the fix.)
* test_nn working.
* Fix some incorrect scalar assumptions.
* Don't use Variables when we don't have to.
* Use Variable Mixin.
* Fix NLLLoss reference function when WITH_SCALARS not enabled.
* Allow device to be optional in cuda().
* Fix multilabelmarginloss_reference.
* parallelize vol2col and col2vol of Conv3D with CPU backend
* parallelize vol2col and col2vol of Conv3D with CPU backend
* interface test of conv3d
* replace long with int64_t
* correct pragmatic error of comments
Summary: The previous refactor of these four Ops changed their input semantics, which makes them backward-incompatible with old models. This diff fixes the problem by checking the input and defining the follow-up behavior case by case, so that old models can be accommodated.
Reviewed By: dzhulgakov
Differential Revision: D6905840
fbshipit-source-id: fc37baec407fd5eae64fc9c2b61aba3c492a90f3
Summary:
Special While loop operator that follows the semantics of While in ONNX: https://github.com/jamesr66a/onnx/blob/controlflow/docs/Operators.md#experimental-loop
Stuff that's missing:
- Lexical scoping enforced via child workspaces
- Double-buffering on forward
Further possible enhancements:
- Full parallelism when there are no loop-carried dependencies
- Diagonal execution
- More optimized scan_outputs shaping via static shape inference provided in ONNX (coming sometime)
- GPU support (probably just some tensor value management stuff)
- Gradient support (likely low-pri right now)
Closes https://github.com/caffe2/caffe2/pull/1848
Reviewed By: dzhulgakov
Differential Revision: D6907524
Pulled By: jamesr66a
fbshipit-source-id: 4938108733e168b8c027035091104712a18c992a
Summary:
Addresses issue #1676
Now when `make install` is run, the `caffe2` (and `caffe`) python modules will be installed into the correct site-packages directory (relative to the prefix) instead of directly in the prefix.
Closes https://github.com/caffe2/caffe2/pull/1677
Reviewed By: pietern
Differential Revision: D6710247
Pulled By: bddppq
fbshipit-source-id: b49167d48fd94d87f7b7c1ebf0f187ec6a203470
Summary:
This brings an option to disable inline assembly in FXdiv via CMake configuration option `-DFXDIV_USE_INLINE_ASSEMBLY=OFF`
Inline assembly in FXdiv apparently triggers a bug in some gcc versions
Closes https://github.com/caffe2/caffe2/pull/1892
Differential Revision: D6904507
Pulled By: Maratyszcza
fbshipit-source-id: 2ef24b277cbaa2634c69e2d53cef21415b05195f
Summary:
Fix typeid_test when running android C2 tests
Previously it says:
Build failed: Command failed with exit code 1.
stderr: caffe2/caffe2/core/typeid_test.cc: In member function 'virtual void caffe2::{anonymous}::TypeMetaTest_Names_Test::TestBody()':
caffe2/caffe2/core/typeid_test.cc:49:12: error: variable 'string_meta' set but not used [-Werror=unused-but-set-variable]
TypeMeta string_meta = TypeMeta::Make<string>();
Reviewed By: Yangqing
Differential Revision: D6869192
fbshipit-source-id: ccbc30d53d04a8ece98de0a99598c176e6aaf4dc
* Add a small paragraph for pathwise estimator
* Add differentiability as well
* Add small snippet and clear some grammatical errors
* Update documentation to reflect has_rsample (a small sketch follows this list)
* Add a fix for ExponentialFamily docs
* Update __init__.py
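A small sketch of what the pathwise (reparameterized) estimator looks like for a distribution that reports has_rsample (parameter values are illustrative):

```python
import torch
from torch.autograd import Variable
from torch.distributions import Normal

loc = Variable(torch.zeros(3), requires_grad=True)
scale = Variable(torch.ones(3), requires_grad=True)
dist = Normal(loc, scale)

if dist.has_rsample:
    sample = dist.rsample()    # differentiable w.r.t. loc and scale
    sample.sum().backward()    # gradients flow back through the sample
else:
    sample = dist.sample()     # not differentiable; needs a score-function estimator
```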
* Add transpose() to TensorGeometry.
This code is dead; I briefly used it in my RNN patchset but
eventually rewrote it to not be necessary. However, it seemed
like a useful gadget so I kept it. In general, it seems that it
would be useful for TensorGeometry to support all operations that
Tensor does, but it only computes the changes to sizes/strides
instead of actually doing the computation.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Turn on wrap_dim behavior for TensorGeometry
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Support for hard-coded differentiable outputs.
Some outputs of functions are nondifferentiable, and should always
be returned with requires_grad=False. Traditionally, we have used
the presence of 'grad' to signal that only the first output is
differentiable, and the rest are not, but cudnn_rnn (to be
implemented) breaks this pattern; its first three outputs are differentiable,
but its last output is a buffer that is just consumed by backwards.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* TensorGeometry constructor from just sizes
The sizes are assumed to form a contiguous tensor, and we compute
the strides we would get in that case.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Support saving TensorList for backwards.
There is some back story here. Saved TensorList in backwards will
be used by cudnn_rnn, and it is worth asking, why is it necessary to
save a list of tensors? Indeed, *technically* speaking a list of
tensors is not necessary, we only need to save the sizes of each
of the weight tensors. (We need the sizes because cuDNN is only
going to blast the derivative of weights into a flat buffer, but
we need to match the sizes of the views into the buffer when we
eventually return the derivatives.)
However, it was surprisingly awful trying to implement passing just
sizes, because as non-Tensor arguments, the JIT interpreter generation
code is expected to handle all non-Tensor arguments as attributes in the
trace, and our attributes struct doesn't actually know how to do
arrays of arrays. Saved TensorList code was much easier to get working,
so that's what this patch does.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* MatrixRef - an ArrayRef with a stride, making it a 2D ArrayRef.
Like ArrayRef, this class does not own the underlying data, it is expected
to be used in situations where the data resides in some other buffer.
This is intended to be trivially copyable, so it should be passed by
value.
For now, 2D only (so the copies are actually cheap, without having
to write a SmallVector class) and contiguous only (so we can
return non-strided ArrayRef on index).
The intended use-case (not in this commit) is to make it easier to
work with RNN weights, which are num_weights x num_layers matrix of
parameters.
P.S. dimension 0 indexes rows, dimension 1 indexes columns
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Generalize getDataType in Descriptors.h
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Change copy_range to take Tensor, and change cat_tensors_backward accordingly
Should a backward function return a Variable or a Tensor? For the most
part, all of our backward functions return Tensor, except cat_tensors_backward,
which returns a variable_list (which is really the only thing that matters,
because Tensor and Variable are interconvertible). But this is kind of weird,
because it means that you can't implement a backwards in ATen that returns
a std::vector<Tensor>, and then hook it up transparently with the derivatives
code. So I switched it over.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Support 5-ary return Tensor tuple.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Support code generation with mixed Tensor/TensorList in output.
I don't think I ended up using this in cudnn_rnn, but this seems
it might be useful for someone else later.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Support 4-ary boolean array
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add support for retain_variables in tools/autograd/derivatives.yaml
'retain_variables', a bool which is true if a user has specified
that saved variables should be retained in case the backwards is
run again later. This allows an optimization where we can
destroy saved buffers if we know variables are not going to be retained,
e.g., it is (will be) used by _cudnn_rnn
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Lazily initialize cuDNN descriptors
Previously, cuDNN descriptors were eagerly allocated as soon
as a FooDescriptor object was created. However, in some uses
of TensorDescriptor, this is problematic: some tensors are optional
and cuDNN's API expects to be given a nullptr TensorDescriptor
in this case, not an uninitialized (but allocated) descriptor.
Lazily initializing the descriptors makes it less likely for
us to use uninitialized memory and matches the usual semantics of
unique_ptr. It's good sense!
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Port cuDNN RNNs to ATen.
This brings three new functions:
- _cudnn_rnn_flatten_weight: flatten a matrix of weight tensors into
a single contiguous weight buffer as required by cuDNN
- _cudnn_rnn: run RNN forwards
- _cudnn_rnn_backward: run RNN backwards
RNNs have a lot of parameters, so we restructured what was previously
a single 'fn' object that recorded all the parameters into three
objects: RNNDescriptorParams, TensorDescriptorListParams and
DropoutDescriptorParams.
We make use of MatrixRef to organize the weight tensors (which are
weight/bias x number of layers), but I did not teach the codegen
how to pass these as arguments/return values natively, so instead
a MatrixRef is passed as its constituent ArrayRef and int64_t stride0.
cudnn_rnn has three differentiable outputs and one nondifferentiable
one, so it makes use of the support for hard-coded differentiable outputs.
I haven't deleted all of the descriptor code from Python, because dropout
initialization still goes through this codepath, that should be fixed soon
but I don't see it as essential for this PR.
This commit also removes the last use of NestedIOFunction from PyTorch.
There are some shenanigans with cuDNN dropout descriptor initialization,
see below:
Note [cuDNN dropout descriptor initialization]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In most cases, setting descriptors in cuDNN is cheap (e.g.,
cudnnSetTensorNdDescriptor). However, this is not the case for
cudnnSetDropoutDescriptor: in cuDNN 6/7 (and possibly others) it does an
expensive precomputation to initialize the random number generator states. In
cuDNN 6, this is the ONLY official mechanism to initialize a dropout descriptor,
which means that law-abiding clients were expected to generate a dropout
descriptor once and cache it. However, our ATen interface is (1) stateless (so
we can't cache the descriptors) and (2) does not accept arbitrary user types in
its interface (so we can't pass the descriptor in). This puts us in a pickle.
In cuDNN 7, a new function, cudnnRestoreDropoutDescriptor was added, which
forgoes the expensive initialization process, and can initialize the
descriptor with a pre-initialized state CUDA tensor. This is great, because
it means we can simply pass in the state tensor and then initialize the
descriptor internally. Unfortunately, this function is not available in
cuDNN 6.
To work around this, we break the cuDNN abstraction barrier and rely on
the struct layout of the underlying dropout descriptor. With this struct,
we can reimplement cudnnRestoreDropoutDescriptor from scratch. Great!
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Fix cuDNN 7 behavior.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Delete some unused, controversial methods from MatrixRef.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add missing filter_dim_a slice
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Replace nested for-loop with itertools.chain.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* CR comment on mut_desc()
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Refactor DropoutDescriptor API.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Use cached CurrentDeviceProperties from Context.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Document _cudnn_rnn outputs.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Improve fmap docs, convert some functions to use it.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Move IndexRange to autograd/function.h
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Elaborate on CUDNN_STATUS_INVALID_VALUE return some more.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add an all-in-one setter for RNNDescriptorParams.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Print what the unrecognized RNN mode was
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* RNN TensorDescriptor improvements
- Have an explicit size/stride overload for set TensorDescriptor,
so you don't have to create a goofy view to feed in.
- Change the padding to 3D rather than 5D, which is all you actually
need (it's just 2D that is not supported by cuDNN API.)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Fix implementation of cudnnRestoreDropoutDescriptor, plus test.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Better comments about input layout.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add comment about no-DropoutDescriptor argument RNNDescriptor function.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Rename vocab_size back to input_size.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Don't use backslash in comment.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Bugfix for contiguous TensorGeometry calculation.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Don't allocate a dummy tensor when setting TensorDescriptor for flatten_weight.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Make contiguity errors more user-friendly.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* s/fn.dropout.train/fn_train/
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* s/_cudnn_rnn_backward_grad/_cudnn_rnn_backward_input/
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Make dcx properly undefined when not required.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Remove old TODO.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add state size check in cudnnRestoreDropoutDescriptor
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Explicitly narrow int64_t to size_t
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Restore copyParams comment.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Update benchmark numbers, and slight engineering improvements.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Typofix.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary:
* We now allow subdirectories as well as numbers in the name.
* Also fixed an error case.
Closes https://github.com/caffe2/caffe2/pull/1875
Reviewed By: pjh5
Differential Revision: D6894401
Pulled By: orionr
fbshipit-source-id: 6a9938bc7d2ba6b8f094ed7b8a02664120a10626
Once Variable and Tensor are merged the existing Variable test would
cause an infinite recursion. Instead, modify the Variables directly
inside a `no_grad()` block.
* Remove addValues and use WithInsertPoint
* Use blocks to simplify differentiate
Using @ezyang's suggestion, this change uses a block rather than
staging annotations to represent the reverse pass. This allows us
to reuse the machinery to copy graphs/blocks to extract the
reverse pass concisely.
This also changes the input order of the Gradient's df to:
[output vjps][temporary vjps][captures]
In addition to being simpler to generate in this order, it also
will allow ExecutionPlan to append the captures onto the already-
existing input list of vjps that are given by the autograd,
rather than have to prepend them, which should be slightly cheaper.
* Enforce that input captures are before outputs
This changes the Gradient struct to enforce that input
captures appear before output captures in the capture list,
which makes it easier to use in ExecutionPlan.
In some cases, when there are two different versions of cuDNN installed,
one under /usr/local/cuda and the other under a virtual env such as conda or
under the main system path /usr/include, the compiler would pick up the
cudnn.h from the virtual env/system path first. This is because cmake
generates the C_INCLUDES and CXX_INCLUDES flags with the system include path
first. All this may lead to linking problems as described in Issue #4869.
Fixes #4869
In lieu of a more complicated builder object, this commit adds
an 'insert point' to Graph and a method 'insertNode' which inserts
nodes at that insert point. setInsertPoint can be used to change
the insert point on the graph to the end of a block or to any point
inside a current block. The resource guard `WithInsertPoint`
can be used to temporarily change it to, for example, insert
into the "then" branch of an If statement.
This commit also updates the resource guard for scopes. It previously
relied on return value optimization to work correctly which is
not guaranteed to be applied until C++17.
This commit is getting the IR ready for representing ONNX control flow.
It adds nested blocks to the IR.
* Each node now has blocks(), addBlock(), and eraseBlock() similar to a node's
output list.
* Blocks are a property of every node rather than an attribute because
this makes it easier to manage the lifetime of the containing nodes and because
the behavior of cloning Blocks will likely be different from the way we clone other
attributes.
* A block itself has a list of nodes, as well as inputs and outputs.
The meaning of the nested input/output nodes are specific to the particular
node kind containing the block. It is safe to assume inputs to a block will be
in scope in the block.
* Each Block has an owningNode() and each node has an owningBlock().
The owningNode of the top-most block is null.
* Values are lexically scoped: nested blocks can use values from outer blocks
that have been defined in previous nodes. Lint has been updated with these
new scoping rules.
* This change preserves almost all of the pre-Block API. No attempt has been made
to make optimizations aware of Blocks. This will need to be done on a case-by-case
basis as we make optimizations capable of handling Blocks.
This adds the initial implementation of graph executor for the new JIT design. It includes a few python tests ensuring that nograd, backward, and double-backward cases work for simple examples and some corner cases. More work needs to be done to performance optimize as there are many extra copies and places where we hold onto variables longer than we should. These are noted in the comments.
Summary:
Future-clang is stricter about some things. We need to address deletes on non-virtual destructors.
For reference, the compiler error in question can be identified by: "delete called on 'ClassName' that is abstract but has non-virtual destructor [-Werror,-Wdelete-non-virtual-dtor]" for a given ClassName.
Reviewed By: smeenai
Differential Revision: D6853479
fbshipit-source-id: a40c8e83da7c1b44da48e887cc029e98e40d6737
Summary:
* Likely need to test this so bad formatting can't be added in the future, but cleaning all operators so we at least have good examples.
* Formatting between our internal Facebook operator catalog and external caffe2.ai catalog are still slightly different. We'll work on this.
Closes https://github.com/caffe2/caffe2/pull/1846
Reviewed By: pjh5
Differential Revision: D6848570
Pulled By: orionr
fbshipit-source-id: b9bc0bfccb243d0440bd7b2406858cad8dc37e92
* fix output_nr not incremented correctly
* update test_conv_double_backward to cover this case; call accGradParameters if any param (not just weight) requires grad in parse_nn.py
* update Spatial/VolumetricFull(Dilated)Convolution to support accGradParameters with only bias requiring grad
* Spatial/VolumetricConvolutionMM
* Spatial/VolumetricDilatedConvolution
* address @fmassa 's comments
* Add some more builder scripts from ossci-job-dsl
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Relax precision requirement on test_Upsample_trilinear_scale_3d_cuda
Partially addresses #5006.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Replace async with non_blocking for Python 3.7 upgrade
* Remove trailing whitespace
* Give _cuda and _type kwargs and accept async for compatibility
* Rename async to non_blocking in all C++ code
* Add entries for async in python_variable_methods
* Friendlier backward compatibility for cuda and type
Summary: It seems that the integral overload of std::signbit is not well supported on Windows. Bypassing it.
Reviewed By: xianjiec
Differential Revision: D6869924
fbshipit-source-id: b98a3431c4d26dcffd08e26259037083afd41114
Summary:
- Fix path to FXdiv and FP16 dependencies
- Link cpuinfo library
- Pull NNPACK fix for PYTHONPATH handling when launching PeachPy
- Pull cpuinfo fix for cross-compiling on Linux for Android
- Pull cpuinfo fix for CPUINFO_LIBRARY_TYPE support
- Pull cpuinfo fix for iOS builds
Closes https://github.com/caffe2/caffe2/pull/1869
Differential Revision: D6881428
Pulled By: Maratyszcza
fbshipit-source-id: 7b4115daa090096dbd97303503792e7b144fbb43
Summary:
iOS also depends on USE_MOBILE_OPENGL, so I think we should only disable it for Android.
Closes https://github.com/caffe2/caffe2/pull/1835
Differential Revision: D6880522
Pulled By: Maratyszcza
fbshipit-source-id: b2c2fa052ad5948bc52fa49eb22c86eb08f59a39
* Rewrite ATen native docs.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Formatting fix
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Some of the CR comments
* More CR comments [ci skip]
* One last CR comment
Don't check the ScalarType and Backend of arguments in VariableType.
Instead, only check that arguments are Variables of any type. The
precise type checks are handled by the base type.
Many of our functions take heterogeneous types. There isn't enough
information in Declarations.yaml to ensure the precise types of
arguments in VariableType, which makes it difficult to add new methods.
This is #4943 with a fix to the memset call
* Add scalar autograd tests for functions requiring 'special' Variables on LHS.
* Add index_* tests.
* Fix flake8.
* Use normal for clamp rather than uniform.
* Add tests for gather, scatter, scatter_add.
* Make sure masked_select doesn't get all zeros.
* Properly fill in make_non_contiguous data for sizes that can't be made contiguous (#4951)
* Properly fill in make_non_contiguous data for sizes that can't be made contiguous.
* Use clone instead of copy.
* Fix and test backward for mv, ger with scalars.
* Fix addmv.
* Use grad.type() instead of type(grad).
* Fix addr.
There are a couple of hacks here:
1) We need to squeeze the backward result because of implicit broadcast of the arguments to match behavior of ger.
2) The broadcast_dims code doesn't work for scalars; I added support for adding '.scalar' onto the end of the broadcast
specification, but really this should just be a native function with _out support.
* Don't allow scalars in torch.dot for Variables.
There is no dot_out, so the lack of _out isn't an issue.
* Revert "Don't allow scalars in torch.dot for Variables."
This reverts commit 76c521eba8c1fb533e164f121075230209d52927.
* Revert "Fix addr."
This reverts commit afe04a0078394f94645e10cec53626f582cbc55c.
* Revert "Fix addmv."
This reverts commit 550c7ac71b3b832a3b74a809fec9ce5f5e554909.
* Revert "Use grad.type() instead of type(grad)."
This reverts commit ddcb5a424ed004fa2ee238a50177573e6d4a1b89.
* Revert "Fix and test backward for mv, ger with scalars."
This reverts commit 10b0ecad48d987774c41184ffaf11742322926ab.
Summary: hypothesis_test was introduced in D4508879; add a plain test which is more straightforward.
Reviewed By: kennyhorror
Differential Revision: D6835334
fbshipit-source-id: d05a2cd199b2de56ac0cc0319f19fcd7978647d5
Summary:
Added forward-only mode to CTCOp to compute only the costs without the grads.
Also, num_threads was set to 1, which ends up stomping over
--caffe2_omp_num_threads mid-execution (https://fburl.com/uq65xfty). Fixing
that to use the already configured num OMP threads.
Reviewed By: ajtulloch
Differential Revision: D6867829
fbshipit-source-id: 9ab1fec9857e00d277a9e82c4bd64caa6f4b2a62
Summary: enable ModOp to control the output sign to follow dividend or divisor.
Reviewed By: xianjiec
Differential Revision: D6852457
fbshipit-source-id: 62dbb66cacecb8e0a0f81f63f2b7b378efbd6ee2
Summary:
On windows when using a prebuilt version of protobuf (such as provided by vcpkg) we need to set the PROTOBUF_LIBRARIES and PROTOBUF_INCLUDE_DIRS manually.
The CAFFE2_API decoration should only be defined to dllexport when building shared libs.
Closes https://github.com/caffe2/caffe2/pull/1854
Differential Revision: D6867345
Pulled By: Yangqing
fbshipit-source-id: d4d48f709d313af9dde103fc8dfbfc217261715b
Summary:
These changes are required to use glog on Windows.
Yangqing Please consider merging them as they were removed when PR #1793 was reverted.
Closes https://github.com/caffe2/caffe2/pull/1853
Differential Revision: D6863567
Pulled By: Yangqing
fbshipit-source-id: f6ce3a1c5855e2b39000ce989d62dc2b34cd4817
Uses TypeError from torch/csrc/Exceptions.h in python_arg_parser.cpp so
that the exception is interpreted as a Python TypeError instead of
RuntimeError.
Summary:
When RTTI is not enabled, we previously could only print an
"(RTTI not enabled ...)" type error message. This is annoying when developing
in a mobile environment. Add gRegistry with #T to have a basic string for the type,
for easy type inference.
Reviewed By: Yangqing
Differential Revision: D6849614
fbshipit-source-id: d41417d72fdcfb7b8c9ddc4ded604ea598572b73
* Revert "Clarify grad_input_mask documentation in derivatives.yaml (#4963)"
This reverts commit 6f3266b4a195db6ade4651431595f9f22bd9e656.
* Revert "fix triu and tril for zero-strided inputs on gpu (#4962)"
This reverts commit 6c197c2f15090ab7368d183439229b768ece5efc.
* Revert "Add mutex for CPU RNG and move TH to C++ (#4041)"
This reverts commit 96239dd50e89bc2d1fd5d91cc5ee8fca95b07f90.
* Revert "Support multivariate TransformedDistributions (#4937)"
This reverts commit ca5071d0721767fcfeb226b5c695dfd5d0671072.
* Revert "Only check that arguments are Variables in VariableType (#4943)"
This reverts commit d44437968f2b136a3399dc62af66adfd3eaa249e.
* Revert "torch.set_num_threads sets MKL option too (#4949)"
This reverts commit 2aaeec0db0be0e9e9effd277c268cd224ff66ef9.
* Add mutex for CPU RNG
* move more things to cpp to make cuda build work
* fix mutex bug on OS X
* try to fix cuda9 half .x bug
* try to fix windows error
* create THGeneratorState as separate field
* fix mutex issues
Don't check the ScalarType and Backend of arguments in VariableType.
Instead, only check that arguments are Variables of any type. The
precise type checks are handled by the base type.
Many of our functions take heterogeneous types. There isn't enough
information in Declarations.yaml to ensure the precise types of
arguments in VariableType, which makes it difficult to add new methods.
Summary: Currently, MultiNodeCheckpointManager returns None in this case, yet in JobRunner we assume this function returns a valid task group, i.e. we call session.run(self.checkpoint_manager.init(...)) directly. This fails in the case where we use LocalHostScheduler and reuse a MultiNodeCheckpointManager.
Reviewed By: azzolini
Differential Revision: D6843450
fbshipit-source-id: a7ec942cfe692f19e8751b0078ae6a6108f29e54
Summary: To match the ONNX semantics, change the default value of alpha in LeakyRelu to 0.01.
Reviewed By: dzhulgakov
Differential Revision: D6840975
fbshipit-source-id: 08543f80fd86cbe96a0eee8d725ef137a5bf4ab8
These are auto-generated tests derived from existing tests with the following constraints:
1) Forward function passes with scalar self and size (1,) self
2) No Variable/Tensor arguments (besides self)
Summary:
Simplify async_scheduling to use global thread pool instead of per network
polling threads
Reviewed By: romain-intel
Differential Revision: D6814274
fbshipit-source-id: f91ac3e99d9b8cf15578a751ed7929be84840408
* Fix some scalar issues with autograd.
1) Better error messages in functions that don't support scalars
2) Don't access size(dim) in the backward of a function taking a scalar because the wrap fails.
* Fix CUDA build.
Summary:
Commonly, net observers attach operator observers at construction. This diff separates the logic into a base class to inherit from.
Closes https://github.com/caffe2/caffe2/pull/1806
Reviewed By: salexspb
Differential Revision: D6808623
Pulled By: mdschatz
fbshipit-source-id: 75ef0eea913ef30943541c829c0a976965f42736
Putting these scripts here has a few benefits:
1. PyTorch developers can easily update the scripts without
having to ask for permissions to ossci-job-dsl
2. You can test changes in the scripts by opening a PR to
PyTorch (functionality is ossci-job-dsl is not easily testable.)
3. If you get one of our stock Docker images, you can run these scripts
to trigger a build identical to what would occur in Jenkins (not
entirely true yet, but we can make it so.)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary:
Last fix was uncommitted due to a bug in internal build (CAFFE2_API causing error). This one re-applies it as well as a few more, especially enabling gtest.
Earlier commit message: Basically, this should make windows {static_lib, shared_lib} * {static_runtime, shared_runtime} * {cpu, gpu} work other than gpu shared_lib, which willyd kindly pointed out a symbol limit problem. A few highlights:
(1) Updated newest protobuf.
(2) use protoc dllexport command to ensure proper symbol export for windows.
(3) various code updates to make sure that C2 symbols are properly shown
(4) cmake file changes to make build proper
(5) option to choose static runtime and shared runtime similar to protobuf
(6) revert to visual studio 2015 as current cuda and msvc 2017 do not play well together.
(7) enabled gtest and fixed testing bugs.
Earlier PR is #1793
Closes https://github.com/caffe2/caffe2/pull/1827
Differential Revision: D6832086
Pulled By: Yangqing
fbshipit-source-id: 85f86e9a992ee5c53c70b484b761c9d6aed721df
Summary:
This was removed in an earlier version. Anyway, I suspect this will make jenkins a bit unhappy (do we use gpu instances for building as well?) so firing a PR to test.
Closes https://github.com/caffe2/caffe2/pull/1833
Differential Revision: D6834889
Pulled By: Yangqing
fbshipit-source-id: bc501cdb9d83a32ad38d24e972c2bfec5242d767
Summary:
Now we use **clang** to build Caffe2 for Android with the arm64-v8a ABI, but clang doesn't support the "-s" compilation flag. If we append this flag to clang, it reports a warning:
> clang++: warning: argument unused during compilation: '-s' [-Wunused-command-line-argument]
This commit checks whether gcc or clang is used to build Caffe2 for Android.
Closes https://github.com/caffe2/caffe2/pull/1834
Differential Revision: D6833011
Pulled By: Yangqing
fbshipit-source-id: e4655d126fb3586e7af605a31a6b1c1ed66b9bcb
Summary:
* Putting up to test on Jenkins since I can't test locally on my Mac.
Might fix https://github.com/caffe2/caffe2/issues/1796 but I haven't touched these files before, so it's a guess. :)
Closes https://github.com/caffe2/caffe2/pull/1826
Reviewed By: Yangqing
Differential Revision: D6832918
Pulled By: orionr
fbshipit-source-id: 22bdeafa031dbe6457d81cb105b41a451ca3a25d
This data-structure will be used as the key in GraphExecutor's
code cache. It supports fast creation, hashing, and equality checking
because it will run on all inputs to GraphExecutors in the hot path.
Summary:
Historically, for interface dependent libraries (glog, gflags and protobuf), exposing them in Caffe2Config.cmake is usually difficult.
New versions of glog and gflags ship with new-style cmake targets, so one does not need to use variables. New-style targets also make it easier for people to depend on them in installed config files.
This diff modernizes the gflags library, and still provides a fallback path if the installed gflags does not have cmake config files coming with it.
It does change one behavior of the build process though - when one specifies -DUSE_GFLAGS=ON but gflags cannot be found, the old script automatically turns it off but the new script crashes, forcing the user to specify USE_GFLAGS=OFF.
Closes https://github.com/caffe2/caffe2/pull/1819
Differential Revision: D6826604
Pulled By: Yangqing
fbshipit-source-id: 210f3926f291c8bfeb24eb9671e5adfcbf8cf7fe
* Fix visibility of AT_CUDA_ENABLED
* link ATen with verify_api_visibility so ATen headers get generated in time
* Move CUDAHalf.* to ATen/cuda
* ATen/cuda/CUDAHalf.cpp -> ATen/cuda/CUDAHalf.cu
* Remove inline attributes from HalfFix
* Also test for AT_CUDNN_ENABLED and add clarifying comment
* Remove unnecessary static inline from HalfFix template
* Move Half::operator double() into header for windows
* Mark Half::operator() as inline
When generating autograd::Function wrappers for ATen functions, we need
to take derivative expressions in derivatives.yaml (identified by name)
and correlate them with the correct index they should take in
grad_inputs (identified positionally only). Previously, this
computation was done *statically* in load_derivatives.py (set_up_derivatives)
and then we hard-coded indices in the generated Functions.cpp.
This is sufficient for supporting ATen operations which consist solely
of Tensor arguments, or a single TensorList argument. However, this
strategy will not work for mixed Tensor/TensorList arguments, as the
index of any Tensor after a TensorList is not known at codegen time,
since it will vary depending on the length of the TensorList, e.g.,
foo({x1, x2}, y) ==> y is index 2
foo({x1, x2, x3}, y) ==> y is index 3
This commit introduces a new strategy for generating these indices which
pushes index computation to *runtime* (though any decent C++ optimizer
can re-optimize the index computation back into constants; this was
verified in Godbolt.) Instead of hard-coding constants, a small
IndexRangeGenerator object is created and used to generate the correct
index ranges (std::pair<size_t, size_t>) for each argument.
Here is an example of mm rewritten in the new codegen format:
variable_list MmBackward::apply(const variable_list& grads) {
  IndexRangeGenerator gen;
  auto self_ix = gen.range(1);
  auto mat2_ix = gen.range(1);
  variable_list grad_inputs(gen.size());
  auto& grad = grads[0];
  auto self = self_.unpack();
  auto mat2 = mat2_.unpack();
  if (should_compute_output({ mat2_ix })) {
    auto grad_result = mm_mat2_backward(grad, self, mat2_sizes, mat2.strides(), 1);
    copy_range(grad_inputs, mat2_ix, grad_result);
  }
  if (should_compute_output({ self_ix })) {
    auto grad_result = mm_mat1_backward(grad, mat2, self_sizes, self.strides(), 1);
    copy_range(grad_inputs, self_ix, grad_result);
  }
  return grad_inputs;
}
Unlike before, where self_ix and mat2_ix were hardcoded as 0 and 1,
we derive them by invoking IndexRangeGenerator (which internally
is just a little counter which bumps up each invocation of 'range').
Each _ix variable actually represents a range, as can be seen here.
variable_list CatBackward::apply(const variable_list& grads) {
  IndexRangeGenerator gen;
  auto tensors_ix = gen.range(tensors_size_);
  variable_list grad_inputs(gen.size());
  auto& grad = grads[0];
  if (should_compute_output({ tensors_ix })) {
    auto grad_result = cat_tensors_backward(grad, tensors_sizes_dim, dim);
    copy_range(grad_inputs, tensors_ix, grad_result);
  }
  return grad_inputs;
}
The invocation of 'copy_range' reads a TensorList returned by the
backward function into the correct entries in grad_inputs.
tensors_size_ is a new member of CatBackward which is filled with
the size of the forward input tensor when cat is originally invoked.
With this new code generation strategy, we can completely eliminate
the special cases for Tensor and TensorList in index selection, and
we can smoothly support mixed Tensor/TensorList by making multiple
invocations of gen.range() with non-one arguments.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary:
In this case, each sequence is treated as having a length equal to the
first dimension of the input tensor. This matches the semantics of
ONNX when the sequence length input is left out.
Closes https://github.com/caffe2/caffe2/pull/1764
Reviewed By: dzhulgakov
Differential Revision: D6751219
Pulled By: anderspapitto
fbshipit-source-id: 89e0efd12339157627494e2b8c83e952bdd8a9f8
Summary:
OpenGL is no longer built by default. Even after setting flag -DUSE_MOBILE_OPENGL, the build fails. Remove it in the benchmark code so that the benchmark can still be built.
Closes https://github.com/caffe2/caffe2/pull/1822
Reviewed By: Maratyszcza
Differential Revision: D6824777
Pulled By: sf-wind
fbshipit-source-id: 5af8b669a36adcd6a98b0a11237b9e03c146bb9d
Summary:
Previously in SafeDequeueOp, in.dims()[0] would fail if in.ndim() is 0.
However, the error message is not informative. I added a Caffe_Enforce
which prints out the input and output blob names. This is very helpful for
future debugging as well.
Differential Revision: D6821421
fbshipit-source-id: b07e5829a2c580aaaac88b0d9ff8d05f6da11713
Suppose you are given a list of arguments, each of which may be Tensor or
TensorList. How can you write a function that can treat these arguments
uniformly as a list of tensors? This patch solves the problem using
variadic templates.
Why variadic templates? Use of variadic templates means anyone working
with this code has to understand universal references, perfect
forwarding, parameter packs and some idioms of C++ template design.
However, I argue that variadic templates are the *right* tool for
supporting the implementation of functions which must take an
arbitrarily heterogeneous set of inputs. We were able to limp by
in old code because, for the most part, tensor inputs were homogeneous,
but this is no longer the case for some non-primitively differentiable
functions; and with the upcoming cuDNN RNN in ATen PR, it will no longer be
the case for primitively differentiable functions either.
There are two parts to the PR.
First, we add torch/csrc/utils/variadic.h, which defines a mix-in
IterArgs that takes any class which supports operator(), and augments it
with a new variadic function apply() that calls operator() on each
argument passed to it. In an original draft of the patch, I wrote the
recursion for each parameter pack from scratch for each function;
however, it turns out there are no fewer than seven instances where we
need this idiom, and the mix-in reduces the lines of code, and also
helps centralize the most important (and easy to forget) boilerplate
for perfect forwarding.
To verify that IterArgs is compiled away into an unrolled form per
call site, I inspected the assembly on some synthetic examples.
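As a rough illustration (a simplified sketch of the idea, not the exact code in variadic.h), the mix-in boils down to a CRTP recursion like this:
```cpp
#include <cstddef>
#include <utility>

// Simplified sketch: F provides operator(), and apply() perfectly forwards
// each argument of the parameter pack to it, one at a time.
template <typename F>
struct IterArgs {
  void apply() {}  // base case: no arguments left

  template <typename T, typename... Args>
  void apply(T&& arg, Args&&... args) {
    self()(std::forward<T>(arg));               // visit one argument
    self().apply(std::forward<Args>(args)...);  // recurse on the rest
  }

 private:
  F& self() { return *static_cast<F*>(this); }
};

// Example visitor: counts how many arguments it was applied to.
struct CountArgs : IterArgs<CountArgs> {
  size_t count = 0;
  template <typename T>
  void operator()(T&&) { ++count; }
};

// Usage sketch: CountArgs c; c.apply(1, 2.0, "three"); then c.count == 3.
```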
Next, we modify the following functions to make use of IterArgs:
- compute_requires_grad
- Function::flags (Variable and Tensor variants)
- flatten
- isTracing
- count_tensors / count_variables
Finally, the tuple packer is rewritten to be variadic, although we
cannot make use of IterArgs (since we are given a tuple). It might
make sense to refactor the code into a generic piece which invokes
a function with the arguments specified by a tuple, and then an
appropriate IterArgs, but we leave this for future work.
One thing to note: we cannot write a function with overloads for both
Tensor and Variable, because both ArrayRef<Variable> and Tensor have
implicit conversions from Variable, making such an overload ambiguous.
It may be interesting to remove the implicit conversion from ArrayRef.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary:
More changes to be added later. I need to make a PR so that I can point jenkins to this
Closes https://github.com/caffe2/caffe2/pull/1767
Reviewed By: orionr
Differential Revision: D6817174
Pulled By: pjh5
fbshipit-source-id: 0fc73ed7d781b5972e0234f8c9864c5e57180591
The primary benefit is now we have working move constructors
et al without having to write all the boilerplate. Furthermore,
the size of the code is substantially reduced.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary:
Main changes:
1. Move reader creation to Brew in order to be consistent and avoid a wild use of param_init_net
2. Use optimizers for training function, avoid manual optimizer construction
3. Add MLP mode (a default)
4. Fix a bunch of too verbose comments and add a bit of new explanations
Closes https://github.com/caffe2/caffe2/pull/1760
Differential Revision: D6749059
Pulled By: salexspb
fbshipit-source-id: 9dfbbb2d9772a74a0300c2e404a92e791f7cc593
Summary:
This reverts commit d286264fccc72bf90a2fcd7da533ecca23ce557e
bypass-lint
An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
cause_a_sev_many_files
Differential Revision: D6817719
fbshipit-source-id: 8fe0ad7aba75caaa4c3cac5e0a804ab957a1b836
Summary:
Basically, this should make Windows {static_lib, shared_lib} * {static_runtime, shared_runtime} * {cpu, gpu} work. A few highlights:
(1) Updated to the newest protobuf.
(2) Use the protoc dllexport command to ensure proper symbol export.
(3) Various code updates to make sure that C2 symbols are properly shown.
(4) CMake file changes to make the build proper.
(5) Option to choose static or shared runtime, similar to protobuf.
(6) Revert to Visual Studio 2015, as the current CUDA and MSVC 2017 do not play well together.
Closes https://github.com/caffe2/caffe2/pull/1793
Reviewed By: dzhulgakov
Differential Revision: D6817719
Pulled By: Yangqing
fbshipit-source-id: d286264fccc72bf90a2fcd7da533ecca23ce557e
Summary: Updates `sparse_lookup.py` for the new fused 8-bit rowwise quantization. Mostly just changing the same files as the original diffs (D5753626 and D5761202). I know very little about this code here so please let me know if this is safe, also in terms of migration away from the non-fused storage.
Reviewed By: kennyhorror
Differential Revision: D6710784
fbshipit-source-id: 185f147af52a094a937ba631b0351225e660d205
Summary:
* This way we won't have issues across Linux and Mac.
* Also eliminates some weirdness where files with both capitalizations existed.
Closes https://github.com/caffe2/caffe2/pull/1813
Reviewed By: pjh5
Differential Revision: D6812141
Pulled By: orionr
fbshipit-source-id: 27f52089e2db623196349d7036aa8882e93c32fd
Summary:
PR Description
-----------------
This commit informs the developers why they have to use packages of third_party
folder instead of packages in their Linux distribution.
By default, Caffe2 finds installed packages in the Linux distribution. If a package
cannot be found, Caffe2 falls back to the version bundled in the third_party folder.
**Changes proposed in this PR:**
1. Added difference between Linux distro packages and third_party packages
**Self assessment:**
Checked.
Signed-off-by: Geunsik Lim <geunsik.lim@samsung.com>
Closes https://github.com/caffe2/caffe2/pull/1724
Reviewed By: pjh5
Differential Revision: D6728185
Pulled By: orionr
fbshipit-source-id: 0c596cf56faaccf947caefc49ea3c6f0a473e9bf
1) Have 0-dim byte tensors behave like Py_TRUE, Py_FALSE
2) Py_TRUE now properly returns a copy from getitem
3) setitem now properly shapes the LHS consistent with the RHS (this doesn't really matter outside of error messages having the proper shape)
4) setitem supports numpy-style copy_to broadcasting (cuts off prefix 1s from src), so e.g. you can setitem (1,1,2,3) to (2,3) even though
that doesn't follow the normal inplace broadcasting rules.
Summary:
as titled
After converting categorical to Ngram keys, use this op to extract eids
Differential Revision: D6794020
fbshipit-source-id: 4f9251a22d7a129da30b92845e312876e6510e7e
Summary: Adds cuda support for LC Op
Reviewed By: QueryConnectionException
Differential Revision: D6803659
fbshipit-source-id: 538bbf6fd202c79154132fda0e90e175eb09d025
Summary: Weighted sampling reader dequeue randomly chooses a hive reader to read a mini-batch. This diff allows dequeue to output the index of the randomly chosen table to a specific blob.
Reviewed By: kennyhorror
Differential Revision: D6621070
fbshipit-source-id: 754b981fc2bcfdb0146d2a0a5b677e7cfe74211b
Summary: Fix the flaky test for ngram from categorical test
Reviewed By: dragonxlwang
Differential Revision: D6801152
fbshipit-source-id: dcbae17b1d3737a41fb2f5c794c1146a02c542bb
Summary:
Every call to the checkpoint_metadata_handler write() API requires us to pass all params like db_prefix, db_type etc.
Introducing an init API in the checkpoint_metadata_handler so that such params can be saved and need not be passed in every API call
Reviewed By: mraway, anshulverma
Differential Revision: D6792651
fbshipit-source-id: 059fa4309e8fce1ee5ab009af3e0570573c24245
Summary:
Updated bbox_transform op to match detectron training code better.
- Set apply_scale=False and correct_transform_coords=True to match detectron training/inference code.
Reviewed By: wat3rBro
Differential Revision: D6782894
fbshipit-source-id: 053d9847bf2b3c62a535499017a8413d78871ee0
Summary:
When the system has the protobuf package but not protoc, cmake will succeed:
> -- ******** Summary ********
-- General:
-- CMake version : 3.5.1
-- CMake command : /usr/bin/cmake
-- Git version : v0.8.1-967-g27d12d8-dirty
-- System : Linux
-- C++ compiler : /usr/bin/c++
-- C++ compiler version : 5.4.0
-- Protobuf compiler : PROTOBUF_PROTOC_EXECUTABLE-NOTFOUND
-- Protobuf include path : /usr/include
-- Protobuf libraries : optimized;/usr/lib/x86_64-linux-gnu/libprotobuf.so;debug;/usr/lib/x86_64-linux-gnu/libprotobuf.so;-lpthread
...
Then make will fail.
This change makes cmake check for the protobuf package only when protoc has been found.
This pull request is a clone of [1781](https://github.com/caffe2/caffe2/pull/1781), that pull request closed by mistake.
Closes https://github.com/caffe2/caffe2/pull/1792
Differential Revision: D6800513
Pulled By: pietern
fbshipit-source-id: 79a77a139f342ae0aaa2c37fc1d9a74e28a08422
Summary: Diff 2 in stack of diffs for multi-device batch normalization. Allows plugging of intermediate stats into SpatialBN and SpatialBNGradient to enable multi-device batch normalization. Depends on D6697336.
Reviewed By: rbgirshick
Differential Revision: D6699258
fbshipit-source-id: 1bae0b9a33d257f8de9525f8b2511bec2ec9d51e
Summary: This is the first in a series of diffs to enable batch normalization across multiple devices on the same node with data parallel model. The diff contains the ops for computing the per-channel statistics required to obtain the mean and variance across multiple devices on the same node on the forward pass, and the gradient of the bias and scale during backpropagation. The actual modifications to SpatialBN and SpatialBNGradient to make use of these results will be in a separate diff.
Reviewed By: rbgirshick
Differential Revision: D6697336
fbshipit-source-id: 0de2750fe7e851795f238d9f625aeb4d74023dc2
This pass splits differentiable subgraphs into their own Node,
similar to a fusion group.
This initial implementation does not create optimal subgraphs, but
it works well in the case where most things are differentiable,
and has the building blocks (`mergeNodes`) to extend to the
better implementation.
* Remove setting coalesce to 0 in sparse transpose_
* Remove setting coalesced to 0 in THCSTensor transpose_
* Add test for transpose's coalesce invariant
* Fix #4480 by tracing inputs before running the function.
The DCE trick says that if I have y = f(x), and f is internally implemented as
g, it's OK to trace both g and f. Recall the tracing algorithm is:
enter f(x)
compute its result y
trace y = f(x)
return from f
So when you run the example above, you'll do this:
# suppose x is mapped to %1
enter f(x)
enter g(x)
result of g is y
trace y = g(x a.k.a. %1) (mapping y to %2)
return from g
result of f is y
trace y = f(x a.k.a. %1) (remapping y to %3)
return from f
and end up with a trace like this:
%2 = g(%1)
%3 = f(%1)
... only %3 is live, because %2 was killed from the mapping... Subsequent DCE
will eliminate the invocation of g and you'll only see f in the final trace.
However, if f and g are inplace functions, the machinery breaks:
# suppose x is mapped to %1
enter f(x)
enter g(x)
result of g is x
trace x = g(x a.k.a. %1) (remapping x to %2)
return from g
result of f is x
trace x = f(x a.k.a. %2) (remapping x to %3)
return from f
resulting in:
%2 = g(%1)
%3 = f(%2) # OOPS
This commit changes the strategy so we instead do this:
enter f(x)
trace f(x)
compute its result y
trace y = f(x) (computed above)
return from f
Now we get the correct Value before it is overwritten.
Here is what the new trace code looks like:
jit::tracer::PreTraceInfo trace_info;
if (jit::tracer::isTracing( self, index )) {
  trace_info = jit::tracer::preRecordTrace( "index_fill", { self, index } );
  setattr(trace_info.n, jit::Symbol("dim"), dim);
  setattr(trace_info.n, jit::Symbol("value"), value);
}
baseType->index_fill_(self_, dim, index_, value);
increment_version(self);
rebase_history(self, grad_fn);
if (trace_info.state != nullptr) {
  jit::tracer::postRecordTrace( trace_info, { self } );
}
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Revert "Hot patch ONNX _run_symbolic_function"
This reverts commit d1c973fee1a20da86d60d526e253ce89f5840baf.
* lintfix
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add missing expect file
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary: When using sample weights to do weighted sampling in everstore loader, the proto size is increased by one. Update image_input_op to support this new use case
Reviewed By: chenlifei
Differential Revision: D6776709
fbshipit-source-id: 6148908881ad019b6b621413f452ea1814573a00
* Enable scalars if compiled with WITH_SCALAR environment variable.
We are pretty close to enabling scalars (0-dimensional arrays); this allows turning them on
for development purposes and to be able to write code that works both with and without scalars enabled.
WITH_SCALARS is currently broken with distributions, but should work for test_torch, test_autograd, test_nn.
* Fix unsqueeze.
* Fix wrap dim, wrapping with Scalar.
Summary:
The android.cmake.toolchain file we use from a submodule is unmaintained and has not been updated since 2015.
It causes numerous problems in the Caffe2 build:
- Caffe2 can't be built for Android ARM64, because the gcc toolchain for ARM64 doesn't support NEON-FP16 intrinsics, and the android.cmake.toolchain we use doesn't allow us to specify clang-5.0 from NDK r15c
- Caffe2 can't be built with Android NDK r16 (the most recent NDK version)
- Caffe2 can't be built for Android with Ninja generator
This change updates the build script to use $ANDROID/build/cmake/android.cmake.toolchain instead, which is maintained by Android team, and synchronized with Android NDK version.
As this toolchain file doesn't support "armeabi-v7a with NEON FP16" ABI, I had to disable mobile OpenGL backend, which requires NEON-FP16 extension to build. With some work, it can be re-enabled in the future.
Closes https://github.com/caffe2/caffe2/pull/1740
Differential Revision: D6707099
Pulled By: Maratyszcza
fbshipit-source-id: 8488594c4225deed0323c1e54c8d71c804b328df
Summary:
MKLSumOp assumes that all inputs will have the same layout, but this needn't be
the case as different inputs are typically created by different primitives and
some of them might have a custom layout. Create a View() before executing
dnnSumCreate().
Differential Revision: D6753233
fbshipit-source-id: 62420b972898066157c9c841275ccc917b3dec59
Summary:
This is a first attempt at completing bootcamp task T24449916. This diff contains 3 major changes:
1) Change LayerModelHelper to allow for exposing the output and parameters of any layer to metrics
2) Added a runner that allows metrics to draw arbitrary plots to a matplotlib axes object
3) Implement a metric that aggregates distributions of values in a blob over the training, and try this out in a notebook
Reviewed By: kennyhorror
Differential Revision: D6671273
fbshipit-source-id: b8961837395e89c957edbf5c7c862bdb845ccf4b
* Favor Variables over Tensors for scalar constructors in torch.distributions.
Current behavior:
1) distribution constructors containing only python number elements will have their python numbers upcasted to Tensors.
2) Python number arguments of distribution constructors that also contain tensors and variables will be upcasted
to the first tensor/variable type.
This PR changes the above to favor Variables as follows:
1) The python numbers will now be upcasted to Variables
2) An error will be raised if the first tensor/variable type is not a Variable.
This is done in preparation for the introduction of Scalars (0-dimensional tensors), which are only available on the Variable API.
Note that we are (separately) merging Variable and Tensor, so this PR should have no real long-term effect.
Also note that the above means we don't change the behavior of constructors without python number arguments.
* Fix tests that require numpy.
Summary: add Test for SparseLookup with PositionWeighted.
Reviewed By: kennyhorror
Differential Revision: D6771612
fbshipit-source-id: b4b3bfd514f366f579b4192643330ae73843d4f9
Summary:
SqueezeOp support to drop dims of size 1. MKLMemory now supports Reshape()
if the buffer is in plain layout, in which case just the dims and layouts are
modified similar to caffe2::Tensor. SqueezeOp takes care of converting the
input to plain layout if needed via an intermediate buffer before calling
Reshape().
Differential Revision: D6735656
fbshipit-source-id: 953309498370e1b8986e8c593bc6963f38036255
Currently, index operation kernels work in "source/destination index-major
order". (E.g., if thread count equals slice size, each thread will process
slice #0 in lockstep, and then slice #1, and so on.)
However, when elements inside each "slice" are separated by large strides (e.g.,
selecting columns of a matrix), it is better to switch to "elementInSlice-major
order". For example, each thread can process element #0 of every slice, and
then element #1 of every slice, and so on.
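As a rough sketch of the difference (plain C++ pseudocode standing in for the CUDA kernels, with hypothetical names; only the traversal order matters here):
```cpp
// Stand-in for the per-element work done by the index kernel.
inline void touch(float& x) { x += 1.0f; }

// "Index-major": finish all of slice #0, then all of slice #1, and so on.
// When elem_stride is large (e.g. selecting columns of a row-major matrix),
// consecutive accesses are far apart in memory.
void index_major(float* data, long num_slices, long slice_size, long elem_stride) {
  for (long slice = 0; slice < num_slices; ++slice)
    for (long e = 0; e < slice_size; ++e)
      touch(data[slice + e * elem_stride]);
}

// "elementInSlice-major": element #0 of every slice, then element #1, etc.
// Now the inner loop walks adjacent addresses, which is much friendlier
// when the stride inside a slice is large.
void element_in_slice_major(float* data, long num_slices, long slice_size, long elem_stride) {
  for (long e = 0; e < slice_size; ++e)
    for (long slice = 0; slice < num_slices; ++slice)
      touch(data[slice + e * elem_stride]);
}
```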
* Add kwarg-only 'requires_grad' parameter to Variable factories.
Functions that create variables, e.g. torch.ones_like currently always return Variables with requires_grad=False;
this is less convenient than the existing Variable constructor that has a requires_grad parameter. This commit
adds the parameter at the python binding level.
* Fix flake8.
* Address review comments.
* Match set_requires_grad implementation with tensor_new version.
* Implement a (data-only) Variable factory.
Implements a function, torch.autograd.variable, that is modeled after np.array. The main difference between it and new() and
the tensor constructors is that it interprets a python number as data, i.e. as a 0-dimensional tensor (we currently don't expose
that at the pytorch level, so it will temporarily end up as a 1-dimensional tensor), rather than a size.
The main difference currently between torch.autograd.variable and np.array is that torch.autograd.variable is stricter, e.g.
passing a PyFloat when an integral type is the default tensor type will result in an error; np.array basically lets anything
through (floating-point / integral mismatch, overflow, etc). This is to keep it consistent with Variable.new when called with
a sequence, although we can loosen the checks later.
This will be renamed to torch.tensor once we merge Variable and tensor.
* Address review comments.
Summary:
This reverts commit 417f1bab18b1721db5edc7ac8abaf883c1f7d3ee.
No longer needed since we'll add this within the Jenkins job itself.
Closes https://github.com/caffe2/caffe2/pull/1777
Reviewed By: pietern
Differential Revision: D6778185
Pulled By: orionr
fbshipit-source-id: d66befa76e84f83cf41eea50e54bc610db03ddd0
Summary:
At the end of distributed training, trainer needs to download the parameters back from parameter servers for saving the model. Currently, this parameter downloading happens at the end of job's epoch task group, which creates several problems when checkpointing is enabled for distributed training:
1. When checkpointing is enabled, we run multiple training epochs. At the end of each epoch, the model download tasks will run to collect parameters, but we won't save the model until the true end of training, so there is a big waste of resources.
2. After trainer0 downloads the parameters, these parameters take a lot of memory, so trainer0 can easily run out of memory in the next epoch of training.
Our solution is to insert a parameter download task group between the job's training epoch_group and the job's exit_group.
Reviewed By: azzolini
Differential Revision: D6765393
fbshipit-source-id: 5a4f556fc3c1cd7834a7c406a3c0de3fccd50c49
Summary:
Adds 2 features:
(1) In cmake, allow the use of -march=native
(2) During initialization, check if Caffe2 is built with matching cpu
features of the current machine.
This helps us guard performance claims in case the Caffe2 baseline is
built with limited computation capability.
Currently only added avx, avx2 and fma which are common.
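A minimal sketch of the kind of runtime check described in (2), assuming GCC/Clang x86 builtins (the real Caffe2 check may look different):
```cpp
#include <cstdio>

// Warn if the machine supports CPU features this binary was not compiled with,
// so performance numbers from such a build are not mistaken for the best case.
void warn_if_built_without_cpu_features() {
  __builtin_cpu_init();
#if !defined(__AVX__)
  if (__builtin_cpu_supports("avx"))
    std::printf("CPU supports AVX, but this build does not use it.\n");
#endif
#if !defined(__AVX2__)
  if (__builtin_cpu_supports("avx2"))
    std::printf("CPU supports AVX2, but this build does not use it.\n");
#endif
#if !defined(__FMA__)
  if (__builtin_cpu_supports("fma"))
    std::printf("CPU supports FMA, but this build does not use it.\n");
#endif
}
```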
Closes https://github.com/caffe2/caffe2/pull/1775
Reviewed By: ezyang
Differential Revision: D6772059
Pulled By: Yangqing
fbshipit-source-id: 884a3d7c7a71ed9631b7c6269ae95d842a09e1bd
* Use ATen infer_size implementation rather than TH.
The only substantive difference between the two implementations is in how empty sizes are handled;
in ATen these are treated as scalars (i.e., can be expanded to anything), whereas in TH they are treated
as a special case of empty tensors (i.e., can't be expanded to anything). Therefore, this change is
necessary to support scalars (0-dimensional tensors). We could also take a bool parameter for determining
how we treat empty tensors but this seems unnecessary: if one tries to expand an empty tensor (as a result
of an infer_size calculation), the expansion will fail.
* Make changes for review.
* Attempt to fix windows build.
* long -> int.
Summary:
This should translate to a 1% error margin. The gradient checker uses a 0.5% threshold.
Closes https://github.com/caffe2/caffe2/pull/1766
Differential Revision: D6774077
Pulled By: pietern
fbshipit-source-id: f97c7ffb2ef34fdd71d69320a7fdcf4a6a457715
Summary:
Just redirects to MKLSumOp. Doesn't support broadcast though since dnnSumCreate
expects identical dims.
Differential Revision: D6729788
fbshipit-source-id: 3e189465ad9d026bec4954648562ffe4e67fc393
Summary:
The idea is the following. We are going to automatically generate .py files using a jupyter post-save hook. Also, there is a script to generate these for all the tutorials. The script is also used from Jenkins test.sh. So if you don't run the sync anyhow, test will complain.
In this diff I include the framework itself + .py files generated for all tutorials. They live under a separate folder.
Closes https://github.com/caffe2/caffe2/pull/1762
Differential Revision: D6749358
Pulled By: salexspb
fbshipit-source-id: d6ad28e863a0670af2d1e5af86e16909dc0dcf2c
Summary:
As in name. LATTE translation team moving some code from Python 2 to 3 uncovered a case where comparison between unicode and str types leads NameScope('') to prepend a separator to the beginning of blob names. This fixes it.
Thank you so much to dzhulgakov for tracking down the cause of this so quickly!
Reviewed By: dzhulgakov
Differential Revision: D6766866
fbshipit-source-id: fbe46cff581f425ba10e8668400915ea40baab94
Summary: Make test less computationally expensive
Reviewed By: Yangqing, dzhulgakov
Differential Revision: D6766236
fbshipit-source-id: 59e51faa1331d804b11da9f7237ee9ce0cb27df8
Currently, a Variable can only be compared with a Variable, but a Tensor
can be compared with Tensors or numbers. Relax this constraint so Variables
behave identically to Tensors.
Summary:
Reason for this change:
(1) Setting/Getting default gpu id doesn't seem to be used at all.
(2) It actually is confusing compared to the CUDA_VISIBLE_DEVICES options etc.
(3) When setting cuda_gpu_id=-1 in the CUDAContext arg, it used to use the
default gpu id but probably we should use the current gpu - so that the caller
will be able to control the device placement.
One use case is for TensorRT - if we have a custom callback layer, then it would
be easier for TRT or whatever caller to set the running device.
Reviewed By: dzhulgakov
Differential Revision: D6740357
fbshipit-source-id: 2ea710e434b10220d5a198e31c93847304636863
Summary:
- Moved mask-rcnn inference operators to open source caffe2.
- Registered GeneratedProposalsOp as GenerateProposals in addition to GenerateProposalsCPP.
Reviewed By: rbgirshick
Differential Revision: D6747190
fbshipit-source-id: be98d6b56b5b53b13af46e839f5ceaf27f7fddc3
Summary: Building on D6710785 (float <-> fused_8bit_rowwise conversions) and D6710843 (`FusedEmbeddingLookup`), this diff implements the new reduction operations for the fused 8-bit rowwise storage. I mostly followed the [old 8-bit quantized code](diffusion/FBS/browse/master/fbcode/caffe2/caffe2/operators/lengths_reducer_rowwise_8bit_ops.h) and [full-precision code](diffusion/FBS/browse/master/fbcode/caffe2/caffe2/operators/lengths_reducer_ops.h).
Reviewed By: kennyhorror
Differential Revision: D6710844
fbshipit-source-id: b9e85db7437bd32dd44d01733c3749f35c00b06e
Summary:
Updates the perfkernel codebase to implement embedding lookup for our new fused storage format, where each row in the data matrix stores the quantized values *and* the scale and bias.
msmelyan see this as my best-effort attempt at updating the perfkernel stuff for the fused storage. Let me know if any of this is grossly wrong. I also don't know if we need to update any of the prefetching operations or something like that.
Note that we have to keep the old code around for a bit until we get rid of the old operations with separate `scale_bias` storage.
Reviewed By: kennyhorror
Differential Revision: D6710843
fbshipit-source-id: b485ef2389f526c5db1260cac9d4be3fc8df0979
Summary: This first diff adds the conversion operators that go from float to our fused 8bit rowwise quantized storage and back again. For now I've put the scale and bias in front of each row because it makes the pointer arithmetic nicer here and in the EmbeddingLookup perfkernel. If benchmarks or other reasons point out that this is a bad idea we can change it easily.
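For context, the affine rowwise scheme this presumably implements stores, for each row x, a uint8 row q plus a per-row scale and bias (the exact formula is an assumption, but this is the standard construction):
\[
\mathrm{scale} = \frac{\max_i x_i - \min_i x_i}{255}, \qquad
\mathrm{bias} = \min_i x_i, \qquad
q_i = \mathrm{round}\!\left(\frac{x_i - \mathrm{bias}}{\mathrm{scale}}\right), \qquad
x_i \approx \mathrm{scale}\cdot q_i + \mathrm{bias},
\]
with the scale and bias stored, per this diff, right before the quantized bytes of their row.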
Reviewed By: kennyhorror
Differential Revision: D6710785
fbshipit-source-id: 086ab91c12d3b472564a06eff6329be6cb9e680e
Summary: Changed #undef C to #undef E after the definition of Macro E in cpuid.h
Reviewed By: ot, luciang
Differential Revision: D6763664
fbshipit-source-id: beb221f0c690b5450c39577dd0a843613d802e9c
Summary:
* This will let us generate documentation on the Jenkins workers.
Closes https://github.com/caffe2/caffe2/pull/1772
Reviewed By: ezyang
Differential Revision: D6762731
Pulled By: orionr
fbshipit-source-id: 2e170d13055429971fc2cce66512480825030572
Summary:
This updates the video input op in Caffe2 so that it is up to date.
It adds additional support for:
1. optical flow and early fusion
2. different ways of sampling clips from video
3. different ways of resizing the input video
Reviewed By: dutran
Differential Revision: D6752788
fbshipit-source-id: 0cbd4d4bbbe97b0ada4cba7a55adc91a7af60d5f
The function record_stream is currently only defined on Tensor in
TensorCuda.cwrap. It would be best to implement this in ATen and
automatically bind it to Python, but we're missing ATen types to
represent CUDA streams.
The legacy NN bindings currently operate only on Tensors. We are slowly
replacing all uses of Tensor with Variable in Python code so that there
will only be one user-visible class. This changes the NN bindings
accessed through type2backend to accept either Tensors or Variables.
This does not affect the NN bindings that go through ATen.
* Various testing and utility improvements including torch.testing module.
1) Remove method definition for randn_like since ones_like, zeros_like do not have methods.
2) Add an empty_like native function for creating a tensor with uninitialized values.
3) Add an is_floating_point() native function, similar to is_signed().
4) Add a torch.testing module loosely modeled after numpy.testing; currently it contains
make_non_contiguous (moved from test_autograd) and randn_like (wrapper around the VariableFunction).
5) Remove code from test_autograd and test_nn that is responsible for generating grad_outputs to use
with gradgradcheck. These now use gradgradcheck's own generating code. This fixes
test_nn.py with scalars because gradgradcheck does the right thing here already.
* Rename parameter.
* Fix parameter usages.
Summary:
Fixes a beautiful bug spotted by mschatz: MetaStr was super slow for TensorCUDA because it was defined for CPU tensors only. And thus C++ friendly was invoking the casting constructor which copied the entire buffer to CPU!
I think both copy constructor and cast constructor should be explicit for Tensor given that it's an expensive op. There might be more spots to fix in the code.
Original revision with MetaStr bug is 2d026cfe9c :)
Reviewed By: Yangqing
Differential Revision: D6758540
fbshipit-source-id: 7d2dffadd84c043908e16927fe02e6ffb01f750c
Summary:
This updates https://github.com/caffe2/caffe2/pull/1096/ to build doxygen docs with cmake and fixes operator catalog generation. See the new README.md for details, but you can run
```
mkdir build && cd build
cmake -DBUILD_DOCS=ON .. && make
```
and
```
python caffe2/python/docs/github.py ~/c2docs/_docs/operators-catalogue.md
```
to generate docs.
There was one weird issue in `generator.py` where we sometimes receive tuples and sometimes objects. I handled this just by testing `isinstance`, but we might want to be more principled in the future.
Closes https://github.com/caffe2/caffe2/pull/1758
Reviewed By: pietern
Differential Revision: D6752127
Pulled By: orionr
fbshipit-source-id: 9ba9ad8efc920b27a57327f8a7d3050f3650d4ce
Summary:
Lots of unwanted stuff here that shouldn't be in this branch. I just need to make a PR so I can test it
Closes https://github.com/caffe2/caffe2/pull/1765
Reviewed By: orionr
Differential Revision: D6752610
Pulled By: pjh5
fbshipit-source-id: cc93290773640a9eb029f350b17f520ac5f2504e
The Tensor and Variable classes are being merged in Python. This means
that all interfaces to C++ must accept Variables where they previously
accepted Tensors.
* adds reduce arg to BCEWithLogitsLoss interface
Adds the missing 'reduce' argument for the BCEWithLogitsLoss module
so that it matches the functional interface.
* fix indentation and add additional test
fixes the indentation used to update the BCEWithLogitsLoss module
and adds a unittest to sanity check its usage with `reduce=False`
Previously the side-effect free grad calculation was performed
using callbacks that could also override the decision to run a
function. However this had a few problems e.g. it forced us to iterate
over pretty much all functions in the graph and drop their buffers.
This patch improves the mechanism, by adding explicit support for this
kind of evaluation in execute(). It's safer, and the algorithm used to
decide which nodes have to be evaluated was replaced with a faster one.
Previously, Symbol was just a uint32_t and we converted with symbolToString and
stringToSymbol. Now Symbol is a struct with a toString method, and
constructors from either BuiltinSymbols enums (e.g. kParam) or strings.
Symbol is convertible to a uint32_t to ensure it can still be used in
switch statement BuiltinSymbol case branches.
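Roughly, the new shape is something like the sketch below (details assumed from this description, not the exact header; the string interning is faked with a hash just to keep the example self-contained):
```cpp
#include <cstdint>
#include <functional>
#include <string>

enum BuiltinSymbol : uint32_t { kParam = 1, kReturn = 2 /*, ... */ };

struct Symbol {
  /*implicit*/ Symbol(BuiltinSymbol s) : value(s) {}
  explicit Symbol(const std::string& name)
      : value(static_cast<uint32_t>(std::hash<std::string>{}(name))) {}
  // toString() omitted in this sketch; it replaces the old symbolToString() free function.
  operator uint32_t() const { return value; }  // keeps switch statements on builtins working
 private:
  uint32_t value;
};

void dispatch(Symbol sym) {
  switch (sym) {  // implicit conversion to uint32_t
    case kParam: /* ... */ break;
    case kReturn: /* ... */ break;
    default: break;
  }
}
```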
* Fix display of test failure number in test_distributions.
Previously, if e.g. the last example of 3 failed, it would say example 2/3.
* Fix other instances of enumerate pattern.
This adds overrides in VariableType for the xxx_out ATen functions and
implements Python bindings. There is no support for automatic
differentiation. If any of the inputs (or outputs) requires grad, then the
function will throw an exception unless it's running in "no-grad" mode.
The bindings for calling torch.xxx functions on Variables are moved to a
different object. Previously, they were static method on VariableBase.
This change prevents users from accidentally calling static methods as if
they were instance methods.
This moves the implementation of repeat to _utils so that the autograd
function can call it directly instead of relying on forward being called
on tensors.
This also removes _range, which was previously necessary because we
shadowed the built-in range() function.
* Add proper scalar checks to functions bound by nn.yaml.
By default, the forward functions use the default ATen scalar checks and the backward functions
use x_->isScalar() for grad_x (with grad_input mapping to self).
These can also be overridden by specifying a dict of arg_name -> scalar_check.
If the argument is not overridden and the default mapping cannot work (because x for grad_x is not
passed to the backward), an error is raised and the scalar_check must be explicitly specified.
* Fix scalar checks for loss functions with a reduce parameter.
Implement MM fusion (MM with add reduction tree)
A tree where leaves are matrix multiplies and inner
vertices are adds can be computed as a single mm.
Such subgraph often appear in backward if a single weight
is reused multiple times (e.g. in RNNs).
NOTE: this seems to be slightly slower on the GPU than the
naive implementation, but it's a huge win on the CPU
(think 100x lower overhead)
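Presumably the identity behind the rewrite is just block-matrix multiplication (for conforming shapes):
\[
A_1 B_1 + A_2 B_2 + \cdots + A_k B_k =
\begin{bmatrix} A_1 & A_2 & \cdots & A_k \end{bmatrix}
\begin{bmatrix} B_1 \\ B_2 \\ \vdots \\ B_k \end{bmatrix},
\]
so an add-tree over mm leaves collapses into a concatenation on each side followed by a single mm.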
* Test_autograd support for 0-dim input/outputs.
This uses the 'fake' _scalar_sum function to test scalar (0-dimensional) inputs and output in test_autograd.
Main changes:
1) Introduces a randn_like function (this is really just for convenience but it comes up often in testing).
2) Because the Tensor and Variable API are different wrt sizes, we take care to not exit the Variable API when
constructing Variables based on other Variables. This is pretty straightforward, but there is sometimes an extra
line of code for setting requires_grad. Should we have the 'like' functions maintain requires_grad? Or bind all
factory functions with an additional 'requires_grad' parameter?
* Fix flake8.
* Get rid of _scalar_sum tests.
* Use zeros_like instead of more complicated constructs.
Also remove _scalar_sum native function / derivative definitions.
Summary: Added the RowWise functionality for SparseAdam, which saves roughly 2/3 memory usage by only keeping one first and second moment term for each row of the parameter tensor, rather than one for each individual parameter.
Differential Revision: D6679342
fbshipit-source-id: ce6fb27e35ce41a890c66f6089cd2748d10e7a44
cuDNN batch norm uses mixed half/float precision in batch norm. This
changes the overload to only check that the arguments are of
VariableType and does not check their concrete type (scalar/backend).
Summary:
This is needed for #1740.
Verified that `./build.sh py2-android-ubuntu16.04` builds an Android base image with CMake 3.6.3.
Closes https://github.com/caffe2/caffe2/pull/1747
Differential Revision: D6729823
Pulled By: pietern
fbshipit-source-id: f7c888b4fba14ff6ea703cc269175b327b49f6b8
Summary:
We may not want to run the operator in a prefetching manner if we don't need any prefetching.
The option allows running it in a normal fashion without modification to any operator.
Differential Revision: D6717720
fbshipit-source-id: 10114d68edd95258b823603d8532360120421649
Summary:
PR Description
----------------
This commit updates how to install Caffe2 on the Ubuntu distribution.
The existing instructions are written as an installation guide for generic Ubuntu
distributions. Let's update the existing manual in more detail.
**Changes proposed in this PR:**
1. Added Ubuntu 14.04 section with existing contents.
2. Added Ubuntu 16.04 section
**Self evaluation:**
Tested (compilation in Ubuntu 16.04 x64 LTS)
Signed-off-by: Geunsik Lim <geunsik.lim@samsung.com>
Closes https://github.com/caffe2/caffe2/pull/1723
Reviewed By: pietern
Differential Revision: D6692998
Pulled By: orionr
fbshipit-source-id: 8da9250ff27dbeb41f12364cdd531b2fb416c31f
* Use restat to reduce ninja rebuilding when running codegen.
Usually, you're only working on one codegen file at a time, but
in our old behavior, editing one would induce a rebuild of everything
that depended on ANY generated file. We fix this in two steps:
- Don't write the file (updating the timestamp) when the contents
are unchanged. (I had to update three separate places; shared
Python library for build tools when?!)
- Use the 'restat' ninja feature to avoid rebuilding when the timestamp
doesn't change.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* lintfix
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* lintfix2
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Previously, it printed [Variable]; now it prints [Variable CPUDoubleTensor].
I'm not altogether sure why toString on Variable returns the uninformative
thing, but that might be worth fixing too.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
This commit fixes double-backwards on batch norm. There were two
bugs:
- Returned buffers from batchnorm backwards were being marked as differentiable
when they shouldn't be. The fix for this is "easy": use 'grad' instead of
'grads[0]' in cudnn_batch_norm's backward definition. (More on this below.)
- I was using toTensor on a Scalar, which gives me a Tensor of the wrong
type when I'm in CUDA world. Using the Scalar add() overload directly
solves the problem.
The differentiability of returned buffers was annoyingly subtle and I nearly
went off and implemented a big pile of infrastructure to "tell" the codegen how
to distinguish between differentiable and non-differentiable outputs before
realizing that there must be a way we do this legitimately, because it works for
THNN. I documented this in derivatives.yaml, and also added tests for the
problem in load_derivatives.py to catch the various ways you could "get it
wrong". Hope this helps someone else.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary:
The Gloo test was waiting only 10 seconds for processes
to terminate, causing tests to be flaky.
Reviewed By: pietern
Differential Revision: D6672990
fbshipit-source-id: c58ba512396a0e45fa6ea4d14534ab0ccd54f2a9
Summary:
[x] Have to rebase
[x] Have to ensure this works on macOS + Anaconda
Closes https://github.com/caffe2/caffe2/pull/1741
Differential Revision: D6714172
Pulled By: pietern
fbshipit-source-id: 43a16d99a6ddf821a35b512c780cdfa35a721219
Summary:
- fixed the false newline at the initialization of the crop layer translation which caused the exceptions described in issue #1215
Closes https://github.com/caffe2/caffe2/pull/1746
Differential Revision: D6716228
Pulled By: Yangqing
fbshipit-source-id: dd93b06b3b903f96505d6e6f8e67caeb6981fe66
Summary:
the fc needs to be in the output_gate_t scope so it can find its input
weights correctly
Closes https://github.com/caffe2/caffe2/pull/1739
Reviewed By: dzhulgakov
Differential Revision: D6705443
Pulled By: anderspapitto
fbshipit-source-id: 139e83ac77589a203ffe404fedab98eea5b1a51c
1) Zero-dim tensors to the fill functions that weren't bound (they couldn't be called successfully
because we haven't enabled scalars), and needed derivatives for their value arguments.
2) ne_ was missing a Scalar overload.
* ONNX: export sum, prod, sqrt improve log_softmax and fix a typo in doc.
Signed-off-by: HE, Tao <sighingnow@gmail.com>
* Add new exported op to doc.
Signed-off-by: HE, Tao <sighingnow@gmail.com>
* Double quotes.
Signed-off-by: HE, Tao <sighingnow@gmail.com>
* Update trace log of log_softmax.
Signed-off-by: HE, Tao <sighingnow@gmail.com>
* Improve export when dim is None and axes_i should be a list of ints.
Signed-off-by: HE, Tao <sighingnow@gmail.com>
* Fix prod when no dim given.
Signed-off-by: HE, Tao <sighingnow@gmail.com>
* Update line ends in test expected file.
Signed-off-by: HE, Tao <sighingnow@gmail.com>
Summary: This diff enables setting the model initialization seed, instead of a random seed, when reproducible results are desired.
Reviewed By: xianjiec
Differential Revision: D6642971
fbshipit-source-id: 387b1ee2ecef4f8f66570c882498fb97d7007e17
* Distinguish between scalar tests and pyscalar tests.
* Distinguish between scalars and no arguments.
* Add NoArgsClass so NO_ARGS is iterable.
* Fix iterator specification in python3.
* Now fix for python 2.
* Fix flake8.
In `THPTensor_(_convertToTensorIndexers)`, a `vector<THPIndexTensor>` is
created by constructing `THPTensor`s from sequences/tensors/etc. Each
`THPIndexTensor` is then freed with the following:
```
for (auto& idx : indexers) {
THIndexTensor_(free)(LIBRARY_STATE idx->cdata);
Py_DECREF(idx);
}
```
This is a problem because `Py_DECREF(idx)` will turn `idx->ob_refcnt` to 0 since this function
created the relevant `THPIndexTensor`s and owns them, causing `THPTensor_(dealloc)` to be
called. `THPTensor_(dealloc)` already has a line that calls
`THIndexTensor_(free)(LIBRARY_STATE idx->cdata)`.
So `THIndexTensor_(free)(LIBRARY_STATE idx->cdata)` gets called twice on the same
`cdata`. After the first call frees `cdata`, the second attempts to access flags/members of `cdata` to
determine if it should free it.
Summary:
This should fix Protobuf version problems on all Anaconda builds by putting include directories under Anaconda before all other include directories.
Closes https://github.com/caffe2/caffe2/pull/1728
Reviewed By: orionr
Differential Revision: D6698435
Pulled By: pjh5
fbshipit-source-id: f73f4a5ebb4ca91db14770a88a704ace69d37ba4
Summary:
[Folly] Cut the `ScopeGuard` alias now that we have `auto`.
This form works because of hidden lifetime extension:
```lang=c++
folly::ScopeGuard guard = folly::makeGuard([] { /*...*/ });
// ...
// guard falls out of scope
```
But this form would not work correctly:
```lang=c++
folly::ScopeGuard guard = folly::makeGuard([] { /*...*/ });
std::async(std::launch::async, [guard = std::move(guard)] {});
```
Because `folly::ScopeGuard` is an rvalue-reference-to-base.
We have `auto`, so just remove `folly::ScopeGuard`. This form works correctly:
```lang=c++
auto guard = folly::makeGuard([] { /*...*/ });
std::async(std::launch::async, [guard = std::move(guard)] {});
```
Reviewed By: igorsugak
Differential Revision: D6690070
fbshipit-source-id: 54e32b300d36fce4eb95a59f1828819afe312ec0
Summary:
[Folly] Move `ScopeGuardImpl` and `ScopeGuardImplBase` into the `detail` namespace.
Let them be marked as private implementation details.
Reviewed By: andrewjcg
Differential Revision: D6665317
fbshipit-source-id: 03e8fee6a16338395ec92c582613b053bd9f74ec
Summary:
This is in principle similar to #1612 and is tested on Windows 2017. CMake passes, although there are still bugs in the MSVC compiler that prevent CUDA from compiling properly.
The difference between this and #1612 is that this diff explicitly puts the CMake files into a separate folder and uses a MiscCheck.cmake chunk of code to test whether we need to include them. See README.txt for more details.
Closes https://github.com/caffe2/caffe2/pull/1727
Reviewed By: pietern
Differential Revision: D6693656
Pulled By: Yangqing
fbshipit-source-id: a74b0a1fde436d7bb2002a56affbc7bbb41ec621
Summary:
Instead of constructing db_name as a member of checkpoint_manager, generalize
this function
Reviewed By: anshulverma
Differential Revision: D6671088
fbshipit-source-id: c528538def66933619f2fdf67820bca5d13571ea
Summary:
we are going to deprecate NNPACK bindings in caffe2/contrib/nnpack.
The first step is to move modern NNPACK bindings from caffe2/mobile/contrib/ios/ to
caffe2/share/contrib/nnpack/, and is implemented in this diff.
Reviewed By: sf-wind
Differential Revision: D6687454
fbshipit-source-id: 458614bade92ab5ba5d2ab7f0691071043198b57
Summary: Tests in Jenkins fail because test_global_pooling_3d filtered too many tests. We made use of the inferred value of global_pooling (pad and stride will be constant) to reduce the number of test samples generated.
Reviewed By: pietern
Differential Revision: D6686840
fbshipit-source-id: d316c0e9f9070b12770170ab9f36e33de68a9ab9
Summary:
* Also remove build status, since it isn't relevant here.
I'm tempted to just reference https://caffe2.ai/docs/getting-started.html and remove all of this, but seemed like it might be worth having a standalone installation.md doc.
Closes https://github.com/caffe2/caffe2/pull/1706
Reviewed By: Yangqing
Differential Revision: D6666561
Pulled By: orionr
fbshipit-source-id: 640f8100a5e4f8d6b2eee2266dd634bd25d0e58e
* Fix the inconsistency of `polygamma` on Tensor and Variable.
Signed-off-by: HE, Tao <sighingnow@gmail.com>
* Regression test for #4466, polygamma works on variables.
Signed-off-by: HE, Tao <sighingnow@gmail.com>
* Add macro IMPLEMENT_STATELESS_SWAP to dispatch stateless methods on Variables correctly.
When call stateless methods with more than one arguments and the `self` comes second,
the `self` argument needs to be swapped to the first position before dispatching.
The macro `IMPLEMENT_STATELESS_ADDXX` is still reserved for deprecated `add**`
methods.
Signed-off-by: HE, Tao <sighingnow@gmail.com>
Summary:
In D5681122 - when routing to global maxpool and average pool, the condition is not correct.
see T24876217 for discussion
Reviewed By: Yangqing
Differential Revision: D6665466
fbshipit-source-id: dcb5b4686249e6ee8e1e976ab66b003ef09b32fd
ATen dispatch in the JIT interpreter needs to switch the current gpu,
but it is not handled in ATen itself, and no higher-level pathway
ensures the device is set correctly.
This also improves debugging information for cross-device issues.
This follows the behavior of numpy in that you can wrap dimensions over a scalar (0-dimensional
tensor) in the range [-1, 0]. I.e. scalarTensor.prod(0) and scalarTensor.prod(-1) works, but
scalarTensor.prod(2) does not.
The only current exception to this is with size(dim) and stride(dim);
there are no numpy equivalents of these (they are attributes), so it seems cleaner to just have
these as (dimensional wrapping) sugar for sizes()[dim] and strides()[dim]; otherwise there are
subtle differences in semantics, e.g. you have to use size(dim) when you want it to directly
apply to scalars, if the default value (1?) makes sense in that case. Simpler to just not have
that difference.
Note that this change can cause problems if code assumed that maybe_wrap_dim would throw an
exception in this case and then called sizes()[dim] or size(dim) without checking; I went
through the code and only found this case in squeeze/squeeze_.
cuModuleLoad is only valid for a single device so we need to
compile for the particular device that the fusion group will run on.
CompiledFunction already specializes different traces for tensors,
so we just need to have fusion_compiler produce the cuFunction on
the right device.
The gen_variable_type.py script now is only responsible for generating
VariableType.h/cpp. The parent script, "gen_autograd.py", delegates to
gen_autograd_functions.py, gen_variable_type.py, and
gen_python_functions.py.
I've removed "fallthrough" functions. It's replaced by
DONT_RECORD_TRACE, DONT_PROFILE, and DONT_REQUIRE_DERIVATIVE.
In preparation for binding the _out variants, I changed some static
types to Tensor (from Variable) and we now unpack and name tuple return
values.
1) Separates ASSERT_THROWS and ASSERT_THROWSM for checking messages vs not.
2) Adds TRY_CATCH_ELSE for python-style error checking
3) Uses ASSERT_THROWS and TRY_CATCH_ELSE more generally
The previous more ad-hoc constructions were often wrong, i.e. an assert could
pass if the logical 'else' threw an exception that then satisfied the assert in the catch.
Three stage plan to no more stupidly weird "why isn't cuDNN enabled"
bugs:
- Add torch.backends.cudnn.disable_global_flags(), which as its name suggests,
disables global flag setting in cuDNN, so that you are not allowed to
make changes to this state. However, the flags() context
manager continues to work (since they are non-global changes).
- Call disable_global_flags() in test/common.py
- Switch all of the manual flag setting/unsetting in test/test_nn.py
to use the context manager.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Fix tracking of tracing scopes during ONNX pass
* Use ResourceGuard to manage setting a temporary current scope in Graph
* Add tests for ONNX pass scopes
* Remove unused num_classes argument
Previously, we only tested CPU double-backwards, which is bad!
This would have caught #4422 (still not fixed, so those tests
are manually disabled) and also uncovered #4500 (not yet diagnosed.)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Weight can be non-contiguous due to double backwards, where
we transpose the weight. I'm not very happy with this fix
but it seems to make the tests pass.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
- Out of bounds grads[2] access (thnn_conv_depthwise2d_backward
doesn't compute bias gradient)
- Groups was not set appropriately for depthwise convolution
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary: D6636282 caused a regression test failure of the NMT model used in prod; see 24949620 for bisect history.
Reviewed By: pietern
Differential Revision: D6671602
fbshipit-source-id: d863013964666727cf488a6ac5b01f5216f149d9
Summary:
Added Caffe2 operator binding for Gloo Allgather algorithm.
Added new test to verify the binding. Binding is supported only for
CPU device with these changes.
Reviewed By: pietern
Differential Revision: D6610074
fbshipit-source-id: b21df9b5e71befbdb6841d6b146727bb4c83d753
Summary: GPU (CUDA) implementation of the Swish activation function in Caffe2.
Reviewed By: Yangqing, xianjiec
Differential Revision: D6656907
fbshipit-source-id: f5f2c667055abf679728d2b5d43998895ddec708
This mismatched paren causes a syntax error in generated code. I'm guessing the parentheses are necessary, since there was one in there before, but I don't actually know whether the compiler can produce things like a - (b - c) that would make them required.
Summary: Adds transpose CPU version to prepare for LC layer.
Reviewed By: Yangqing
Differential Revision: D6641358
fbshipit-source-id: 1825b4c270dea2c0049ba334303abcbf50b22ee7
Summary:
Some installations of numba seem to be incompatible with ASAN, so we
will disable its import.
Reviewed By: dzhulgakov
Differential Revision: D6664055
fbshipit-source-id: 311774667e54bdbf328ef280ab2a52ecba1361f2
Summary:
In this PR I do the following:
1. split lstm_test_main into several tests for LSTM, MiLSTM and various Norm based versions
2. instead of looping over various gradient / optimization parameters now they are random inputs through hypothesis.
3. These changes make the tests faster and we can avoid limiting the number of examples
4. Fix a minor bug with the gradient checker in the RNN unroll test running twice
5. Generate a seed for numpy in hypothesis. This makes hypothesis avoid flaky tests
Also note that Norm tests sometimes fail. I haven't looked into it much, it could be just precision issues. New test split should help identify these issues.
Closes https://github.com/caffe2/caffe2/pull/1678
Reviewed By: pietern
Differential Revision: D6657076
Pulled By: salexspb
fbshipit-source-id: 9f59c71ccd2c818156e9d2424c3423d450b8c8e2
BCELoss's outputs and gradInput computations are accurate to around 1e-6 on float types (as a relative value, not absolute), which is reasonable. However, the tests use absolute thresholds: the accumulation of 5 gradInputs has to have error less than 0.0002.
The worst case for BCELoss's gradInput for each element may be described as 1 / ( (1-x) * x ). Previously, the input to the test was restricted to [0.02, 1 - 0.02], resulting in a worst-case largest gradInput of about 50, resulting in a total accumulated grad of 50*5 = 250, resulting in an error of 250 * 1e-6 = 0.00025, which was too big.
By restricting x to [0.028, 1 - 0.028] we get a worst case of 36.74, resulting in a total accumulated grad of 184, which is less than the 200 needed to have error less than 0.0002.
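Written out with the same numbers:
\[
\max_{x \in [0.028,\,0.972]} \frac{1}{x(1-x)} = \frac{1}{0.028 \cdot 0.972} \approx 36.74,
\qquad 5 \cdot 36.74 \approx 184, \qquad 184 \cdot 10^{-6} \approx 1.84 \times 10^{-4} < 2 \times 10^{-4}.
\]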
* Add test for empty Variable cat (forward only).
* Test for empty cat (no grad/gradgrad checks)
* Support gradcheck on empty inputs, check it for cat with an empty Variable.
* Fix lint.
Summary:
* The request has finished. We might do others in the future, but removing for now.
Closes https://github.com/caffe2/caffe2/pull/1700
Reviewed By: Yangqing
Differential Revision: D6659664
Pulled By: orionr
fbshipit-source-id: cd49d41bdde3c07b5acbcd4724aaa359f69e4752
* Delete obsolete basic ops.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* More deletion.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Delete some unused utilities.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Delete dead apply_fn
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Delete CppFunction symbolic support.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Delete ForwardFunction
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Batchnorm is 'working'
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* make derivative changes and change destination --> result
* fix typo
* add changes for addcdiv also
* modify rsqrt derivative
* revert the derivative for addcdiv
* revert the derivative for div
* fix typo, sorry
Emits a warning if slices have the same size but different shapes. (It
shouldn't be allowed, but it was, so some code might be unknowingly depending on
the behavior.)
Also refactored argument checking code, including index_fill_.
Summary:
During debugging I found that our recently added automatic engine preference actually makes debugging a bit harder - it implicitly routes computation to e.g. CUDNN when we actually want to test out the default GPU implementations.
This diff adds a commandline flag that disables it.
Closes https://github.com/caffe2/caffe2/pull/1696
Reviewed By: pietern
Differential Revision: D6658765
Pulled By: Yangqing
fbshipit-source-id: ef56a16e778eeea6ecdd4dc6002421236e15371a
Summary:
This was introduced in D5681122 - it causes a pretty serious numerical issue
that broke the pooling test.
Specifically, if threadIdx.x > sz, max is initialized with an out-of-bounds index
and the max is incorrectly computed.
Reviewed By: pietern
Differential Revision: D6658945
fbshipit-source-id: 487222d26050921ff9c7764fe46076e31a99bb86
Summary:
GCC version check is currently being skipped when using the
newly released CUDA 9.1.
This will also handle other CUDA 9.x minor releases if any,
reducing our work if there are such releases like 9.2. This
assumes that the next major CUDA version will be 10.0,
needing adjustment only after such major version is
released.
Closes https://github.com/caffe2/caffe2/pull/1658
Differential Revision: D6659000
Pulled By: pietern
fbshipit-source-id: 79291b5da9d4e8b4f2c7ac82fe2b1e7939438bc9
This modifies NN binding in ATen so that the xxx_forward functions now
return buffers instead of taking them as inputs. The NN functions with
no suffix are implemented in Type.cpp. They call the xxx_forward
variants and discard any returned buffers.
This simplifies derivatives for NN functions. The derivatives are now
defined on the xxx_forward functions and buffers are treated as any
other input.
Summary:
There were no dimensionality constraints on the generated indices
array, causing many examples to be generated and filtered out. Instead,
we should ensure the probability of unique indices is high.
There is a better fix for this by using the `unique` keyword argument
to `hypothesis.extra.numpy.arrays`, but this is available only in
hypothesis version 3.28.0 and later.
This is related to #1536 and #1599.
Once this change has proven to be OK, we can modify the other tests
that now have health check suppression enabled as well.
Closes https://github.com/caffe2/caffe2/pull/1686
Reviewed By: Yangqing
Differential Revision: D6651789
Pulled By: pietern
fbshipit-source-id: d80886c9ccf0a7a842a7580a279f33a2d6cca97c
Summary: The current Load op can only load blobs from one file. We need to make the Load op support loading blobs from a list of dbs.
Reviewed By: boryiingsu
Differential Revision: D6596034
fbshipit-source-id: 906fa48b0ad61c83e247d497b6b079c04fed499f
Summary: TSIA - it used to cause build errors.
Reviewed By: pietern
Differential Revision: D6652354
fbshipit-source-id: fd291f662e3793b6d11a7e02e1acc741c027a1fd
Summary:
`contrib/prof` provides functionality for profiling (eg. `prof_dag`) but no CMake.
Hence, provide CMake support for building it.
Reviewed By: Yangqing
Differential Revision: D6640488
fbshipit-source-id: 9ed8095b10d7c0337db061206daf2a66f41f4713
Summary: change all use cases of BatchLRloss to the numerically stable version. This includes the uses of function build_loss defined in fbcode/caffe2/caffe2/fb/dper/layer_models/loss.py and class BatchLRLoss defined in fbcode/caffe2/caffe2/python/layers/batch_lr_loss.py.
Reviewed By: xianjiec
Differential Revision: D6643074
fbshipit-source-id: b5678556b03cbdd380cab8a875974a87c33d7f12
Implements nn.Embedding (lookup table) in ATen.
Breaking change: new optional argument padding_idx in F.embedding to
match nn.Embedding.
Note that there are a few bugs in Embedding that are inherited from the
previous code:
- CUDA renorm has race conditions if index contains duplicate entries
- sparse gradient doesn't work with scale_grad_by_freq
Summary: ReaderWithTimeLimit() class to stop after a certain amount of time
Reviewed By: boryiingsu
Differential Revision: D6477623
fbshipit-source-id: 165874c9344b0c9c7e0b33e12e72e24c46669cb2
1. master NNPACK now uses cpuinfo library, so we detect it and
add it to the list of libraries.
2. If a user builds nnpack with --inference-only, there won't
actually be enough symbols to successfully link against NNPACK.
This won't manifest until quite late in the build process.
So we now explicitly test that the gradient functions are
available in the library.
Upstream bug: https://github.com/Maratyszcza/NNPACK/issues/123
Fixes #4336
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary:
Ran into a scenario where if the CPU op in MKLFallbackOp outputs an empty
tensor, attempting to copy the output to MKLMemory (https://fburl.com/www2mtt4)
crashes. Modify MKLMemory to gracefully handle this. This is done at the
MKLMemory level because we want to make sure that its members such as dims and
layout are Reset() correctly.
Interestingly, MKL calls fail at different points for dims {0} and dims {0,N} despite
the buffer size being empty for both - the former in dnnAllocateBuffer and
the latter in dnnConversionExecute (likely due to some difference in
layout?).
Also fixed CopyTo in addition to CopyFrom and tested all scenarios.
Reviewed By: ajtulloch
Differential Revision: D6646320
fbshipit-source-id: 61df585f610a949f312f05308baf310241dc9cb2
Summary: Extract some operators from utility_ops and normalize_op to reduce build size impact of depending on these files.
Reviewed By: Maratyszcza
Differential Revision: D6616741
fbshipit-source-id: 1757b6b8a3ce4e2a248deee61322344e5095e940
Summary:
Imported and modified from https://github.com/ARM-software/vulkan-sdk
I changed libvulkan-stub.cpp to libvulkan-stub.c
Reviewed By: Maratyszcza
Differential Revision: D6641092
fbshipit-source-id: 1a7fbf745d58b6111a06a983910c583912365357
This is a step towards removing the special casing of NN functions in gen_variable_type.py. It fixes the signature of in-place NN functions so that they return Tensor & instead of Tensor.
* Support ATen GPU pointwise apply and torch.where.
Like the CPU version, this implements an apply template that is almost identical to the
apply template already in THC, but using the ATen API. Much of this involves stripping out
the TensorUtils code (which is basically templated ATen-style), although a couple of functions
remain that are apply specific (and thus don't seem worth porting to ATen), namely
overlappingIndices, canUse32BitIndexMath, and getTensorInfo. We can make those generally
available if there's a need.
* Use int64_t instead of ptrdiff_t.
* Use snake case for _copyIgnoringOverlaps_.
Adds a missing bias term to the __repr__ functions of the
Linear and Bilinear modules. Fixes the spacing in the Conv2d
__repr__ to make it consistent with other modules.
* Improve matmul native test tolerance.
Because we don't directly use bmm in one case of matmul, a comparison to bmm doesn't make sense;
instead, we compare to the double result.
* Fix spelling.
Previously, we assumed that __main__ was the test file
being run, which is not true if you are using pytest. New
algorithm uses __module__ of the test class, which is a bit
more robust.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary: Still WIP, but works for the universal encoder. The other ones are currently broken.
Differential Revision: D6492786
fbshipit-source-id: 232e0058eb3a0c036de3adf0295db5efd624cca7
Summary:
Make operator QuantDecompZstd buildable in open source. The operator is not built by default; you need to specify -DBUILD_SHARE_DIR=ON -DUSE_ZSTD=ON to build it.
Test plan: Built Android Caffe2 with the change without issue. Ran a model with the operator successfully.
Closes https://github.com/caffe2/caffe2/pull/1613
Reviewed By: Yangqing
Differential Revision: D6556723
Pulled By: sf-wind
fbshipit-source-id: 453a7d787a55928f2dea1ed2b99f2df011aa8d26
Summary: Adding support for DLPack tensors to Python op
Reviewed By: Yangqing
Differential Revision: D6577702
fbshipit-source-id: e14ef213fcdb2930ffe164667971a92aa8db503c
Variable.new() should default to the device of "self" if no device is
specified. Previously, we were using the current device. This now
matches Tensor.new().
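A small sketch of the intended behavior (assumes at least two CUDA devices are present):
```
import torch

x = torch.randn(2, 3).cuda(1)   # "self" lives on device 1
y = x.new(4, 4)                 # also allocated on device 1, not on the current
                                # device, matching Tensor.new()
```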
Summary:
Thanks to feldim2425 we know that GCC 5 in Ubuntu 17.04 and later
doesn't define the macro _GLIBCXX_USE_C99 and by extension the
std::to_string, std::stoi, and std::stod functions (and probably
more). Instead of avoiding using these functions, we simply recommend
people to use GCC 6 or higher on the newer Ubuntu versions where GCC 5
doesn't work.
As a side note, CUDA 8.0 is compatible with GCC up to version 5. This
means that compiling Caffe2 with CUDA on Ubuntu >= 17.10 requires
CUDA >= 9.0. If you need to compile with CUDA 8.0 and are on
Ubuntu, you are stuck on version 16.04 or lower.
I verified this fix by running cmake on Ubuntu 17.10 with
-DCMAKE_CXX_COMPILER=/usr/bin/g++5 and observing the fatal error.
This closes #1633.
Closes https://github.com/caffe2/caffe2/pull/1645
Differential Revision: D6620812
Pulled By: pietern
fbshipit-source-id: 29af88cad9bede4fd952084c404c85db05baa9c4
Summary:
If we encounter failures while writing a checkpoint, ensure that the job does
not fail.
A job can make progress even if writing a checkpoint fails
Reviewed By: anshulverma, boryiingsu
Differential Revision: D6615163
fbshipit-source-id: 01f790422e1a81bab1fe73f86750eaf75a72bb77
- Rename THNN convolution to have thnn_ prefix.
- Propagate CuDNN benchmark and deterministic to at::Context
- Add 'convolution', 'convNd' and 'conv_transposeNd' native wrappers, with defaults
The conv_transposeNd wrappers are updated to have the same argument
order as Python.
- torch.nn.functional directly dispatches to the native wrappers
- Make it possible to turn off tracing for some native wrappers, so I don't
have to write symbolics for all the functions above
- Spectral ops can now make use of CuDNN convolution if possible
- Better commentary on cudnn_batch_norm
- Turn on DCE for all JIT tests.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary:
This means warnings and errors fire sooner rather than later.
This requires a fix for an issue where CMAKE_REQUIRED_FLAGS propagates
to some unrelated check, which then fails, because the Android
compiler doesn't support -mavx2.
Closes https://github.com/caffe2/caffe2/pull/1646
Differential Revision: D6620129
Pulled By: pietern
fbshipit-source-id: 4d1185406ebee3a523d39811bca6783bee82c898
* Batchnorm in ATen
This commit moves BatchNorm derivatives into ATen, eliminating
torch/csrc/autograd/functions/batch_normalization.cpp
Some refactoring along the way:
- Functions got renamed to remove _forward from their names
- CuDNN batchnorm forward was modified to return save_mean/save_std instead of
take it as parameters. To avoid returning undefined Variables, these return
(small) uninitialized tensors when they are not used.
- THNN batch normalization takes care of resizing save_mean and save_std on
forward.
- There are some shenanigans re batchnorm backwards in eval mode. I'm tracking
that in #4284
- I decided not to introduce buffers as a proper concept in ATen, which means
that tensors like running_mean/running_var are variables in ATen. This meant
there needed to be some adjustments to how we *trace* such variables; the
new strategy is if we can't find a Value for a variable, we look and see
if we have a Value for the buffer pointed to by the variable, before
finally falling back on constant.
- This PR finally reliably triggered OOM on Travis builds; I fixed this by reducing
the number of parallel jobs.
- Stop using std::string when it's not necessary.
- Remove training parameter from cudnn_batch_norm_backward, because it
doesn't make sense; cuDNN doesn't implement the math for evaluation mode
batchnorm backwards.
- batchnorm_double_backward is now in an anonymous namespace, as it
no longer needs to be called from torch/csrc
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary: Add Min and MinGradient Op
Reviewed By: jamesr66a
Differential Revision: D6608668
fbshipit-source-id: 7e1f8fa7a42a94f26152da0109d597e5deeb21c0
* Convolution derivatives in ATen
This PR introduces ATen implementation of convolution, which dispatches to
THNN/CuDNN/nnpack based on input parameters. The general strategy is to compose
this function out of the various forward-backward pairs of specific
implementations, rather than write a monolithic function with backwards (which
is what we did before because the boilerplate of doing it otherwise would have
been very high.) The new API provides the following functions:
- _convolution, which is a fully generic, native convolution implementation
that dispatches to various other convolution implementations depending on
input characteristics. This is prefixed with an underscore because it
explicitly takes benchmark, deterministic and cudnn_enabled which are
implementation details for CuDNN. The intent is to eventually provide a
convolution that reads these parameters out of the context using #4104.
- _convolution_nogroup is a convolution implementation for non-CuDNN
algorithms which don't support group convolution natively.
- _convolution_double_backward is the generic double-backwards implementation
for convolution.
In more detail:
- Most functionality from torch/csrc/autograd/functions/convolution.cpp has been
moved into aten/src/ATen/native/Convolution.cpp
- We continue to make use of ConvParams, but we now construct the parameters
upon entry to a function from the function signature (which does not use
ConvParams; having convolution take ConvParams directly would require teaching
the code generator how to accept these as parameters, complicating ATen's API
model) and destruct them when making subprocedure calls.
- I introduce a new idiom, input_r, which represents a const Tensor& reference,
which will subsequently be assigned to a local Tensor input. This is helpful
because a lot of the existing algorithms relied on being able to assign to
locals, which is not permitted with a const reference.
- The native argument parser now supports std::array<bool,2> inputs (NB: there
MUST NOT be a space; this is the same hack as is applied to derivatives.yaml)
- Native parser now supports Tensor? arguments, which indicates a nullable
tensor. Previously this function was only used by NN methods.
- Documentation updates on THNN library
- I added an extra fgradInput argument to VolumetricConvolutionMM_updateOutput
and VolumetricConvolutionMM_accGradParameters so that its buffer list lines up
with the backward argument list. This makes it possible to write derivative
for conv3d which previously was not supported (commented out in
derivatives.yaml)
- Extra double_backward declarations for all convolution backwards functions was
added.
- You can now use the syntax Tensor? in native_functions.yaml to indicate that a
tensor argument is nullable. There are adjustments to propagate this to the
Python argument parser.
- NNPACK was ported to ATen, and ATen now builds and links against NNPACK if
possible. New AT_NNPACK_ENABLED macro. The nnpack functions are
nnpack_spatial_convolution.
- Some modest CuDNN convolution refactoring to remove _forward from names.
- There's a new cudnn_convolution_backward function to deal with the fact that
CuDNN convolution double backward requires you to have computed all gradients
in one go.
- Variable set_flags now checks if the tensor is undefined, fixing a silent memory
corruption.
- checkSameType updated to not raise an exception if called with Variable arguments
- "no ATen declaration found for" error message is improved to say what available declarations are
- make_variable now accepts undefined tensors, and returns an undefined tensor in this case.
This is a part of making sparse tensors work with dataloader (#3898)
This exposes `_values()` and `_indices()` for sparse variables in python (and sparse tensors in Aten).
To do this, I added THDenseTensor* and THDenseIndexTensor* return value functionality to Declarations.cwrap. These should always mean "the dense equivalent of THTensor*" and "the dense equivalent of THIndexTensor*" respectively.
cc @zdevito for the THDenseTensor in cwrap addition
### Test Plan
Run the following:
```
import torch
from torch.autograd import Variable
v = torch.FloatTensor([3, 4, 5])
i = torch.LongTensor([[0, 1, 1], [2, 0, 2]])
x = Variable(torch.sparse.FloatTensor(i, v, torch.Size([2,3])))
x._indices()
x.data._indices()
x._values()
x.data._values()
```
* Further relax VariableFlags
* Allow a requires_grad=True trace to be used for a requires_grad=False
input by computing the gradient but then not connecting it to the
input.
* Enable CSE to de-duplicate WLM backwards pass code which calls sum twice.
* Fix a bug in the interpreter that frees a register too early when
it appears twice in a use list.
* [fuser] Follow all outputs to check if fusion is safe
This bug was introduced when we allowed fusion groups
to fuse together. Previously producers were forced to have a single
output, but now producers that are fusion groups can have multiple outputs.
So now we check the uses of all the outputs of a producer.
* [JIT] Fix handling of undefined inputs
It is not legal to call .data() on variable objects whose tensors
are undefined.
Summary:
hill: the learning rate changes according to the following 3 stages
1) linear warmup (increasing) at first num_iter steps from start_multiplier
2) inverse shrink (decreasing) afterwards (gamma, power)
3) lower bounded by end_multiplier
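A hypothetical Python sketch of these three stages, assuming the warmup is anchored at iteration 0 and the inverse decay at iteration num_iter (the actual operator may differ in such details):
```
def hill_multiplier(it, num_iter, start_multiplier, gamma, power, end_multiplier):
    if it < num_iter:
        # 1) linear warmup from start_multiplier up to 1.0
        m = start_multiplier + (1.0 - start_multiplier) * it / float(num_iter)
    else:
        # 2) inverse shrink after warmup, controlled by (gamma, power)
        m = 1.0 / (1.0 + gamma * (it - num_iter)) ** power
    # 3) never drop below end_multiplier
    return max(m, end_multiplier)
```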
Differential Revision: D6565379
fbshipit-source-id: 9c0e51fc825ba6a7765803a1f09479497057a9d9
Summary:
Implemented syntactic sugar for the following constructs:
- `x.Gather(y)` can now be written as `x[y]`
- `x.Slice(start, end)` can now be written as `x[start:end]`
For slicing, `start` and/or `end` can be omitted iff `x` is one-dimensional (i.e. a vector). That is, `vector[start:]`, `vector[:end]` and `vector[:]` will work. Doesn't work for higher-dimensional tensors because to emit the start/end indices we need to know the rank of the tensor (since `Slice` requires one entry per dimension of the tensor).
Also added a `getProto()` function so that I could test that the generated code is as expected (i.e. that the syntactic sugar does not affect the structure of the output).
Reviewed By: zdevito
Differential Revision: D6605864
fbshipit-source-id: 786359713a13314c24be2fc07e01486c507404ef
Summary: Simple fallback implementation to support LengthsRangeFill, we can have native CUDA implementation later
Reviewed By: pietern
Differential Revision: D6594031
fbshipit-source-id: b705234a591a61e8d1ee5f7524aceec3f4581f9c
Summary:
In layer model helper, add a method `maybe_add_global_constant` to ensure
that when two global constants are added with the same name, we check if they
are actually the same (by initializer) and only add it once.
Reviewed By: kennyhorror
Differential Revision: D6537532
fbshipit-source-id: 37aa3860a2e40d81161ccdea0c50a316248be2e2
Summary: Adds support for backprop to While op, fixes gradient computation for Pow
Reviewed By: azzolini
Differential Revision: D6456875
fbshipit-source-id: 9f660317ad6f3898ff7d8ce43098f85c3426409b
Summary:
Yangqing pietern
With https://github.com/caffe2/caffe2/pull/1627, Caffe2 can be statically built with USE_ATEN=ON and USE_CUDA=OFF. But the function deleterFor defined in aten_op_template.h causes duplicated symbols in libcaffe2.a and libcaffe2_gpu.a.
I checked that we call this function in only one place, so I manually inlined it into the caller. Later, when we use it in other places, we can just extract it again and put the implementation in aten_op.cc.
Closes https://github.com/caffe2/caffe2/pull/1632
Reviewed By: pietern
Differential Revision: D6594063
Pulled By: houseroad
fbshipit-source-id: 2328e2b2dce819378a9f18411c449830917e0d6a
* Refactor cudnn code layout / make build more robust.
When I previously moved cuDNN into ATen, I wasn't too familiar with the
ATen native function directory layout, and so I did a number of
suboptimal things. This commit fixes those problems.
- If NO_CUDA was set but cuDNN is installed on your system, we'd incorrectly
assume that CUDNN was enabled, to hilarious effect.
- We now distinguish between cudnn implementation files and cudnn
native function files. The native files now live in ATen/native/cudnn,
and are *unconditionally compiled*, even when we are not building with cuDNN.
This means that we can unconditionally declare cudnn functions in yaml
and they are always available, even if they are broken. The cuDNN specific
files live in 'cudnn', they are *never* installed, and they are used
purely for implementation purposes. I had to add stub implementations of
all ATen functions to achieve this.
- I had written headers for at::native functions manually, but codegen
will generate them for me automatically. So I deleted the headers.
That lets me get rid of some header install logic as well.
- There's a new note about ATen preprocessor philosophy.
* add exponential distribution
* add exponential tests
* fix default val of sample_shape
* lambd->rate
* updates per review
* remove notes, keep failure_rate same in exponential test
Summary: hoangmit reported an ASAN test failure on D6389022. Upon further investigation, it appeared there was a logic error in calculating shapes when either the A or B matrix is being broadcast. This patch fixes that error.
Reviewed By: dzhulgakov
Differential Revision: D6580307
fbshipit-source-id: 2bcf9b76f668c42a463f2f0fdc82f544af3ae721
This removes volatile from Variable. The functionality is mostly
replaced by a global (thread-local) flag, which is controlled by
torch.set_grad_enabled() and the context manager torch.no_grad().
In C++, the flag is exposed through GradMode::is_enabled() and GradMode::set_enabled()
Fixes #3627
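A minimal sketch of how the replacement flag is meant to be used:
```
import torch
from torch.autograd import Variable

x = Variable(torch.randn(3), requires_grad=True)
with torch.no_grad():
    y = x * 2                     # nothing is recorded inside the context manager
assert not y.requires_grad

torch.set_grad_enabled(False)     # the same thread-local flag, toggled globally
z = x * 2
assert not z.requires_grad
torch.set_grad_enabled(True)      # restore the default
```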
* Support CPU Apply directly in ATen and implement standard_gamma using it.
Main changes in this PR:
1) Added a TH_APPLY-style templatized function for CPU apply calls (currently only 2 and 3 tensor argument
versions are supported, but more are easy to add). In fact, this is basically identical to TH_APPLY, except
it uses ATen functions and the API is a template instead of a macro. The template takes an operation that
is performed on the data (and an indicator to signal early termination); i.e. you don't need to know that
x_data is a pointer to the current data location of x.
2) Refactors the ATen dispatch code to easily generate dispatch code for different subsets of the scalar types.
This is in preference to the template_scalar path, which requires valid specialization of each scalar type. Valid
specializations are particularly annoying with CUDA because you most likely can't put the specializations
in a header so need to write some sort of for-all-scalar-type macro to get the correct specializations.
Currently, we only generate dispatch_all (all scalar types, the equivalent existed already), and
dispatch_cpu_floating_types (which is used by standard_gamma).
3) Implements standard_gamma using the above changes (this is an arbitrary choice, it was the latest
apply macro to be committed). The forward is bound via Declarations.yaml,
the backward via the Apply template, and then they are hooked together in derivatives.yaml. This eliminates
needing to change TH at all going forward, which means one can write idiomatic C++ instead of the TH-style macros
(e.g. TH_MATH_NAME).
* Generate Dispatch code with nicer spacing.
* Small cleanups.
* Fix typo.
* Add TODOs for changing macros, remove dead code.
* Use a lambda function.
* Get rid of early exit.
* Rename Scalar,ScalarType template parameters to CScalar.
* Reorder _standard_gamma_grad parameters.
* Add comments explaining calling convention.
* Don't generate Dispatch.h anymore.
* Get rid of backend specific checks in dispatch.
* Fix empty/scalar check.
* add reduce arg to PoissonNLLLoss
* fixed comments except reference function
* fixed unit test
* small indentation fix
* fixing last comments by richard
* lint check
* another linting issue
* Add default PyTorch seeding and worker_init_fn to DataLoader
* generate seed using current RNG each time
* worker_seed <- main_proc_RNG_generated_seed + worker_id
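A hedged example of how the per-worker seed can be consumed from a worker_init_fn (the dataset and the numpy seeding here are illustrative only):
```
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def worker_init_fn(worker_id):
    # torch.initial_seed() inside a worker already reflects the
    # main-process-generated seed plus the worker id; reuse it for numpy.
    np.random.seed(torch.initial_seed() % 2**32)

dataset = TensorDataset(torch.arange(100).float())
loader = DataLoader(dataset, batch_size=10, num_workers=2,
                    worker_init_fn=worker_init_fn)
```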
* Fix catArray in THTensor
Asserts that the inputs have the same size except in the
cat dimension or are empty (or a mix of both).
* Fix catArray for THCTensor
* Document torch.cat shape checks
* Fix types
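An illustrative example of the documented shape rule:
```
import torch

a = torch.randn(2, 3)
b = torch.randn(4, 3)
c = torch.cat([a, b], dim=0)   # ok: sizes agree in every dimension except dim 0
print(c.shape)                 # torch.Size([6, 3])
```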
* Implement pin_memory() as a NativeFunction
This adds allocators as a concept in ATen that extends deleters. An
allocator is a subclass of at::Allocator that implements the virtual
methods:
virtual void* allocate(size_t n);
virutal void deallocate(void* ptr);
A tensor created with a custom allocator can be resized, unlike a tensor
with a custom deleter.
* Rename AllocatorContext to AllocatorRetainable
* Implement Variable.cuda using ATen
This adds an optional async flag to Tensor::copy_, which attempts to do
a non-blocking copy if the one of the tensors is in pinned memory and
the other is a CUDA tensor.
* Perform cross-device copy in CopyBackwards
Also call torch.cuda._lazy_init() from Variable.cuda()
* Implement Variable.type via ATen
* Changes from review:
- remove copy_out
- remove unnecessary include
- fix default device for .cuda()
* Combine if statements in dispatch_type
* Re-initialize autograd engine in child processes
The autograd engine uses threads for backwards. These don't exist after
forks and they were not being re-initialized because the
Engine::start_threads_flag was already set. This re-initializes the
engine in child processes, which will cause it to re-create threads when
backwards() is called in the child process.
Note that we only attempt to handle the common case where fork() is
called while the backwards threads are idle.
Fixes #3966
* Avoid non-async-signal-safe functions in fork handler
* Rearrange dimensions for pointwise operations for better performance.
In existing code, pointwise operations on transposed tensors process data
"column by column", resulting in poor performance. The worse case happens when
all operands are transposed tensors.
This change tries to "un-transpose" tensors in such a case, so that memory
access patterns are as sequential as possible.
* More explanation on what rearrangeDims() does.
* Fixed a very important (and stupid) typo.
sys.path is searched from first to last, which means that if there is already
a 'tools' directory in the existing python path, we will fail to find the root
directory of PyTorch. Better to put it first.
This method prints a bunch of useful debug information including
the traces that have been record, their shapes, and the traced
graphs associated with them.
Summary: Use MPSCNNDepthwiseConv when groups == input_channels
Reviewed By: ajtulloch
Differential Revision: D6541561
fbshipit-source-id: 7164f26b8f3a101c0ab5c3e6c02ed855397d2750
Summary: Ran into some issues where these values seemed to be initialized to 0 and caused some trouble. Initializing to 1 is safe and well defined.
Reviewed By: hlu1
Differential Revision: D6582774
fbshipit-source-id: 088ec4e782d9680a1d9b4d2d42523d06cbc7dd72
* Trace ATen non-primitive functions as themselves, not their implementations.
Previously, if I invoked an ATen non-primitive function foo, which in turn
called subfoo, I would always see 'subfoo' in the trace (e.g., tracing
'inlines' all of these operations.) Such inlining is bad for ONNX
(and can be bad for optimization) as it prevents high-level
optimizations from taking advantage of the structure. It might
be right to inline, but give the optimizer a chance to work before
inlining happens!
The implementation here is surprisingly simple, because it uses
the "DCE trick". Essentially, it doesn't matter if the constituent
calls perform tracing, because you can always trace it again, and
override the trace nodes associated with the returned variables.
The original trace becomes dead and can be DCE'd.
While implementing this, I also refactored how 'isTracing' and
'trace_outputs' works:
- isTracing was previously a single function with overloads for
both Tensor and Variable arguments. Unfortunately, such overloads
are not safe, because of how C++ implicit conversions work. You
would think that C++ should never confuse an overload for
Variable with ArrayRef<Tensor>, but this is exactly what can
happen: Tensor is convertible to both Variable and ArrayRef<Tensor>,
thus it's ambiguous and C++ doesn't like it. The last time I ran
into this problem, I applied initializer lists to everything and
called it a day. A more robust fix is to separate out the
Variable and Tensor overloads, which I have done in this patch.
- trace_outputs was fed as an initializer list, which doesn't work
when you have heterogenous inputs. So instead we first feed
everything through 'flatten', which has overloads for each of the
argument patterns in ATen, which then goes on to the recordTrace
(which takes an ArrayRef). This is *no less efficient*, because
we were allocating a vector anyway (to do the conversion from
vector of Tensor to vector of Variable).
These fixes mean that 'index' can properly be traced... although the
JIT still does not support it. A failing test case has been added to
this effect.
Some knock-on effects:
- The fuser now knows about chunk as well as split. They're pretty
similar so there is no problem.
- There is a new 'canonicalize' pass in the JIT which renumbers a graph
so that all structurally equivalent graphs render the same.
- We run DCE before the fuser tests, to make sure dead nodes don't
block fusion.
- There are new ONNX exports for the newly introduced higher level ATen
operations. This includes type_as (no-op case only), chunk, select.
Zach didn't like the extra use of 'native' in the new codegen, so
we've introduced a new concept, 'abstract'. An abstract function
is one that is implemented in derived types (e.g., CPUDoubleType),
whereas a concrete one is implemented in the base type (Type).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Fix another leak in pybind11 code.
This time caused by an upstream pybind11 bug:
https://github.com/pybind/pybind11/pull/1216
This change causes the code to go down a non-buggy pathway.
* Relax verify of VariableFlags
If we trace with a defined tensor, but see a run with undefined
tensors, we now allow that run to happen, replacing the tensor with
zeros.
This also fixes a bug where stage 0 tensors were not
checked against their verify flags.
This change does _not_ handle all bad situations that can happen.
For instance, if the first thing traced has an undefined tensor but
a later run has that tensor defined, then it will fail because the graph itself
does not contain the trace for the derivative of the tensor.
However it is possible to work around this later case by
dry-running the function:
z = Variable(...,requires_grad=True)
x,y = f(z)
(x.sum() + y.sum()).backward()
Summary:
This assumed that the expect statement would run within 1us, whereas
we only care that it runs in less than 100ms to check that it got reset.
Closes https://github.com/caffe2/caffe2/pull/1606
Reviewed By: Yangqing
Differential Revision: D6572951
Pulled By: pietern
fbshipit-source-id: fd0c2854bc6459c8bf0e17fa75035eb0a4e522cd
Summary: Currently these operators are implemented in a complex meta-programming fashion. I removed the definitions and put modified CPU/CUDA implementations into reduction_front_back_ops.{cc,cu}. This will help future extension of these ops to support lengths input.
Reviewed By: asaadaldien
Differential Revision: D6506568
fbshipit-source-id: 7323baf7c8e0eca37912f3ae28c02e37ad2e1103
Because it is hard to know whether -fopenmp will work on a user's machine,
we just try it, and then disable it if it doesn't work.
Fused kernels are now competitive with the stuff in TH when the kernel
is flops bound, and faster when the original kernel was memory bound.
Summary:
Commit 479e4ce5 didn't end up solving the health checks firing and
they are likely still caused by the remaining `assume` calls.
Closes https://github.com/caffe2/caffe2/pull/1625
Differential Revision: D6573036
Pulled By: pietern
fbshipit-source-id: eeb21bdd61dca0a632eb1ba9e529177ac2569bfd
Summary:
The install prefix we use in our builds is /usr/local/caffe2. This is
not standard, so in order to load caffe2 from Python, the Python
interpreter must know where to find it. In a post-build section in the
Jenkins build script we now add a symlink to Python's dist-packages
directory and instruct the loader to look in /usr/local/caffe2/lib.
Together, these tricks make it usable out of the box.
Closes https://github.com/caffe2/caffe2/pull/1617
Differential Revision: D6572322
Pulled By: pietern
fbshipit-source-id: c37b789a0d0babbb1110f991318c6b75fe351c0e
Summary:
As titled.
This will fail with the message: File "/mnt/xarfuse/uid-30088/f8742a88-seed-a26ddfbc-49aa-4c5f-9e08-91909f4775da-ns-4026532692/caffe2/python/layers/concat.py", line 52, in __init__
"Concat expects that limited dimensions of the input tensor"
This is because the output scalar of the pairwise_dot_product layer won't contain shape information if output_dim is 1.
https://fburl.com/1m9r3ayp
This diff fixes it.
Reviewed By: xianjiec
Differential Revision: D6565930
fbshipit-source-id: 181181232065ef3fdfc825aa25d2714affbe6b8d
Summary:
There is a lot of business logic around various events in
the base net class. SimpleNet doesn't have to handle those (checked
with ilia-cher). Normally there should be no events registered for
simple nets, but we can have issues where they get added, so
it's less error-prone to just keep SimpleNet::Run pure. And then we
also avoid extra virtual calls / empty vector iterations.
Reviewed By: ilia-cher
Differential Revision: D6551440
fbshipit-source-id: c97a732a00bb36eed49d35e727156ce94225a08b
Summary: A version of MILSTMCell which uses layer normalization (see https://arxiv.org/pdf/1607.06450.pdf). There's a lot of copypasta because we don't want to make the existing RNNCell classes harder to approach / understand by adding new options.
Differential Revision: D6564208
fbshipit-source-id: 0bc43e12b6c08ebdf5ea6af2c631f785c302bdb4
Summary: Observer passed to RNN step net cloned with RecurrentOperator as subject instead of internal Operator. This diff adds the internal operator as the subject.
Reviewed By: enosair
Differential Revision: D6560996
fbshipit-source-id: 7af4fb0ff8c19795b5c994c5fc6876f3d2ba7bf4
This is not currently used by anything, but eventually ATen
will need to make decisions about whether or not to use
CuDNN functions or not, which means we need to propagate
this variable to ATen.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Better error messages for blas ops with cuda.LongTensor
Fixes#4157
Test plan
Try matrix multiplying with cuda.LongTensors
>>> import torch
>>> x = torch.randn(4, 4).long().cuda()
>>> y = torch.randn(4, 4).long().cuda()
>>> x.mm(y)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: addmm for CUDA tensors only supports floating-point types. Try converting the tensors with .float() at /private/home/rzou/pytorch/pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:381
Summary:
We see a non trivial overhead because of this debugging
code. I talked with Romain and looks like we can comment this out for
now. We will think about better way to integrate this kind of
functionality in Caffe2 going forward
Reviewed By: romain-intel, pietern
Differential Revision: D6551108
fbshipit-source-id: efa3e643b953d33dc5f3d11f88cafdf2730bc4e4
Derivatives for NN functions now have to be specified in tools/autograd/derivatives.yaml. Leaving a function out will result in that function not being available in autograd.
Note that _backward declarations used in derivatives.yaml are auto-generated by aten/src/ATen/nn_parse.py so the content of tools/autograd/derivatives.yaml has to reflect the generated declarations.
This is an inconvenience, although it's smaller than it looks: future kernels will be implemented directly as ATen native functions.
As a help to the user, we could eventually save declarations generated in nn_parse.py to a file.
* Avoid automatic generation of NN derivatives
* Add inplace functions
* Refactor nn preprocessing function
* Use output instead of self in inplace derivatives
* Include grid_sampler in derivatives
* Finish fixing grid_sampler and affine_grid_generator
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Factor out setting up derivatives, use the same logic for NN and non-NN codepaths
* Implement remaining random methods through ATen
* Change test_bernoulli on Tensor to avoid broadcasting
The new ATen-dispatched bernoulli_ supports broadcasting. The old
Tensor.bernoulli_ bindings instead require the tensors to have the same
number of elements. I haven't change the old code because it will be
deleted soon.
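A short sketch of the broadcasting now supported by the ATen-dispatched bernoulli_ (shapes are illustrative):
```
import torch

probs = torch.rand(1, 4)                    # broadcastable probabilities
out = torch.empty(3, 4).bernoulli_(probs)   # the old binding required matching numel
```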
Summary: The "assume" statement in adagrad_test leads to a health check failure. Here we remove it by checking dc == hu.gpu_do
Reviewed By: pietern
Differential Revision: D6513314
fbshipit-source-id: 4caf2d938e5f5935a95cca8abd99185182223d63
Summary:
This enables two learning rates for the Generator and Discriminator in a GAN. For each iteration i, it will decide
whether to enable training on G (or D) based on the desired active_period and inactive_period for G (or D).
Reviewed By: dragonxlwang
Differential Revision: D6379325
fbshipit-source-id: 926f1041e25f48791b2ac1fc1a8eaa08db9639b8
Summary:
Adds modules:
a = Module() # create a module
a.b = 3 # set tensors in module
a.c = 4
b = my_func(a) # pass a module to a function as an argument
c = b.what + 1 # and receive a module as a return
global foo
foo.a.b # translates to Caffe2 name foo/a/b
This should help clean up beam search where many external nets are grouped
into modules.
Reviewed By: jamesr66a
Differential Revision: D6543292
fbshipit-source-id: 349eae0b1609efab4557f94650938e1fa543579d
Summary:
This also removes the `bin/{build.sh,test.sh}` scripts that are now
located in `.jenkins/{build.sh,test.sh}`. The rationale for this is
that these scripts don't care about Docker specifically and are also
run for, for example, macOS builds.
Closes https://github.com/caffe2/caffe2/pull/1610
Differential Revision: D6546204
Pulled By: pietern
fbshipit-source-id: 643bfb0c342b1719c0fb51e4e0987b2674e6424f
Summary:
Builds can then execute rendezvous where a shared file system is not available.
Closes https://github.com/caffe2/caffe2/pull/1530
Differential Revision: D6543267
Pulled By: pietern
fbshipit-source-id: a924e2d8c26e0e30e95673ca17c7e1f40f43b3dc
Summary: Remove scoping assertion because it is not useful and is causing errors
Reviewed By: salexspb
Differential Revision: D6538219
fbshipit-source-id: e587e294d4beec1370e6895af9354f0818a4cdd8
Summary:
Part of a 2-step process to move the Jenkins entry point scripts from
`docker/jenkins/bin` to `.jenkins`.
Closes https://github.com/caffe2/caffe2/pull/1605
Differential Revision: D6537959
Pulled By: pietern
fbshipit-source-id: 716b2e6bd50bbfe56b0bb844dd6b0c666a52527c
Summary:
Change the directory name for the IPython notebook.
Change the executable name from ipython to jupyter.
Pass arguments given to the script on to the notebook, instead of hard-coding --ip='*'. In some setups, --ip='*' causes the Jupyter notebook not to be displayed.
Closes https://github.com/caffe2/caffe2/pull/1546
Reviewed By: pietern
Differential Revision: D6460324
Pulled By: sf-wind
fbshipit-source-id: f73d7be96525e2ab97f3d0e7fcb4b1557934f873
Summary: Updated SingleThreadAsyncNet to use new interface
Reviewed By: ajtulloch
Differential Revision: D6526515
fbshipit-source-id: 6aa24678ba7350a5e448e9c2ab29ccd07a1fcb0b
* Ensure RNNCell variants don't broadcast
* Fix lint
* Add test for hidden_size=1 in RNNCell no broadcasting test
* Prevent broadcasting for hidden_size and input_size
* Isolate input checking from hidden size checking
Summary:
PR #1536 suppressed test_sparse_adagrad but test_row_wise_sparse_adagrad also filters too many examples. Suppress health checks for this test as well.
Closes https://github.com/caffe2/caffe2/pull/1599
Differential Revision: D6530850
Pulled By: pietern
fbshipit-source-id: c73f30d2e104565421e3e381b1cf66185edc833e
Summary:
Flops in conv were underestimated when pad is not zero.
The difference is especially big when image is small.
Reviewed By: salexspb
Differential Revision: D6394190
fbshipit-source-id: b9f057fceae77f745c5daa668cb2100f993d21a7
Summary:
This fixes the in-tree protoc build on CentOS 7 (that ships with super old protobuf version).
Closes https://github.com/caffe2/caffe2/pull/1595
Differential Revision: D6529307
Pulled By: pietern
fbshipit-source-id: ac81c7cd884846854b4ffd4909377e87d93bddc3
Summary:
Also add int as a datatype and correctly check error codes on group
start, end
Closes https://github.com/caffe2/caffe2/pull/1590
Differential Revision: D6524086
Pulled By: pietern
fbshipit-source-id: 385aab6fe1bbf6b5c06fa905066bc576a733c856
We'll need these functions when we merge Variable and Tensor. They throw
an exception if called on a Variable that requires grad. As of now,
every Variable that has a grad_fn also requires grad.
Summary:
Uses caffe2 operator schema to check # of inputs/outputs.
Falls back to actual schema->Verify so that schema errors get
reported associated with a SourceRange.
Reviewed By: jamesr66a
Differential Revision: D6517136
fbshipit-source-id: 9be89165ea5e717c4cec1d25bbd967df86200d6c
Summary:
Adds the ability for a script function to call another and adds the extern function to register an external Caffe2 Net that can be called by the script.
Closes https://github.com/caffe2/caffe2/pull/1591
Reviewed By: jamesr66a
Differential Revision: D6515877
Pulled By: zdevito
fbshipit-source-id: b893d9e4bacd7389b550ac8a37ad7974b95de749
* Bind cauchy_, exponential_, normal_, uniform_ functions to THPVariable.
Also changes the error messages around Generator parser; previously, you'd get an error
like: torch._C.Generator is not a torch.Generator; now the check is proper but returns
that only None is supported.
* Support passing Generators to ATen Variable-bound methods.
This involves changing THPGenerator to have an at::Generator rather than a THGenerator.
TH getRNGState, setRNGState are still called directly because they are not bound from ATen yet;
they should probably be on the Generators and return (opaque) GenerateState objects.
* Fix default values.
* Properly use THRandom_initialSeed.
* update standard gamma to use new default generator.
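A minimal sketch of passing a Generator to the newly bound in-place random methods:
```
import torch

g = torch.Generator()
g.manual_seed(0)

x = torch.empty(5)
x.normal_(generator=g)          # in-place random methods accept a torch.Generator
x.uniform_(0, 1, generator=g)
```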
The C/C++ unary negation operator is well defined for unsigned types. We
should use that behavior. This also implements neg for CharTensor. That
behavior currently depends on whether char is signed or unsigned.
Fixes #4066, #3225
Summary: word_rewards data type is mixed; ConstantFill assigns long but the blob is later filled with float32. This causes issues when running the net from the outputted protobuf. This change makes the data type float32 for the lifetime of the blob.
Reviewed By: jhcross
Differential Revision: D6486723
fbshipit-source-id: c4ce5185a0a6d71b08b1819f2355e9354823b701
Summary:
This can be used for testing and debugging. zdevito and I will primarily use this for our caffe2 script project
Closes https://github.com/caffe2/caffe2/pull/1585
Reviewed By: zdevito
Differential Revision: D6501209
Pulled By: jamesr66a
fbshipit-source-id: fdd65e422c44b74bb6926320af506dcae13327f3
Summary:
* condition if
* True/False literals
* and, or, not
* 0-output expressions, like print
* _ is given a fresh name
* x.foo(...) is desugared to foo(x,...)
* +=, *=
Closes https://github.com/caffe2/caffe2/pull/1581
Reviewed By: jamesr66a
Differential Revision: D6495256
Pulled By: zdevito
fbshipit-source-id: b601d3f9e08fa544881a0c946b4feac24cb7e116
Summary: Turns out that similar to RoIWarp, col2im in custom ConvTranspose implementation is also missing a bound check for image.
Reviewed By: ajtulloch
Differential Revision: D6494061
fbshipit-source-id: 1fadbdd05f360b20343df49b70d2be65eab128ac
Implements from_numpy using ATen tensors. Variable.from_numpy is a
convenient placeholder for the variant that returns Variables until we
merge Tensor and Variable.
The behavior is slightly changed:
- from_numpy() on an empty array now returns an empty tensor instead of
throwing an exception. The shape may not be preserved.
- CharTensor(ndarray) used to throw an exception. It now copies the
ndarray. Copying is implemented via ATen toType.
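A short illustration of the behavioral changes described above:
```
import numpy as np
import torch

empty = np.zeros((0, 3), dtype=np.float32)
t = torch.from_numpy(empty)        # now returns an empty tensor instead of raising

arr = np.arange(4, dtype=np.int8)
c = torch.CharTensor(arr)          # now copies the ndarray instead of raising
```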
Summary: Fix MPSCNNRoIWarp and made it more general to channels
Reviewed By: ajtulloch
Differential Revision: D6493869
fbshipit-source-id: 77cfa2e2f3bd80efc6e69a0774793e0162d9942a
Summary:
lines such as
output_scores = best_scores_per_hypo + scores_t_squeezed
hypo_t_int64 = best_indices / 6LL
will emit the respective binary operator (e.g. `Add`, `Div`) with the `broadcast` flag set to 1
Closes https://github.com/caffe2/caffe2/pull/1577
Reviewed By: zdevito
Differential Revision: D6489991
Pulled By: jamesr66a
fbshipit-source-id: 3bef2bd43dfa18659a299cc62affd74f9a763491
Summary:
1 is an int32
1LL is an int64
1f is a float
Still need:
Parsing out numbers such as 1.0 as integer. 1.0f should work, though
Closes https://github.com/caffe2/caffe2/pull/1576
Reviewed By: zdevito
Differential Revision: D6489944
Pulled By: jamesr66a
fbshipit-source-id: 46aab9483a18a31d883c8c7e3086d3074fa5efac
Summary:
Previously, the GetProfDagStats operator collected the per-op-type cost of a given prof_dag net.
With this diff, the operator GetProfDagStats has a new option "per_op". When it is false (the default), the operator still calculates per-op-type cost.
Otherwise, it returns per-op cost, and the cost of multiple instances of the same op type is calculated separately.
Reviewed By: heslami
Differential Revision: D6478547
fbshipit-source-id: 82f00f5fb262cd60b81d2accdd8e3598ddf2eefe
Summary: Replace the fallback implementation by native CUDA code. Minor edits of PackSegmentsOp: let all computation use one buffer tensor.
Reviewed By: asaadaldien
Differential Revision: D6455236
fbshipit-source-id: 71f146c470009d1cecf3f2e2f5c381b1751c061c
Summary:
Adding if and while control ops to brew, also adding unit tests
Note: unlike net_builder, where we can figure out which blobs are external and which ones are local to subnets, here in brew we need to use the external_blobs param explicitly to point at external blobs
Reviewed By: harouwu
Differential Revision: D6440508
fbshipit-source-id: c920f0af84b77ccb2d8462ffc7567bb1908c844a
Summary:
* Fix typo in negative constant handling "Negate" -> "Negative"
* Fix unpacking constant in parsing elements for a list attribute
* Parse negative signs in constants
* Switch list syntax to use square brackets in attributes
Closes https://github.com/caffe2/caffe2/pull/1572
Reviewed By: zdevito
Differential Revision: D6483286
Pulled By: jamesr66a
fbshipit-source-id: 949e8fd6a96b12efde756bac9da987da0010e153
* avoid writing `x + 1.0000*y` which causes a promotion to double from float
* refactor tests to make writing graphs easier (while not strictly necessary,
I have some benchmarking code that I am using to make the fuser faster
that is easier to write in this form)
* option to dump the disassembly of the CPU fused code for perf debugging.
Summary:
This is in order for Android to pass - Android support for string related functions is quite limited.
Closes https://github.com/caffe2/caffe2/pull/1571
Reviewed By: pietern
Differential Revision: D6486079
Pulled By: Yangqing
fbshipit-source-id: f0961e2dde6202bd6506f4fb8a3aea4af1670cb5
Summary: A while ago, we had to change some blob names in `optimizer.py` (more specifically, names of `iteration_mutex` and `optimizer_iteration`) to handle corner cases when preparing a net for parallel execution.
Reviewed By: azzolini
Differential Revision: D6480819
fbshipit-source-id: a03a7aa9fad322a50e7785914b0eb0f8654e6d90
Summary: The RunWithType() function of CUDA version shares a lot of code with the CPU version of the op. Merge them by pulling out the different parts of RunWithType() and putting them into a separate CPU/CUDA functions.
Reviewed By: asaadaldien
Differential Revision: D6467962
fbshipit-source-id: 83b45e697a094e959f66e898f46f06b0e2c329bc
Summary:
Reduced the array sizes used in pack_ops_test to prevent timeouts
during Travis CI builds.
Reviewed By: enosair
Differential Revision: D6476703
fbshipit-source-id: 20ab871ae40349ca27186447a84135bbc5c351b1
Summary:
This includes a build script for Docker containers to run builds and tests in as well as a build and test script that is run to build and test Caffe2 itself. These scripts are directly used by Jenkins.
Closes https://github.com/caffe2/caffe2/pull/1552
Reviewed By: pjh5
Differential Revision: D6476377
Pulled By: pietern
fbshipit-source-id: c9268873c03d0878bea0e8516a72c27813284427
CMake does not correctly add generated header file dependencies
for CUDA compilation units (cpp works fine.). This introduces an
explicit dependency to force the aten generator to run first.
This adds a simple fusion backend for the CPU.
* Refactors CompiledFusionFunction to have two subclasses that handle
the compilation details of each backend.
* emit-compile-link-run cycle for the CPU
* simple single core loop to run the operation
* lift CUDA-only restrictions in the fuser, checks that fusion groups
are only on a single backend.
Adds streams and comms as optional arguments to the NCCL calls in
torch.cuda.nccl. Also exposes ncclUniqueId and ncclCommInitRank for
multi-process mode.
Moves Py_RETURN_NONE statements after the GIL is re-acquired.
Summary:
Adds a new `LSTMCell` subclass to the `rnn_cell` module that performs layer normalization on the fused input matrix. Moves around some code in `rnn_cell.py` to avoid copy-pasta. Adds relevant test cases to `rnn_cell_test.py`.
Had to fix `brew.layer_norm` first. See T24013870.
Reviewed By: jhcross
Differential Revision: D6454883
fbshipit-source-id: 0f4ea7a778cc5be6a7274f7b28c793f5dd7c6095
Summary:
Regardless of device checker/gradient checker we cannot run a
backwards pass with cuDNN when NHWC is used.
Closes https://github.com/caffe2/caffe2/pull/1566
Differential Revision: D6474181
Pulled By: pietern
fbshipit-source-id: 727d7b4f2a1431a4d6675ffb76c5b60d3d7fa712
Summary: Moving tensorboard from fb specific and untying all dependencies on fb code
Reviewed By: dzhulgakov
Differential Revision: D6313818
fbshipit-source-id: 19302c372540400fa60d34015ef9e944ab203d2e
Summary:
This is supplementary to commit ce8267d425444f60ae650389fb41838847a44a5e. It allows specifying a device to prepare_prediction_net() so the prediction extractor can work with GPU.
Closes https://github.com/caffe2/caffe2/pull/1035
Differential Revision: D6467420
Pulled By: salexspb
fbshipit-source-id: b5b9a1536fb516e90b5e4b615403086943cfbe93
Summary: Oops, I left an unused variable here. Let's get rid of that!
Reviewed By: enosair
Differential Revision: D6468223
fbshipit-source-id: 27cc0900b330f056c5b5585a136fb46f5830cf81
Summary: Quick fix for unit test broken by D6454290. This is my fault for approving while the tests covering the single callsite were broken.
Reviewed By: goldsborough
Differential Revision: D6466566
fbshipit-source-id: 2683be3d6bb184286e64fbde3e572946e39030c7
Summary: There are two components that deal with workspace ids: 1) comm framework, 2) injection of GLOBAL_WORKSPACE_ID. The type of workspace id should be consistent for these components. 32-bit integers should be sufficient for such ids.
Reviewed By: akyrola
Differential Revision: D6443675
fbshipit-source-id: 7b0e8a3b005683350706fa5c330abf0a9d4881dd
Summary:
While working on layer normalization for LSTMs I encountered an issue where the layer norm parameters (which are the scale/gain and bias/shift from the paper) were not registered in the model for `brew.layer_norm`. salexspb explained that this is because it was using the `init_net_param` API instead of `create_param`. This diff fixes this.
While fixing this I noticed that `brew.layer_norm` actually had a bug where it was multiplying by the bias instead of adding it. Another issue was that the function was giving the scale and bias a shape of `[1]`; however, the paper (https://arxiv.org/pdf/1607.06450.pdf) specifies that, like for batch norm, there is one scale and bias parameter per neuron, i.e. the shape should be `[1, axis_dimension]`. The API now takes an explicit `dim_in` parameter (also more consistent with other normalization functions in that module) so that this can be specified. See the tests for how this now looks.
Reviewed By: jhcross
Differential Revision: D6454290
fbshipit-source-id: fc00ca614de3190c40ab743e8984bec9e85fb58c
Summary:
Adding a check to pack_segments to make sure the lengths passed in add up as expected.
Additionally started to address https://fb.facebook.com/groups/1405155842844877/permalink/1977332432293879/; this might not fix that issue, but it is still useful even if it does not.
Reviewed By: salexspb
Differential Revision: D6443490
fbshipit-source-id: 680dc763a788a550d321d97a556c5b46e3402dd1
* Comprehensive rewrite of Torch CuDNN bindings / a bit of ATen infra
The executive summary is that this moves the torch/csrc/cudnn
library into ATen, adding a number of new cudnn_ methods to ATen
for batchnorm, convolution, affine grid generator and grid sampler.
ATen infra changes:
- TensorGeometry was moved to ATen
- TensorGeometry was modified to make its interface resemble that of
Tensor; in particular, sizes is no longer a field, it's a method.
- AT_CUDA_ENABLED macro is set via ATen/Config.h header which is
generated at cmake configure time.
Fixes https://github.com/zdevito/ATen/issues/168
- Change AT_CUDA_ENABLED macro to be a function macro, so that we
error if it is not defined
- Introduce a new TensorArg class, which is a Tensor plus a little
metadata. This helps us give good error messages when checking
dimensions/shapes of tensors.
Fixes https://github.com/zdevito/ATen/issues/169
- Also introduce a TensorGeometryArg class, for when you don't
need the actual tensor data (which is most of the time.)
- Add ATen/Check.h, which contains a number of utility functions
for testing shapes, types and devices of input tensors. This
will be particulary useful for native methods, which don't get
code generated input testing code. These functions take a
'CheckedFrom' argument, at the moment just a string, which
specifies some extra information about what function was
doing the actual checking; this greatly improves error messages.
- Many check functions take initializer lists, which let you
test that all tensors have some property. This API is
peculiar, in that we IGNORE undefined tensors in this case.
This is handled by filterDefined.
- Add AT_CUDNN_ENABLED macro
- CuDNN linking from ATen was improved; for example, we now actually
add the CuDNN headers to our include path.
- Add some missing override specifiers to some methods
- We now actually build tests with CUDA functionality accessible
(previously, AT_CUDA_ENABLED was not defined, meaning that
the headers were missing all CUDA-only functionality.)
- Native functions now support giving explicit names to return
outputs in yaml. This makes it possible to hook into the NN
autogenerated derivatives codepath using native functions.
CuDNN rewrite changes:
- torch/csrc/cudnn now uses ATen (rather than passing around
THVoidTensor) and lives in ATen. This lets us remove tensorPointer
shenanigans. The functions are exposed to ATen as native functions
described in aten/src/ATen/cudnn/cuDNN.yaml
- ATen now builds and links against CuDNN when enabled. The cmake
package script was taken from Caffe2.
- Some header reorganization was done to help reduce dependencies
on headers (this reorg is no longer used but I've kept it)
- Rename CHECK to CUDNN_CHECK
- Rip out old shape/type testing code in favor of modern ATen/Check.h
interface using TensorArg. In many cases, increase the robustness of
the checking code.
- Change the inputs of the public facing functions, so that they can
be bound by ATen
- Delete THCState*; this is retrieved from the global ATen context
- Delete cudnnHandle_t, this is retrieved from the global Handles.h
- Delete cudnnDataType_t, this is retrieved from the Tensor type
- Delete Convolution class, instead its constituent arguments are
passed individually
- Change functions to return tensors, rather than take an appropriately
sized output tensor as an input.
- Redo how transposed convolution / backward convolution is implemented
(knock on effect of returning tensors). Previously it was assumed
that you would always pass an appropriately sized output tensor, but
we don't want to do this anymore. For backwards, we instead give
the desired output tensor (input, really) size, because that is
readily available. For *transposed* convolution, however, we take
output_padding, and otherwise do the shape calculation.
- Redo how legacy group convolution is implemented (knock on effect from
porting cudnn to ATen.) Previously, group convolution was implemented
by manually constructing sizes and strides and then outputting
appropriate, with macros switching between individual groups and
all-at-once based on CuDNN version. Now, the code looks exactly what
you'd expect: there's a top-level wrapping function that supports
group convolution no matter the version of CuDNN, and a low-level
wrapper which supports only what CuDNN supports. The top-level
function conditions on CuDNN version, and invokes the low-level
interface 1 or n times.
- There is now a debugging printer for tensor descriptors.
- Convolution struct is replaced with ConvolutionArgs, which is not
part of the public API but is used internally to conveniently
pass around all of the arguments needed for Convolution.
- Add some constexprs for well-known dimensions, reduce amount of
magic numbers in code.
- Put 'deterministic' into ConvParams. Fixes #3659
- Lots more comments.
- Some pessimizations, in the name of code clarity:
- The descriptors are initialized on every invocation of convolution
forward/backward. Previously, the descriptors were cached, so that
you didn't have to initialize them again on backwards. This is
difficult to support in the ATen interface so I didn't support it.
- Legacy group convolution initializes its workspace for *every* group
it performs. I did not feel motivated to fix this because the
legacy codepath is already quite slow.
- Affine grid generator and grid sampler automatically call contiguous
on their arguments as necessary.
- Batchnorm input checking is greatly beefed up, it now checks for
the following input characteristics:
- Definedness
- GPU location
- Type
- Contiguity
- Size
PyTorch binding code changes
- batchnorm now uses consistent var/data naming
- batchnorm and convolution make use of new ATen bindings
- Affine grid generator and grid sampler make use of ATen CuDNN
bindings via derivatives.yaml. This means I had to restructure
the code a little, since the THNN bindings still go through
a legacy Python class.
- I fixed some warnings:
- s/friend class/friend struct/ on InterpreterStateImpl
- Removed pessimizing move 'detached' in torch/csrc/autograd/variable.cpp
- Removed unused pack_list on Scalar
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
GCC 4.8 buildfix
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Add TensorGeometry to ATen.h
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
CUDNN_CHECK
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Update TODO comment
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Delete return in cudnn_grid_sampler
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
s/cudnnSetStreamToCurrent/setCuDNNStreamToCurrent/g
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Don't allocate a new vector when filtering defined.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Remove Check overloads, convert to pass references.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Some more microbenchmarking.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary: Replaced sigmoid + xent loss with SigmoidCrossEntropyWithLogits. The sigmoid layer computes the multinomial logistic loss of the sigmoid of its inputs. It's conceptually identical to a sigmoid layer followed by a multinomial logistic loss layer, but provides a more numerical stable gradient.
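For reference, a numpy sketch of the standard numerically stable formulation of sigmoid cross entropy on logits (the function name and layout here are illustrative, not the operator's actual code):
```
import numpy as np

def sigmoid_xent_with_logits(logits, labels):
    # max(x, 0) - x*z + log(1 + exp(-|x|)) avoids overflowing exp() for large |x|
    return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))
```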
Reviewed By: xianjiec
Differential Revision: D6305455
fbshipit-source-id: 444c9f651fbdf13c3c52be5142769f8f98ed8770
This commit adds code to setup.py to use ninja to manage
C++ and code generator dependencies rather than use raw setuptools.
This is based on similar code added to ONNX.
Enabled optionally when ninja is installed.
On my computer speed for a do-nothing build drops from 10s to 1.5 seconds.
Speed of other compilation steps is significantly improved as well.
Dependencies are tracked correctly so the need for ccache is reduced.
Summary:
Get higher order interaction of embeddings, similar to cross net but applied in the embedding level.
Formula:
e_(l+1,i) = element_wise_mul[e_(0,i), \sum_i(e_(l,i) * w_(l,i))] + e_(l,i) + b
where l means the l-th layer of this higher order net, i means the i-th embedding in the list.
Finally, concat all the embeddings in the last layer, or concat the sum of each embedding, and attach to the output blob of dot processor.
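A numpy sketch of the update rule above; the names e0, el, w, and b are assumptions for illustration, and the per-embedding weights are taken to be scalars:
```
import numpy as np

def higher_order_step(e0, el, w, b):
    # e0, el: lists of equally sized embeddings at layer 0 and layer l;
    # w: one scalar weight per embedding at layer l; b: a shared bias.
    s = sum(e * w_i for e, w_i in zip(el, w))            # \sum_i e_(l,i) * w_(l,i)
    return [e0_i * s + el_i + b for e0_i, el_i in zip(e0, el)]
```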
Differential Revision: D6244001
fbshipit-source-id: 96292914158347b79fc1299694d65605999b55e8
Summary:
Problem:
when we initialize a model from an existing model, currently we load information for each layer parameter independently (in utils.py), including shape information. We have to load the whole model from the db_path every time we initialize one parameter (in layers.py). For example, in f31078253, the model needs to be initialized twice (not sure why). Each time there are 152 layer parameters to load, and loading a model takes 10-50 min depending on resource status.
Restriction:
1. _infer_shape_from_initializer in layers.py is called from multiple other places besides the if branch of ModelInitDefinition.INIT_MODEL_PATH in load_parameters_from_model_init_options in utils.py, which is the root cause of f31078253. So we still need to support the load operator in _infer_shape_from_initializer, and we need to batch the shape blob loading outside of LayerParameter.
2. In the if branch of ModelInitDefinition.PARAMS in load_parameters_from_model_init_options in utils.py, the db_path can be different for different parameters, so it is hard to batch them.
Solution:
Batch the shape blob loading in the if branch of ModelInitDefinition.INIT_MODEL_PATH in load_parameters_from_model_init_options in utils.py. We load the model and generate shape blobs of the layer parameters in the workspace, so that _infer_shape_from_initializer in layers.py can directly return the shape blobs cached in the workspace without reloading the model. At the same time, _infer_shape_from_initializer can still support a separate load operator if shape blobs are not pre-loaded into the workspace (this logic can be used for ways to initialize a model other than from an existing model).
Right now we are using 500 layer parameters per batch, and it worked fine. So for 152 layer parameters, one model loading is enough.
Reviewed By: xianjiec
Differential Revision: D6397607
fbshipit-source-id: 54f6f61d6d8b70c82b74c2d72ac56cd010a710da
Summary:
(Work in progress.) This diff will allow shifting activations to other GPUs in case the model does not fit into memory. To see the API, check the code in data_parallel_model_test, which tests shifting two activations from gpus 0 and 1 to gpu 4, and from gpus 2 and 3 to gpu 5.
I will need to test further on ResNets, and probably add copy operations to handle device change points.
Reviewed By: asaadaldien
Differential Revision: D5591674
fbshipit-source-id: eb12d23651a56d64fa4db91090c6474218705270
* Implement matmul as a native function; use it for Variable impl.
This also includes an (inefficient) version of allclose, which was necessary for testing.
A more efficient version would use some apply logic to fuse the ops and exit early (coming in future PR).
On small tensors [(2, 5, 5) @ (5,5)], this yields ~2.5x speedup over the python implementation.
* Make maybeSqueeze static.
Summary:
This is a CUDA implementation of the RemovePadding operator, modeled on akyrola's implementation for AddPadding.
There's also an incidental spelling correction: GetAddPadingGradient -> GetAddPaddingGradient.
Reviewed By: akyrola
Differential Revision: D6439594
fbshipit-source-id: b29cd0c252021c58e150b901bbaad28a3bd3cc4a
Summary: Experimental code that allows you to write C2 NetDefs directly using python-like syntax. This includes the ability to write native control-flow (if, while) and have it turn into IfOp and WhileOp
Reviewed By: jamesr66a, dzhulgakov
Differential Revision: D6123298
fbshipit-source-id: 25fc078b5769be61ac7fb3aa9a7c95bd88dccc30
Summary: Support regression with output transform in MTML for feed.
Differential Revision: D6403523
fbshipit-source-id: faa0aab1227a27286b617e8e25adfbab3a349d2c
SavedVariable.unpack() may throw std::runtime_error, which may lead to
program termination with SIGABRT without the exception being handled
in Python.
Fixes #3860
Summary:
This fixes the issue but I haven't figured out yet why is it
happening.
Reviewed By: bwasti
Differential Revision: D6437378
fbshipit-source-id: bf983c9b6f57647423423ec6b22e0f9d2b170e74
* Implemented NCCL Distributed Backend for PyTorch with new dist APIs
* Let FindNCCL determine the NCCL version
* Let the NCCL2 backend use ATen instead of the deprecated THPP
* Let distributed parallel model use a single reduction thread for NCCL backend
* Caching the sockets, bug fix, refactoring, and addressed Adam's comments
* Make BcastNcclID take a single param and bug fix for all_gather
* Removed barrier function, added warning for users, and not exposing experimental func to users
* Use the simplest single-bucket working solution for distributed data parallel model with rebase
* Cleanup, fixes and further addressed Adam's comments
* Used PySequence_Fast in distributed csrc
* Removed the limitation that each group is only bound to a given device sequence
* Used THPObjectPtr for PySequence_Fast
Summary:
With some test seeds this warning starts firing.
Should be addressed in a better way, not generating as many invalid examples.
Closes https://github.com/caffe2/caffe2/pull/1536
Reviewed By: bddppq
Differential Revision: D6437138
Pulled By: pietern
fbshipit-source-id: c619d928a585e3d887f686db5d98f841af10c56b
Summary: The case when sampling_ratio = 0 was skipped before; this diff enables that setting.
Reviewed By: ajtulloch
Differential Revision: D6366669
fbshipit-source-id: 4f3b9eaf47eb9dc20823935428d3d886ea32a5fc
* Add interpreter support for Handles/PythonOp/CppOp
This treats Handles as a first-class type in the interpreter
since this turned out to be conceptually simpler than treating
them as a separate concept, which requires a second channel for
register allocating and moving data from one op to the next.
Notes:
* The refcounting nature of tensors is factored into its own base type
so that it can be shared with other refcounted types such as handle.
* Some methods redundant with TensorBase have been deleted from Tensor
* The interpreter uses raw refcounted handles. In addition to being
able to treat Tensors and Handles as the same base object, it removes
a lot of redundant refcounting as objects moved from tensors to input/
output lists.
* aten_dispatch has been updated to work directly on the raw refcounted
lists to avoid refcounting and duplicate lists.
* Removed jit_closure.cpp; the interpreter can now handle all pathways.
* Functions like `unsafeToTensorShare` describe how
ownership transfers in the interpreter. The `Steal` variants
take rvalue references as arguments, and invalidate those
arguments to prevent potential problems.
* TensorTemporary is deliberately not a subtype of Tensor, because that relationship makes it too easy to
do something horribly unsafe:
```
void foo(at::Tensor bar) {
  // bar's destructor calls release on a temporary!
}
foo(TensorTemporary(retainable)); // structure slicing!
```
Summary:
Remove `const` modifier on value-type return types, since it has no effect.
This fixes a clang 5 warning.
Reviewed By: Maratyszcza
Differential Revision: D6399474
fbshipit-source-id: b40af161be5ae67a944518f9b4043c194511267d
Summary: `ThreadPool` is a class, but it is forward-declared as a struct, which produces an error when compiled with clang 5.
Reviewed By: Maratyszcza
Differential Revision: D6399594
fbshipit-source-id: e8e81006f484b38e60389c659e9500ec9cfab731
Summary: Double braces are required in C++11 when constructing an `std::array<,>` using aggregate initialization.
Reviewed By: Maratyszcza
Differential Revision: D6399752
fbshipit-source-id: 7b12c7a8193ba4904bb71b764a344bfd06ad7a7a
Summary:
TSIA. This is found in
https://github.com/caffe2/caffe2/pull/1530
Reviewed By: dzhulgakov
Differential Revision: D6434417
fbshipit-source-id: 2285c2f6252eb7f24e83357eb4887851b3adf690
Summary:
Updating the reader Limiter to identify an epoch end either based on
batches_per_epoch or epoch_duration_len.
I am basically addressing the review comment of D6299602 where I was asked to
break that diff into 2 smaller diffs.
This is Part 1 of the diff D6299602 i.e. making the multi-reader capable of identifying
epoch end either based on batches_per_epoch or based on epoch_duration_minutes
Reviewed By: azzolini
Differential Revision: D6379955
fbshipit-source-id: b8f8e396f515c898ad2f9ee900ec8fad055306b0
Summary:
Async executor based on async_polling (D5985110):
- Tasks scheduling other tasks, using polling only when necessary (e.g.
CUDA->CPU case)
- Fully async, i.e. RunAsync immediately returns
Reviewed By: azzolini
Differential Revision: D6281681
fbshipit-source-id: 06e3723e1424ffab652c38ca7b279cf76e43fa44
* Optimizer: Optimize transposes in variety of circumstances
- No-op transposes
- Consecutive transposes (fuse them)
- Transposes into Gemm (fuse them into transA/transB parameter)
* touch up out of date comment
* Have localScalar work with all 1 element tensors, not just scalars.
Also have toCFloat, etc. call localScalar so 1 element tensors work as well.
* Implement python number conversions.
* Implement __bool__, __nonzero__ as ATen functions.
* Remove merge artifacts.
* Simplify by dispatching to toCDouble.
This adds heavier sanity checking when we run to_dense(); in particular,
we make sure that if the tensor claims to be coalesced, it truly is coalesced, and if
it is not, that the coalesced version produces the same to_dense() result.
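A rough sketch of part of that check, assuming the usual sparse-tensor methods coalesce()/to_dense() (an illustration, not the actual test code):
```
#include <ATen/ATen.h>
#include <stdexcept>

// Sketch: whatever the input claims about being coalesced, densifying the
// coalesced version must produce the same values as densifying the original.
void check_to_dense(const at::Tensor& sparse) {
  at::Tensor dense = sparse.to_dense();
  if (!dense.equal(sparse.coalesce().to_dense())) {
    throw std::runtime_error("to_dense() disagrees with coalesce().to_dense()");
  }
}
```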
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* CUDA mode profiler fixes
* Enable multi-gpu CUDA tracing
We need to record per-device start events because event timing
comparison only works for events on the same device.
* Coarse-grained CPU-CUDA syncing of timelines
Record a __cuda_start event used to synchronize cuda/gpu timings.
This requires running some warm-up event records to ensure the
call to event record for the __cuda_start event doesn't take
longer than normal.
fix syncing
* fix cuda build and lint
Summary: RecurrentNetworkExecutor is quite complex and was lacking documentation and had some stray comments. Cleaned up and added documentation. Also did some renaming and reformatting.
Reviewed By: ilia-cher
Differential Revision: D6421087
fbshipit-source-id: c3a57f60042ae4425a59123af5f54acb19e860e7
Summary:
enosair caught a bug where the operator returned too early if the lengths output was not provided. Fixed and added testing.
+ noticed the op does not support the case when no lengths input is provided. Added a temporary CAFFE_THROW for this case; will fix later
Reviewed By: enosair
Differential Revision: D6405585
fbshipit-source-id: a81717e1b39afde6e900ddd9049b820943aea9f1
Summary:
Our cmake build used to link against libpython.so with its absolute path (instead of -LSOME_LIB_PATH -lpython), so at runtime the loader thinks it needs the libpython.so at that specific path and loads in an additional libpython.so. This made a Python binding built with one Python installation unusable by another (on the same machine, or sometimes not even on the same machine). The solution is simple: we don't link against libpython and leave all Python-related symbols unresolved at build time; they are resolved at runtime when the module is imported into Python.
Closes https://github.com/caffe2/caffe2/pull/1514
Reviewed By: dzhulgakov
Differential Revision: D6412405
Pulled By: bddppq
fbshipit-source-id: 9ff5b752ae3806bfac94085942f82d89c304c887
* Add a bit of notation explanation
For a first-time user of Conv1d, it is not clear from the documentation what N, C, and L mean exactly. This should clarify that. Same for Conv2d.
Some tests, such as test_autograd.py, include random generation at the
top-level. It's going to be tough to police these files to ensure that
all randomness only happens within a test, so just set the seed as soon
as args are parsed (as well as before each test).
torch.manual_seed_all is no longer needed since torch.manual_seed also
seeds the CUDA random number generator.
Summary:
Set a default input type so that users do not need to always specify one.
Test Plans: run caffe2_benchmark without the input_type argument, the default one is used.
Closes https://github.com/caffe2/caffe2/pull/1513
Reviewed By: hlu1
Differential Revision: D6401820
Pulled By: sf-wind
fbshipit-source-id: bc8406ca000b3f65fb9aeb1c9c80eb766d625758
Summary: CUDA version of the AddPadding op. It first executes a prefix-sum using Cub to compute the cumulative lengths array. Then it launches a kernel that uses this information to fill the output tensor with the start and end padding and the actual contents.
Reviewed By: asaadaldien
Differential Revision: D6391413
fbshipit-source-id: 45b431e5976674729e53cb4752c7753c1d8a69e8
Summary:
so that users can use the 'WeightedSum' pooling method when there is a mix of id list and id score list features.
- it's still intuitive to have "WeightedSum" for id lists, and we do not need to introduce a new "UnWeightedSum" etc.
Reviewed By: chocjy
Differential Revision: D6369270
fbshipit-source-id: 722fa08d1a7986bc6ecf4c7cb02bbae0825bcab4
* Avoid casting integer params and buffers to float(), double() and half()
* Add test for immune integer buffers
* Fix documentation for float(), double() and half()
* Fix test
* Fix CharType min and max
CharType is int8_t and this is not equal to char. CHAR_MIN and
CHAR_MAX cannot be used reliably to specify min and max values.
* Use SCHAR_* instead of hardcoded min/max values for CharType
Summary: This is a reapplication of the earlier PR due to xplat move. Original author is Christoph Conrads <christoph.conrads@fluent.ai> christoph-conrads .
Reviewed By: houseroad
Differential Revision: D6379736
fbshipit-source-id: b7482ecf3b9487a528c15e92976e915791210002
Summary: Small changes as I was reading through the dper code base. All of them are nits, but they somewhat helped me understand things.
Reviewed By: xianjiec
Differential Revision: D6389380
fbshipit-source-id: 3412052e4fcba199c6ffc84c6f7ae11bf8ff6ee9
Summary:
The plural version is not defined in the CentOS CMake module.
Verified EIGEN3_INCLUDE_DIR is defined in the Ubuntu CMake module.
This fixes the build on CentOS when using system Eigen3.
Closes https://github.com/caffe2/caffe2/pull/1505
Differential Revision: D6390712
Pulled By: pietern
fbshipit-source-id: b8abb14a62e0ff9fa9c920866504da0e75786c0d
Summary:
Disabled when configuring Jenkins to get a run where tests pass.
Closes https://github.com/caffe2/caffe2/pull/1449
Differential Revision: D6390647
Pulled By: pietern
fbshipit-source-id: c16edc0c4d21ad60f101cf860e5dec183a1ea71a
Remove unnecessary messages and make certain functions in-place.
This commit weakens error checking, but I think it's fine to make
it UB for now, and implement a better asynchronous mechanism later.
This is much needed for achieving high performance.
This also adds support for CUDA-aware MPI implementations.
Summary:
A Caffe2 user was confused when model.TensorProtosDBInput([reader]) did not work. This is because of this outdated model helper function, which ignored the input blobs.
Added an assertion to enforce correct usage. I did not want to make this work with reader input as well, since this helper probably should not be used anyway.
Reviewed By: amanrajdce
Differential Revision: D6380326
fbshipit-source-id: 6a50c2861f7f58c06cbfe3e86bde0f17a2b443cb
Implements basic and advanced indexing using ATen tensors/variables.
Basic indexing is translated at the Python-binding level
(python_variable_indexing.cpp) to slice/squeeze/unsqueeze/select calls.
Advanced indexing is implemented in ATen in terms of take() and put()
calls.
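As a rough sketch of the basic-indexing translation (illustrative only, with made-up values; the real translation happens in python_variable_indexing.cpp), a Python expression like x[1:3, 2] becomes a slice along dim 0 followed by a select along dim 1:
```
#include <ATen/ATen.h>

int main() {
  at::Tensor x = at::CPU(at::kFloat).rand({4, 5});
  // Roughly what x[1:3, 2] lowers to: slice dim 0 over [1, 3), then select column 2.
  at::Tensor y = x.slice(0, 1, 3, 1).select(1, 2);
  (void)y;  // 1-dimensional tensor of size 2
  return 0;
}
```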
FindMAGMA.cmake looks for the MAGMA library under the hardcoded
/usr/local/magma by default. This commit adds the MAGMA_HOME env variable
as an alternative way to provide the MAGMA home directory. This is
very useful (and the only way) when the user has restricted rights
and cannot install the MAGMA libraries under /usr/local/magma. It is
also helpful when having multiple versions of the library, to be able
to select the one to use.
Summary: Unlanding D6327460 because it seems to be causing instability.
Differential Revision: D6377117
fbshipit-source-id: 4e1241fe65cd4c7a127fa6fa724f60b75965a096
Summary:
This should also be ported to Gloo since its Cuda.cmake was
synchronized to Caffe2 in #1256.
Verified that running CMake with `-DCUDA_ARCH_NAME=Manual` and
`-DCUDA_ARCH_BIN=70` ends up running nvcc with `-gencode
arch=compute_70,code=sm_70`.
Closes #1460.
Closes https://github.com/caffe2/caffe2/pull/1487
Reviewed By: bwasti
Differential Revision: D6376222
Pulled By: pietern
fbshipit-source-id: 563a2947567a2af8a0e64475b346a19d76545ed3
The slice function is very similar to narrow, except that it takes an
optional "step" argument. Unlike narrow, the arguments use the same
conventions as Python indexing: negative values wrap around and start
and stop are clamped to the size of the Tensor.
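A rough sketch of these semantics (hedged: it reuses the old ATen factory style seen elsewhere in this log, and the values are illustrative only):
```
#include <ATen/ATen.h>

int main() {
  at::Tensor t = at::CPU(at::kFloat).rand({10});
  // Equivalent of Python t[1:8:2]: dim 0, start 1, stop 8, step 2.
  at::Tensor a = t.slice(0, 1, 8, 2);
  // Negative indices wrap around and the stop is clamped to the size,
  // so this behaves like Python's t[-3:].
  at::Tensor b = t.slice(0, -3, 100, 1);
  (void)a; (void)b;
  return 0;
}
```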
* Move Variable conversion methods to ATen.
* Add a test to ensure type conversions work through backwards.
* Fix VariableType copy for type conversions.
* Add comment about needing to handle device movement.
* Move back to opposite order for copy function params -- inplace views depend on it.
* Use is_available() rather than is_available.
Summary: Today when PythonOp throws an exception, we log the error and fail the op. Later we assert that the op/net/plan succeeds and throw with a generic message. The user must tail the logs to find the real error. Instead, align with exception handling from other ops - throw directly. This will include the full context of the exception in the error message.
Reviewed By: Yangqing, akyrola
Differential Revision: D6359684
fbshipit-source-id: 85133ba6562759607a3971449120647cbacce946
If a virtual Python environment is in use (e.g. conda) and
mpiexec was compiled with the --enable-mpirun-prefix-by-default option,
it will fail by default because the path is updated to the prefix and
a different python (in most cases /usr/bin/python) will be used.
Summary: change the interface so BMUF can run on cpus
Reviewed By: asaadaldien
Differential Revision: D6356026
fbshipit-source-id: f58a4da9f800d969145a1a376e118b0f3581f8c1
Summary:
build_local.sh was changed in a8bb05d to no longer take the CMAKE_ARGS environment variable as args to the cmake command
Closes https://github.com/caffe2/caffe2/pull/1488
Differential Revision: D6364057
Pulled By: bddppq
fbshipit-source-id: a96787f3d3f1367ada4819420906e549f0945c8f
* Use aten version of is_signed.
* Define is_cuda native function and use it for variable.
* Use ATen dim for Variable dim/ndimension.
* Get rid of dim, ndimension fallthroughs in variable.py.
* Move size/stride Variable methods to use ATen.
* Implement shape property on Variable via ATen.
* Remove the _getattr__ function from Variable.
* Get rid of dispatch functions and avoid cast.
* Add THPUtils_packInt64Array.
* Throw python errors.
* Use fallthrough and fix fallthrough generation for native functions.
* is_cuda is a property, not a method.
Summary:
There were several regressions over time. It looks like the main
one is a recent change that introduced a map which we iterate over for each
operator call. I made some other little optimizations to our Facebook
observer. Overall this seems to cut about 1000ns from an operator. At a
rate of 36B operators per second this should be about 750 type VI
hosts.
Reviewed By: bwasti
Differential Revision: D6327460
fbshipit-source-id: 119623addbbd575486906959d65603eea8d4f5e6
Occasionally Travis builds would fail on these two tests.
It's not entirely clear where this nondeterminism is coming
from.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Add (fully opt-in) functionality to support setting pretty names for
nodes in the graph. In particular
- Variable now has a `name` parameter in the constructor
- export now has `input_names` and `export_names` parameters
Nodes that are not named via this mechanism continue to be named
internally with unique integers.
Names have a few rules.
- They must all be unique in the graph.
- They may not be integers (because of potential conflicts with
internally generated names).
Summary: Reported by Simon Layton from NVIDIA: we had a couple of py3-incompatible expressions in data_parallel_model
Reviewed By: azzolini
Differential Revision: D6349447
fbshipit-source-id: a09feb69396be43296400591a3bfed5b8c370b0d
* Add cudaEvent support to the profiler
This adds the ability to record cuda timings using cudaEventRecord
in the profiler. Since it doesn't require nvprof it is easier
to run than the nvprof path.
This also records a thread id for each event, which will make
tracing results easier to understand
* Add flow arrows from cpu to cuda event
* Fix no cuda build
* Review comments
* Move CUDA checks to one place
Summary: Ensure the clone() function didn't return a nullptr before attaching to an RNN operator
Reviewed By: salexspb
Differential Revision: D6341735
fbshipit-source-id: acf89c32f8dae2fd9bc8cb1029bc00df5dbe9dbd
Summary: The CUDA Cast op can now deal with an empty batch.
Reviewed By: azzolini
Differential Revision: D6350138
fbshipit-source-id: 2f3d19f4d42ff34806aa9597690e66f6b4de1a6b
Summary:
Two ops: BatchSparseToDenseOp and DenseToBatchSparseOp, inverse operations of each other.
Details are described in the op docs.
These ops are used along with flexible topK, where the output is lengths, indices, and values.
We want to do softmax on the values, but the dimension of each batch is different, so these ops convert the sparse representation to dense and vice versa. The two ops are also the gradient ops for each other.
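A small illustration of the sparse-to-dense direction (a hypothetical standalone helper, not the Caffe2 op implementation): each row i owns lengths[i] consecutive (index, value) pairs, which are scattered into a dense row.
```
#include <vector>

std::vector<std::vector<float>> batch_sparse_to_dense(
    const std::vector<int>& lengths, const std::vector<int>& indices,
    const std::vector<float>& values, int dense_dim) {
  std::vector<std::vector<float>> dense(lengths.size(),
                                        std::vector<float>(dense_dim, 0.0f));
  std::size_t pos = 0;
  for (std::size_t row = 0; row < lengths.size(); ++row) {
    for (int j = 0; j < lengths[row]; ++j, ++pos) {
      dense[row][indices[pos]] = values[pos];  // scatter value into its column
    }
  }
  return dense;
}
```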
Reviewed By: chocjy
Differential Revision: D6288338
fbshipit-source-id: 0ba9e611058b39e46e7414dcc5f39cab29915fa3
Summary:
This is part one: It adds lambdaNDCG loss which can be used to heuristically
optimize the NDCG metric.
Differential Revision: D5830650
fbshipit-source-id: 1eb696337c9a77727ad40219c68f6468e2e097a5
Summary: Came across this bug in the doc when I was figuring out NetBuilder from the code.
Reviewed By: volkhin
Differential Revision: D6341821
fbshipit-source-id: 8818f3d92681366bfe7b90d9d4da9f68ef6e4672
Summary: Implement LinearWarmup and ConstantWarmup learning rate policies. LinearWarmup warms up the learning rate from (starting_multiplier * learning_rate) to the specified learning rate over the first 'num_iter' steps. ConstantWarmup scales the learning rate by 'multiplier' for the first 'num_iter' steps.
Differential Revision: D6316038
fbshipit-source-id: 1649c3ecd78bcdfec93b6cf195d86328393a7cb4
Summary: Move quant_decomp_zstd.* to share/contrib so that they're automatically synced to fbcode
Reviewed By: Yangqing
Differential Revision: D6336968
fbshipit-source-id: 1bf48ce97a017ddea8cc82865428a498653d5872
Previously, an in-place operation that saves its output (such as
relu/threshold) would create a reference cycle when applied to a
view. There were two cycles created:
1) The cycle base.grad_fn.fn.input_.base
base.grad_fn is a CopySlices
base.grad_fn.fn is ThresholdBackward
base.grad_fn.fn.input_ is a SavedVariable with base pointing to base
2) The cycle base.grad_fn.fn.input_.grad_fn.next_functions[0]
base.grad_fn.fn.input_.grad_fn is AsStridedBackward
and next_functions[0] points to base.grad_fn
Generally, we avoid cycles because the AD graph is mostly immutable. Two
notable exceptions are:
a) Variable.grad_fn can change to point to a new grad_fn
b) SavedVariables in a function can be set after the function is created
The first case is not a problem if grad_fns do not hold strong references
to Variables. Removing "base" from SavedVariable removes the strong ref.
For the second case, we need to avoid saving the grad_fn of outputs. We
were incorrectly saving the grad_fns of outputs when they were the
result of in-place ops on views.
This commit adds a Value type similar to the one @ezyang suggested a while
ago for handling multi-return nodes.
Previously if we had a graph like:
a = op1(b)
c, d = op2(a)
Then its in-memory format would look like:
%0 = op1(b)
%1 = op2(%0)
%2 = select(%1, 0)
%3 = select(%1, 1)
Select nodes were used only to handle the multi-output case. In the
single-output case ops referred directly to their uses.
This required special handling for the single- and multi- output cases,
and was confusing when used with ONNX which distinguishes values (the
inputs/outputs of a node) from the nodes themselves (e.g. a Conv).
This commit adds the Node/Value distinction to the IR. In the example
above, `a`, `b`, `c`, and `d` are now Value objects, while `op1` and
`op2` are now Node objects. Inputs/Outputs to the graph are values.
* Nodes now always have multiple outputs, accessible through their `output()`
method.
* Methods exist for adding/removing outputs from a node.
* Nodes own their output Values, destroying a node destroys its outputs and it
is only valid to destroy a node when no uses of its outputs remain.
* Unlike select, Values do not appear in the nodes list.
* The method `node()` on `Value` retrieves its defining node. Calling it
is always valid. For inputs, its kind is "Param". Like "Return" there is a single Param
node representing all inputs.
* For single-output Nodes, the method `output()` retrieves the single
output Value, asserting that the node is in-fact single output.
* Functions are the same, but some functions like `type()` have moved to
Value.
* `replaceAllUsesWith` is now sanely defined for both Values and Nodes.
In the case of Nodes, it replaces all outputs of the node with the outputs
of the replacement node.
* stage is defined both on Node/Value. This is because Inputs require a stage.
* Apart from changing data types from Node->Value most passes remain the same.
Things that previously assumed single-output nodes now have to call output()
to get the node.
* This removes the uses = [...] field in the outputs because it was
getting confusing even before this commit when uses would refer to nodes,
but we print the names of Values. The lint pass validates the use list,
so printing it out seems less necessary.
* Support [output] in native_parse.
* allow specifying [output] in NativeFunctions.
Limitation: doesn't work for method, functions; can only do one or the other.
* Sample native function with output.
* spatial roi pooling forward skeleton (note, build is broken after this commit)
* Support multiple variants in native functions with outputs.
* add roi pooling forward cpu
* Add support for tuple return in NativeFunctions.
* native functions cuda
* fix bug in roi pool cpu forward
* finish forward kernel minus invocation
* add option for getting current stream
* Support backend-specific native function dispatch.
* Move cuda stuff to native.
* Move native related files to /native.
* Get rid of NativeFunctionsCuda.h.
* launch forward kernel
* roipool backward kernel
* Rebase expand error message changes.
* Fix up header files.
* add backward kernel launch, write as native function
* Default to base dispatch.
* Re-arrange native_parse.py.
* Get rid of tabs.
* Get rid of at:: in C++ code in native function decl.
* Parse name.
* Parse name and return.
* Parse arguments.
* Don't specify variants.
* Get rid of /NativeFunction.
* Infer dispatch level.
* Infer dispatch.
* Improve argument parser.
* Comment, simplify parsing.
* Allow single line comments.
* Parse 'const Tensor &foo' correctly.
* Add comment to native_get_return_types.
* Fix python2 build by removing kwarg to rsplit.
* tabs --> spaces in roi forward cpu
* rename to RoiPooling2d
* add _cpu to roi pooling functions on cpu
* fix name handling in native functions
* Fix lint.
* Simplify default handling.
* Get rid of dispatch_level; infer it from dispatch.
* Simplify multiple return type native parsing.
* Move naming of outputs to gen.py from gen_variable_type.
* Get rid of m_ for type methods; keep only method_prefix_derived for s_ functions.
* add derivatives.yaml entry for roi pool
* Native functions parsed from yaml.
* Add comment explaining native_functions.yaml.
* Fix runtime_error string format.
* Fix wrong CUDA generators and allow for new ones
* Fix CUDA detection for other generators
* Simplify the changed code
* Remove useless flags for MSVC
Summary:
So we can do things like pass -DCMAKE_BUILD_TYPE=DEBUG
Closes https://github.com/caffe2/caffe2/pull/1474
Differential Revision: D6334701
Pulled By: pietern
fbshipit-source-id: 08e6e48ba453ffca50ad0949ee7b0bf7251a542f
Summary: Current beam search generates successor states to EOS which are considered for inclusion in the beam even though they do not represent valid sequence prefixes. This diff introduces a penalty to ensure that such states are not included in the beam.
Reviewed By: xliilx
Differential Revision: D6325511
fbshipit-source-id: b17f10b0d00f3bc5fcc5a826a8a57a0f2cb360a6
Summary: Split into cpu and gpu parts, update chaining test
Reviewed By: Yangqing
Differential Revision: D6331513
fbshipit-source-id: b9e8ec9afc110b0284550c4818bde15ae108fa2f
Summary:
Fixed unit test failures for GRU cell first implemented in D5778202
- GRUCell implementation added to rnn_cell.py
- GRU with recurrent attention test added to seq2seq_model_caffe2.py
- seq2seq_rnn.py
- Added specific behavior for 'gru' cell type
- in LSTMWithAttentionDecoder, output_indices fix for GRU cells
- in build_initial_rnn_decoder_states, don't process cell state for GRU cells
Reviewed By: salexspb
Differential Revision: D6316441
fbshipit-source-id: 18668f3db62245c5cdaf3bfa473a40e0feba0473
Summary: Pass the list of observers to rnnExecutor_ and attach them to operators
Reviewed By: akyrola
Differential Revision: D6279655
fbshipit-source-id: 086dde1bf6edbfb36082d6b4de33ec41f0bbefab
Summary:
Also bumped third_party/protobuf to v3.4.1 similar to #1462 . cc pietern
Closes https://github.com/caffe2/caffe2/pull/1466
Reviewed By: pietern
Differential Revision: D6322210
Pulled By: Yangqing
fbshipit-source-id: 00f72472b71d1903a2705daf56652e4fb3fc021e
Previously, an in-place operation on a view that caused the view to be
volatile would not propagate up to the base. This often happens in
backward passes involving CopySlices which would increase memory usage
by making grad non-volatile.
For example, this splits threshold into threshold(), which is now
never in-place, and threshold_() which is always in-place.
This simplifies the in-place vs. non-in-place logic in
gen_variable_type.py, which was bug-prone.
Summary:
Data types were being handled badly in the reference check, causing sporadic failures in CI. All batched mat-mul with fp16 data is performed as pseudo-fp16, with all math in fp32. Adjusted the reference implementation to reflect this.
Adjusted the gradient check threshold to the best I could get to consistently pass.
Closes https://github.com/caffe2/caffe2/pull/1406
Differential Revision: D6324431
Pulled By: pietern
fbshipit-source-id: 83ff2584438a11f7a6db4599a4fb0e75e9e15a3d
Summary:
TSIA. Verified on local machine with VS 2017.
Closes https://github.com/caffe2/caffe2/pull/1455
Differential Revision: D6310658
Pulled By: Yangqing
fbshipit-source-id: 88f4519e8e9a4178719a5627365267f627dcb939
Summary:
This is in order for us to share compression ops to oss.
Closes https://github.com/caffe2/caffe2/pull/1463
Reviewed By: hlu1
Differential Revision: D6319101
Pulled By: Yangqing
fbshipit-source-id: 16c94e71fc3efe256054a648170aaf7702e5bcfe
* Add a JIT interpreter
The separate interpreter is used to run graphs with lower overhead than
converting them to autograd graphs. Some notes:
* does not support Handles/PythonOp/CppOp, these will be in a future commit
* jit_closure.cpp still exists and we fall back to it for now when we
cannot handle something because of PythonOp/CppOp
* In order to support retain_graph=True, the interpreter can be cloned,
creating a copy that can be run with different arguments. This is
assumed to be the non-standard case so cloning is not particularly optimized.
No tensor _data_ is copied, but the at::Tensor list in the interpreter is.
If we hit problems, there is a lot we could do (such as register allocation)
to minimize the stuff that needs to be copied.
* Uses a pImpl pattern to keep implementation details out of its header file.
* Modifies the way getTensorOp works so that it reads/writes to already-existing
vectors, this prevents needing to realloc these buffers each time.
* Timings are here: https://gist.github.com/zdevito/5a20ac29fb1b9e449e693b67dc478127
This reduces overhead to about the same as running it in python.
It is about 10us faster to run the same thing using ATen directly.
* Code Mod
Interpreter -> InterpreterState
Function -> Code
Add other requested comments.
* RegList -> ListHandle<T>
Change the RegList functions to be safer by identifying the type of
each argument list, and checking that list insert does not try
to add to two different lists at once.
* Use exactly equal for interp tests
* Fix elu double-backwards when applied in-place
Removed unused "input" argument to elu_backwards. Also removed 'inplace'
argument from backwards functions, since we don't ever want to use it.
* Fix up additional calls to ELU_updateGradInput
Summary: Update ATen operator to new version of aten library. This adds support for many neural network functions that previously were not exposed. This also supports operators that take a list of tensor inputs or produce a list of outputs by appending them to the end of the input/output lists.
Reviewed By: jamesr66a
Differential Revision: D6267327
fbshipit-source-id: 0df6af18369241afa8600fd51923811749900c2e
Summary: add NegateGradientOp: in forward pass, this op simply copies the input to output. In backward pass, it flips the sign of gradients.
Reviewed By: dragonxlwang
Differential Revision: D6314456
fbshipit-source-id: 56afd8b131eff9f7e120ab7e4e87461df49649d4
Summary: This new field is not needed anymore, so this diff removes it
Reviewed By: kennyhorror
Differential Revision: D6316744
fbshipit-source-id: f8afc1c42a0592fd03c7939f8e6f78afc8510ec9
Summary:
c777be07d9 changed the type signature for the Set function, this fixes it for the ATenOp
Closes https://github.com/caffe2/caffe2/pull/1464
Reviewed By: zdevito
Differential Revision: D6317561
Pulled By: jamesr66a
fbshipit-source-id: e54d553f44ccf0d5fc695e14dc671dde77004b54
Summary: Currently, the device_option equality is done in a specialized private function. Ideally, we should be able to test the equality from other places in the code and have a more detailed check for the equality.
Reviewed By: akyrola
Differential Revision: D6316608
fbshipit-source-id: c3fd085583e535d7936d05e4c8b15d2eff91c744
Summary: There is no need to use two functions to report net and operators. One function is sufficient.
Reviewed By: Maratyszcza
Differential Revision: D6228730
fbshipit-source-id: c599527254f4a15a3e440d37055cc95fbb3436bb
Summary:
This correctly adds handling of CUDA 8.0 and 9.0 by cmake.
**Discussion:**
CUDA 9.0 is currently not handled by cmake. When trying to build
with it and gcc6, the following cmake error is shown:
-- CUDA detected: 9.0
...
CMake Error at cmake/Dependencies.cmake:332 (message):
CUDA 8.0 is not compatible with GCC version >= 6. Use the following option
to use another version (for example):
-DCUDA_HOST_COMPILER=/usr/bin/gcc-5
Closes https://github.com/caffe2/caffe2/pull/1392
Differential Revision: D6317033
Pulled By: pietern
fbshipit-source-id: 08b89f21b994af52533d5afaaa62f26e2e94aee8
Summary: Dynamic memory management in Data Parallel Model was broken for distributed computation because the parameter gradients were also freed after being used. That is a problem with Gloo because it expects the tensors to have the same address over multiple calls. It is not a huge loss to remove parameter gradients from recycling, as they are relatively small for typical convnets.
Reviewed By: asaadaldien
Differential Revision: D6314095
fbshipit-source-id: 949161d8c592927ae2fa82b3262b5f9ee47bed6f
Summary:
Support the default, nnpack, and opengl backend engines. There is no need to change the model; the file converts the model to the appropriate backend.
Closes https://github.com/caffe2/caffe2/pull/1436
Reviewed By: hlu1
Differential Revision: D6275975
Pulled By: sf-wind
fbshipit-source-id: fbd864e18f00372b4c03de294c22383c405a9210
Summary: Currently, in single-machine execution, a misleading message is printed to the log saying that the 'NODE_ID' blob is not found. This diff ensures that this message is no longer printed while maintaining the semantics.
Reviewed By: Maratyszcza
Differential Revision: D6302728
fbshipit-source-id: 0f45245aedf6d4f664368595f7894e0f695e5323
Summary:
The STDDEV calculation code assumes that `compare_exchange` returns the value of the atomic, while per the C++ spec it actually returns a `bool`.
Also, the diff adds enough guards to avoid math errors on the Python side -- although this should not happen, the guards are just to avoid problems with floating point calculation offsets.
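For reference, a minimal standalone sketch (not the Caffe2 code) of the standard-library behavior being relied on: compare_exchange_strong returns a bool and, on failure, writes the currently stored value back into the expected argument.
```
#include <atomic>
#include <cstdio>

int main() {
  std::atomic<int> value{10};
  int expected = 5;                                     // stale expectation
  bool ok = value.compare_exchange_strong(expected, 6); // fails: returns false
  std::printf("ok=%d expected=%d\n", ok, expected);     // prints ok=0 expected=10
  ok = value.compare_exchange_strong(expected, 6);      // now succeeds
  std::printf("ok=%d value=%d\n", ok, value.load());    // prints ok=1 value=6
  return 0;
}
```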
Differential Revision: D6307930
fbshipit-source-id: d1754afb631f937aca7a88a82b5be2dd0c704aec
* Update comments and size logic
* Record stack traces during JIT tracing
* Use string helper functions and AutoGIL
* Use SourceLocation object instead of storing in debugName
* Address zdevito comments
* Address comments
* Fix CUDA builds for Windows
1. CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS has a limitation: the maximum number of exported functions cannot exceed 65535, so it can't be used.
2. Specify static on an inline function to prevent linking errors.
* cancel CMAKE version limitation
* Allow torch.load to take pathlib.Path
pathlib has been in the Python standard library for filesystem paths since Python 3.4,
but `torch.load` currently cannot take a `pathlib.Path` as the filename of a state dictionary.
I changed `torch.load` and `_with_file_like` so that they accept a `pathlib.Path`-typed filepath.
* Fix flake8: too long line & indentation
Summary:
The windows compiler has a bug with chained templates. This diff avoids using such pattern in `plan_executor.cc`.
Closes https://github.com/caffe2/caffe2/pull/1442
Reviewed By: Yangqing
Differential Revision: D6300046
Pulled By: heslami
fbshipit-source-id: 1dc74441d6e2f0586c636e799eb5e88ced289063
* Enable EXPORT_ALL_SYMBOLS for CMAKE
If we turn on CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS flag, we don't need to add most decorators by hand.
* Add quotation marks to pass the string args
* added endif
* Update CMakeLists.txt
Summary:
The use case is that sometimes we need a Tensor of custom type instead of POD
or string. This diff allows one to delegate to BlobSerializerBase to further
serialize the contents inside the Tensor.
Design choices:
(1) Each element is serialized as a BlobProto string, and stored in the
repeated string field.
(2) UNDEFINED is used as the enum value for the tensor data type, and the exact
type string is stored in the additional field.
(3) BlobSerializer is called on each item to obtain the serialized string.
(4) This requires the custom type to have copy constructor - otherwise it
will simply not be possible to copy over the deserialized content without
explicit type.
See blob_test.cc for an example.
Reviewed By: sunnieshang
Differential Revision: D6300196
fbshipit-source-id: 18bf94a22a07337e0fa83d3f1004b3651e38cf27
Summary:
This should fix the Travis build failures on Mac
Closes https://github.com/caffe2/caffe2/pull/1443
Reviewed By: bddppq
Differential Revision: D6295041
Pulled By: Maratyszcza
fbshipit-source-id: c143220e1ec17e49fe8e84f586f9fb82daba321a
Summary: The topk GPU test was taking too much time, but there are still a variety of codepaths to test (k <= 1024, k > 1024, k == 1, k == n). Reduce the batch sizes and n to reduce time taken by the in-python CPU code equivalent.
Reviewed By: pietern
Differential Revision: D6272628
fbshipit-source-id: b8b8f3601f28bf64f144c73d7c9e915f40c84d70
* added sys/types.h include to fix unknown ssize_t in aten/src/TH/THMemoryFile.c
* now including <sys/types.h> only if _WIN32 is not #defined
* now including sys/types.h in aten/src/TH/THDiskFile.c (if _WIN32 is not defined) to fix undefined off_t
Summary: The number of elements in the caffe2 blob can be larger than int32. Use size_t to prevent overflow.
Reviewed By: ajtulloch
Differential Revision: D6278363
fbshipit-source-id: 356e294c667a53360d8a65b56a63a39d5ce3384e
I messed this up and TestNN.test_MaxPool2d_indices caught me out
on it. This patch assumes that IndexTensor outputs are not
differentiable.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
This operator is a warmup I was doing before tackling convolution, as it
has many properties that make it a "first" for implementing things. In
particular, it is the first operator whose backwards have multiple
returns; this means its double backwards is the first backwards for a
function with multiple differentiable outputs. This exercises new code
for output_mask and set_flags.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
We are splitting on ', ', but that causes problems when you
have a nested comma. Quick and dirty fix is to NOT have the
space.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
They don't actually do anything and they're not accurate (many functions
have defaults which we didn't specify here).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Whenever I used to read Declarations.yaml, it would drive me batty that
'name' was always embedded somewhere in the middle of the record.
Now it is at the top, as it should be!
What it looks like now:
- name: storage_offset
method_prefix: m_
arguments:
- dynamic_type: Tensor
name: self
type: const Tensor &
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary:
Pretransposing FCs seems to offset the losses we get from low
batch sizes in AdIndexer. First I confirmed this on local benchmarks (see
previous diff). Then in https://fburl.com/yuo49onj I showed how this
change saves 19% of FC time on AdIndexer, which is already $0.4M in
cap. exp. and over 3 years gives 5x more ROI.
We can also reuse this code for later, more efficient gemm
implementations. I.e. msmelyan is working on a new fp16 gemm which
would cut bandwidth usage 2x. We can reuse the code in this diff for
the repacking required by a new gemm.
In this diff I had to take care of memory usage. Here are several
possible approaches to the transformation:
1. Perform on the fly, copy the memory. This is what is done in
skinny gemm (FC with engine SKINNY)
Cons: slow first execution, memory is replicated for each thread
2. Perform copy of weights in operator constructor. On the fly in dbg
mode verify that hash on original weight is the same
Cons: memory is still replicated for each thread
3. Perform copy weights in Predictor constructor
Cons: if we have 2 predictors sharing the same weight blob (via
PredictorContainer), we still get 3x more memory. I.e. original
weights and two copies for each of the predictors in a container
4. Replace weights in Predictor constructor, take care of mapping to
support weight sharing within a Predictor container
This is the approach taken in this diff; it solves the issues above and
doesn't create any memory overhead.
Cons: the logic became complex and requires a mutex at initialization time
Reviewed By: akyrola
Differential Revision: D6214593
fbshipit-source-id: 25da6ba7bfd39fc8f4b578094d3f334c7957490d
Summary:
- so that it can also summarize blobs with size larger than int
- the calculation of the mean and std may overflow/underflow; change to use double for intermediate calculations
Differential Revision: D6278275
fbshipit-source-id: f0bb72a5279212d429fa6d09b5487cad1baacdbe
Summary:
Will probably rename to adaptive topK to be aligned with the layer name.
The main difference from the top_k op is that K is not fixed as a layer parameter;
instead this op takes in a blob that contains the K information for each row of the input data (batch mode).
Reviewed By: chocjy
Differential Revision: D6221209
fbshipit-source-id: f7fd575ff8f515d886d93278ad94fd17e8bd6fa5
Previously, we checked that Variables were at least one dimensional in
the Python binding (wrap_outputs.h) and in the backwards functions. This
was necessary because some Tensor functions returned Scalar types, which
must be zero dimensional. This moves the wrapping logic into
VariableType.
Summary:
Do not try to link against `libcblas.so` when using the OpenBLAS
back-end. This fixes #763.
I briefly checked the OpenBLAS repository and, as far as I can tell, the OpenBLAS build never created a library called _cblas_.
Closes https://github.com/caffe2/caffe2/pull/1420
Differential Revision: D6283019
Pulled By: pietern
fbshipit-source-id: 53cd4455bdc63ee9f31d5bca9822844548350ae3
Summary:
A few people complained in the NNPACK repo about the broken build on PPC64, as it specifically whitelists supported architectures in its CMakeLists.txt and refuses to build on unsupported platforms. This commit explicitly disables the NNPACK build (as part of the Caffe2 build) on unsupported architectures.
Closes https://github.com/caffe2/caffe2/pull/1439
Differential Revision: D6288999
Pulled By: Maratyszcza
fbshipit-source-id: 76c40e9ce882356944b63968df8fd853f21ecd35
Summary: In this diff I am making sure that the checkpoint metadata is written out to the db for every epoch. This will allow us to automatically resume from an epoch if a workflow fails.
Reviewed By: aartibasant
Differential Revision: D6234832
fbshipit-source-id: f09a4de118f2eac25f663556476ac6313925fdf3
Summary: Print the full operator definition when gradient creation fails. This helps debugging cases where same op type is used in many places.
Differential Revision: D6282832
fbshipit-source-id: 4b9dab2602c7c53f795da93a3085cf5c8ca741c1
Summary:
Add `RmsPropOptimizer` to `optimizer.py` so RMSProp can be used as an optimizer.
`RmsPropOptimizer` uses `RmsPropOp` to update the gradient and `MomentumSGDUpdateOp` to update the model parameters.
Differential Revision: D6118279
fbshipit-source-id: e38b8380ff74c1d1bb1e87fc300b6b55e32cd2e0
Summary:
- This is meant as a set of examples on how parallelize_net works.
- Currently, only one example is provided. More to be added.
Reviewed By: mraway, xianjiec
Differential Revision: D6240160
fbshipit-source-id: 6f6f2d77445825883e050498cb6e06fb74508bbf
Summary:
Let's see if we can make this work...
Closes https://github.com/caffe2/caffe2/pull/1417
Differential Revision: D6276601
Pulled By: pietern
fbshipit-source-id: 4d51a66b693a1c5cff1e0c03373cd42bb273c885
Previously, sizes/strides() would give you the ATen view of the shape, while size(dim), stride(dim) would give you the TH view.
This was unnecessarily confusing and there was no automatic way to get dim wrapping on the ATen view.
Summary:
The source files are not exposed to the parent directory in mobile. Expose them now so that the files are built in OSS.
Closes https://github.com/caffe2/caffe2/pull/1435
Reviewed By: akyrola
Differential Revision: D6274056
Pulled By: sf-wind
fbshipit-source-id: 6b54645bc9a42b4329d8aa20051abeb5fc6b1c37
* Add direct C-type scalar conversions from Tensor, e.g. toCFloat() as an alias for Scalar(x).toFloat()
* Provide tensor overloads for fill_, masked_fill_, index_fill_.
* Everythign up to scalar overload.
* Fix pytorch build for aten scalar return type changes.
* Use valid expression instead of dangling else.
* Simplify code generation.
* Fix test_jit (why didn't this compile locally?)
Summary:
Ability to use average length of sparse feature to initialize weights. Based on experiments, it turns out that this allows a model to converge faster.
More results of the experiment -- https://fb.quip.com/VfraAXNFWhSg
Reviewed By: xianjiec
Differential Revision: D6092437
fbshipit-source-id: d979be7d755719ff297b999f73cba0671e267853
The curand_uniform function returns the range (0, 1]. Most RNG APIs have
the opposite bounds. Fixup the values in uniform_() so that they fall in
the more common bounds.
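A minimal sketch of this kind of fixup (a hypothetical CUDA C++ helper, not the patched kernel; the state type is just an example):
```
#include <curand_kernel.h>

// curand_uniform returns a float in (0, 1]; subtracting from 1
// maps it to the more common [0, 1) convention.
__device__ float uniform_half_open(curandState_t* state) {
  float x = curand_uniform(state);  // x in (0, 1]
  return 1.0f - x;                  // result in [0, 1)
}
```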
From https://software.intel.com/en-us/mkl-developer-reference-fortran-gemm:
lda: "When transa = 'N' or 'n', then lda must be at least max(1, m),
otherwise lda must be at least max(1, k)."
ldb: "When transb = 'N' or 'n', then ldb must be at least max(1, k),
otherwise ldb must be at least max(1, n)."
Partly addresses #3525
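A small sketch of that rule for a column-major gemm computing an m x n result from op(A) (m x k) and op(B) (k x n) (a hypothetical helper, not the patched code):
```
#include <algorithm>
#include <cstdint>

void leading_dims(char transa, char transb, int64_t m, int64_t n, int64_t k,
                  int64_t* lda, int64_t* ldb) {
  // lda: max(1, m) when A is not transposed, otherwise max(1, k).
  *lda = std::max<int64_t>(1, (transa == 'N' || transa == 'n') ? m : k);
  // ldb: max(1, k) when B is not transposed, otherwise max(1, n).
  *ldb = std::max<int64_t>(1, (transb == 'N' || transb == 'n') ? k : n);
}
```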
Summary:
The output shape info is incorrect, e.g. if we have 4 embeddings with dim size 32, the actual shape is (4, 32),
but the previous implementation in concat layer will give us (128, 1). This bug doesn't affect the dot products
calculation because the actual shape of the blob is still (4, 32) in concat_split_op
Differential Revision: D6264793
fbshipit-source-id: 82995e83a8c859cbd15617ff7850a35b30b453b6
* Prevent segfaults from undefined aten tensors.
This introduces a singleton UndefinedTensor TensorImpl with UndefinedType that is the starting state of a Tensor with no constructor arguments. In this way we avoid null pImpls and avoid segfaults
without having to if-check each pImpl dereference.
* If either Backend or Scalar type is Undefined in registry, return
the UndefinedType to avoid errors like CPUUndefinedType is not enabled.
* Address review comments.
* Avoid refcounting UndefinedTensors.
* Use reference_wrapper to avoid copy in check_defined.
* Declare UndefinedTensor singleton as class-static.
* Separate checked_cast into storage and tensor versions.
* Include <functional>
* Handle nullptr TensorImpls coming from NN.
* Fix nullptr check in batch_normalization backward with defined check.
Summary:
RNN executor uses its own set of events (https://fburl.com/37mows6l) and may
call RunAsync multiple times on the same op. Disable internal op event for this use case.
Reviewed By: akyrola
Differential Revision: D6258471
fbshipit-source-id: 228f9ca9882cfbac5bc8fba55ddf80bd2b542072
Generate random uniform floats in the range [0, 1) by generating random
uniform uint32 in the range [0, 2^24-1] and dividing by 2^24. This
ensures that the largest value is representable as a float32 less than
one.
This also changes the uniform double generation to use more bits of
randomness.
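A minimal sketch of the float scheme described above (a hypothetical helper, not the actual generator code):
```
#include <cstdint>

// Keep 24 random bits and divide by 2^24, so the largest possible result,
// 16777215 / 16777216, is still a float32 strictly less than one.
float uniform_from_uint32(uint32_t r) {
  uint32_t bits24 = r >> 8;                            // value in [0, 2^24 - 1]
  return static_cast<float>(bits24) * (1.0f / 16777216.0f);  // value in [0, 1)
}
```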
THTensor_(newContiguous) always increments the refcount. It may return
the same pointer if the tensor is already contiguous. Since we added the
check for zero strides, it may be called when the tensor is already
contiguous. We need to make sure that THTensor_(free) is always called
in this case.
Fixes #3498
* add -fexceptions to aten build function for C and CXX builds
* add -fexceptions to aten build function for C and CXX builds
* add -fexceptions to aten build function for C and CXX builds
* Fix test_torch.py test for Power see issue #3277
* Regenerate ONNX nanopb from latest version.
But don't bump the IR version, we don't handle discriminators
yet.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add discriminator to AttributeProto.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add back ONNX definition for permute
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Merge vestigial Local.cwrap into Declarations.cwrap
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Remove dead standalone ATen build logic.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Allow in-place operations on views
Adds VariableViewImpl, a subclass of VariableImpl which has a pointer to
the base Variable on which it is a view. In-place operations on views
change the grad_fn of the base.
Note that in-place operations only work on views that are the first output of the function that created them. All C++/ATen-implemented functions have this behavior, but it's possible to write Python-implemented autograd functions that do not. In-place operations on these views will raise an exception.
Fixes #3313
* THS build change
* merge THCS into ATen build
* THCUNN build change over
* update THNN build
* move THC build to ATen, as well as some of the accumulated top level config from other TH* libraries
* TH library build merged into ATen, and warnings fixes.
* fix magma support checking
* check cuda early
* fall back to GCC atomics if C11 atomics have issues.
* fix install name
* disable openmp in files that also include stdatomic.h
* make sure LAPACK is visible to TH build file.
* Use Welford's algorithm when reducing along inner dimension for THCTensor's variance fn (see the sketch after this list)
* Use accreals in THCTensor's varInnermostDim
* Skip cuda tests if no cuda
* Variance testing
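For reference, a minimal single-threaded sketch of Welford's update (the CUDA kernel additionally has to merge per-thread partial results, which is not shown here):
```
struct WelfordAccumulator {
  double mean = 0.0;   // running mean
  double m2 = 0.0;     // running sum of squared deviations from the mean
  long long n = 0;     // number of samples seen

  void update(double x) {
    ++n;
    double delta = x - mean;
    mean += delta / n;
    m2 += delta * (x - mean);  // uses the updated mean
  }

  double variance(bool unbiased) const {
    if (n < 2) return 0.0;
    return m2 / (unbiased ? n - 1 : n);
  }
};
```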
Summary:
The current version of the code does not support type and shape inference, which is
going to make all places that rely on it fail miserably.
I'm still leaving the option of doing init the old way in case some places
are already failing this inference logic.
Reviewed By: ffjiang
Differential Revision: D6241270
fbshipit-source-id: e9080ffe93d610b5ada58ebe66579acfa57c6b3c
Summary:
replaces FB-internal NNPACK fork with open-source version.
Important FB features are already upstreamed to the GitHub repo.
Reviewed By: ajtulloch
Differential Revision: D6224054
fbshipit-source-id: 4dbe02b4da97648a663586414550c2d4e23c7221
Summary: Add support for SparseMomentumSGDUpdate and tests for momentum SGD in both dense and sparse cases
Reviewed By: akyrola
Differential Revision: D6234834
fbshipit-source-id: 9848c29ea06794ef35f1ebaff0f5e81eac4f4db9
Summary:
This seems to be faster in a bunch of cases. Prefer to keep it as a
separate op instead of MatMul + Add so its easy to compare perf on per
op basis between this one and the baseline (normal FC)
Reviewed By: akyrola
Differential Revision: D6169187
fbshipit-source-id: 09b96325d44bd181896f396aec88b27314c435b0
Summary:
The resnet50 trainer will save the 'optimizer_iteration' blob in checkpoints, but loads it in GPU context. This fails because AtomicIter/Iter expect the blob to be in CPU context. So manually reset the optimizer_iteration in CPU context.
I am thinking of making the iter-operators automatically do this switch, but in the meantime this unbreaks the trainer.
Reviewed By: sf-wind
Differential Revision: D6232626
fbshipit-source-id: da7c183a87803e008f94c86b6574b879c3b76438
Summary:
Implementation of polling async net executor.
Notes:
- New net executor async_polling - schedules CPU and GPU ops asynchronously, uses single polling thread
- Events: update to Caffe2 events to support async CPU events, adding new methods:
Query() - non-blocking checking of event states: INITIALIZED -> RECORDED -> SUCCESS/FAILED
ErrorMessage() - when operation runs asynchronously and fails calling this on event will give error message
- Tasks: using existing DAGNet's algorithm to compute CPU and GPU chains, a separate task for each chain
- Polling: using single thread to query state of events - for CPU tasks atomically queries task state, for GPU task - uses cudaEventQuery; using Event
- Scheduling of CPU ops: using global thread pools
- Scheduling of GPU ops: using GPU thread pool per GPU device
Reviewed By: dzhulgakov
Differential Revision: D5985110
fbshipit-source-id: a9de7fcbb71d046a3aa1b573072b89a65dfeee8c
Summary: 8 bytes is 64 bits. Fixes out of range access caught by ASAN
Reviewed By: Yangqing
Differential Revision: D6219576
fbshipit-source-id: f7c418b12fa211890abcb5aef800bd456390b73a
Summary: Before the boundary checking was happening after the first access for 8bit ops.
Reviewed By: Yangqing
Differential Revision: D6206753
fbshipit-source-id: 07ab240cae8c67b3048f03aa79af0b6399b9940b
Summary: Still assumes a complete subgraph, but slightly more generic.
Reviewed By: Yangqing
Differential Revision: D6103228
fbshipit-source-id: bfa0d46067e05baa0478a4c37a67ccf8f81f34ec
Reduction functions that take a dimension now properly reduce
down to scalars if passed a 1-dimensional tensor.
Squeeze now properly reduces down to scalars as well (and is implemented
as a native function).
Unsqueeze now handles scalar inputs correctly (so unsqueezing a scalar
returns a dim 1 tensor, rather than a dim 2 tensor).
This gets rid of kUndefinedDimensions and has nice properties like:
- the dimensionality always matches the length of the sizes and strides.
- the number of elements is always the product of the sizes (starting at the identity)
- the shape you pass to factory functions (e.g. randn) matches the shape that is returned
etc.
In addition to the empty tensor change, this makes some related changes:
1) expand is now a native function, because it needs to operate on the ATen view of the size/strides.
2) adds tests for a number of functions operating on empty, scalar, non-scalar tensors.
This uncovered a number of scalar_check bugs; some of these are fixed in the generated code,
some that need to be manually specified can be specified by a 'scalar_check' argument in the cwrap.
3) fixes the formatting of empty tensors
4) changes the THLongStorageView API; the public API was getting overly complicated, so now you call
'makeFromSize', 'makeFromStride', 'makeFromLength' and it just handles the correct mapping for that type.
Permute transposes multiple dimensions at once. The as_strided function
changes the sizes and strides of a tensor without changing the Storage.
It's a subset of Tensor::set_.
This allows VariableType override them to return instances of
VariableType. Combined with the change to Formatting.cpp, this lets us
print Variables to std::cout.
For one thing, we will want a different implementation from TH because
we need to differentiate between scalars and 1-dim tensors.
Also, we don't really want to expose the THS/THCS function; in addition to
checking the shapes are the same, it checks that the dimensions which
are sparse are the same (because various THS/THCS operators only work if this
is true; it should really be called "is_congruent" or similar).
This adds the ability to specify 'native' functions in NativeFunctions.h and specifies
'split' and 'chunk' in this manner. The function arguments, returns, variants, etc. are
specified as if they were processed via other parsing mechanisms (e.g. cwrap_parse) with
the following additional parameters:
type_method_definition_level: this allows one to specify that the type method should
be defined at the 'base' type level; this is because in the case of 'split' and 'chunk'
(and probably most/all other native functions that don't directly dispatch to TH/THC)
we don't need type-specific implementations. Currently it is enforced that 'base' is
specified for native functions, but this is easy to remove later.
type_method_definition_dispatch: this defines the function to dispatch to. For split,
this is at::native::split; this is just to avoid having a magic namespace and allowing
one to dispatch to a function with a different name.
Currently, the toXXX functions on Scalar check that the conversions are
exact. This will cause an exception in code like:
auto t = CPU(kFloat).ones({1});
t *= M_PI;
Or the equivalent in Python:
t = torch.ones(1)
t *= math.pi
This changes the checks to only throw an exception in the case of
overflow (positive or negative).
a.copy_(b) will now broadcast b to the shape of a. Note that this means
that copies between tensors of the same number of elements but
incompatible shapes are not allowed. For example, the following will
throw an exception:
Tensor a = type.rand({4, 3});
Tensor e = type.rand({3, 4});
a.copy_(e);
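The same behavior from Python, as a minimal sketch with today's torch API (the names mirror the C++ example above):
import torch
a = torch.zeros(4, 3)
b = torch.ones(3)
a.copy_(b)            # b is broadcast to a's shape (4, 3)
e = torch.ones(3, 4)
# a.copy_(e) would raise: same number of elements, but the shapes do not broadcast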
The methods were separate because PyTorch supports multiple output types
for comparison methods. For example, for FloatTensors 'a' and 'b' both
calls are valid:
torch.lt(a, b, out=<ByteTensor>)
torch.lt(a, b, out=<FloatTensor>)
ATen only supports ByteTensor outputs because the overloads have the
same static signature and would conflict. It would be nice to fix this
in the future like with the bernoulli function.
In the meantime, the separate function and method definitions with
different argument names make implementing VariableType more difficult.
This generates NN bindings with a similar interface to PyTorch's
torch.nn.functional package. The file nn.yaml specifies function
signatures and THNN implementations.
Each NN operation generates three functions. For example:
- conv2d
- conv2d_forward
- conv2d_backward
The conv2d and conv2d_forward functions differ in how they handle
buffers that need to be passed to the backward function. conv2d_forward
takes the buffers as parameters. conv2d creates the buffers internally
and discards them.
* Improve Declarations.yaml:
- translate defaults to C++ values
- include names of returned values
- mark keyword-only arguments
* Add comment to translate_default
This respects all the broadcast cwrap specifications except for 'fallback';
i.e. pointwise functions operating on tensors where the number of elements
match but the sizes are different and not broadcastable. This behavior is
currently deprecated in PyTorch. Note that this is a breaking change in ATen,
because ATen just passes through to TH/THC, where the fallback behavior is
actually implemented.
This also changes expand semantics wrt Scalars (as tensors). Previously,
one could 'expand' a 1-dimensional tensor with size 1 to a 'scalar' (i.e.
empty size initializer list).
Replace None grad_inputs with zero tensors in some cases
In Python-implemented autograd functions, we sometimes return None as
the grad_input if the output is marked "non-differentiable". This
replaces those None values with zero-filled Variables if the
corresponding input has requires_grad=True.
C++ implemented autograd functions expect the input (grad_outputs) to
be defined if they're executed. They always return non-null grad_inputs
if should_compute_output(i) is true. This could lead to segfaults if a
subsequent Python-implemented function returned None.
See #3412, #3241
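A minimal sketch of the Python-side pattern, written with the present-day static-method autograd.Function API (the class and names are illustrative, not code from this change):
import torch
class AddNonDiffShift(torch.autograd.Function):
    # y = x + shift, but we deliberately declare no gradient for shift
    @staticmethod
    def forward(ctx, x, shift):
        return x + shift
    @staticmethod
    def backward(ctx, grad_out):
        # None for shift; when such a None would feed a C++-implemented function
        # whose input has requires_grad=True, the engine now substitutes zeros
        return grad_out, None
x = torch.randn(3, requires_grad=True)
shift = torch.randn(3, requires_grad=True)
AddNonDiffShift.apply(x, shift).sum().backward()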
Summary:
\cc akyrola
Fixes a few issues:
1. Performance issue related to regeneration of rng states every time the input size changed - this was unnecessary, now states should be initialized once only.
2. States were being overwritten between the fprop and bprop operators, causing silently wrong results. This required use of the new `cudnnRestoreDropoutDescriptor` API, which in turn required gating behind cuDNN v7
3. Random seed was not being inherited from the `operator_def.device_option()`
Closes https://github.com/caffe2/caffe2/pull/1418
Differential Revision: D6222081
Pulled By: akyrola
fbshipit-source-id: 021067b95bcf0a16db8f4a73d3ed70e21b54bc9f
We don't currently generate _out functions for ATen native functions and may not
(they don't work with Variables currently). Also, the existing code was wrong
as the argument orders were swapped in the two squeeze variants.
Summary:
Implements send/receive calls in C++. This includes both a C2 independent
library in async/comm as well as the C2 operations in the c2 sub-directory
There are still several items to be addressed in future diffs:
- multiple channels per pair to alleviate the issue with small message latency
- re-add statistics per comm-client and per-op
- continue adding test cases as usage patterns diversify
Reviewed By: akyrola
Differential Revision: D6095219
fbshipit-source-id: 6d72770dbac693d2b7035f03ce8c6df5ce03706e
Summary:
There were cases where the direct copy succeeded, but the
dimensions didn't match. Now, we check dimensions and reset if they
don't match before issuing the copy.
Reviewed By: salexspb
Differential Revision: D6103325
fbshipit-source-id: 602605d8b119cae74e006c792bc42f355a5a9b4e
Summary:
See comments for where this can be useful (disabling the
OperatorDef::DeviceOption(...) so we can control the scope at the
NetDef::DeviceOption(...) level).
Reviewed By: viswanathgs
Differential Revision: D6103412
fbshipit-source-id: 75a9be54275760132f6d1e71acbe9190e7099289
Summary: Updated brew SpatialBN to use initializers similar to other brew ops such as conv and fc instead of initializing all of its parameters itself within the brew call.
Reviewed By: asaadaldien
Differential Revision: D5840359
fbshipit-source-id: 9f3d688d4957605eaf7ecd2488bc26bfb1da3f78
Summary:
With the update of the sample rate API, caffe2_benchmark needs to be changed as well.
Tested building the caffe2_benchmark and running the program on an android phone. See the delay metrics reported in adb.
Closes https://github.com/caffe2/caffe2/pull/1419
Reviewed By: Maratyszcza
Differential Revision: D6221101
Pulled By: sf-wind
fbshipit-source-id: 77a06ecce55b54cff8b9fa0aef857bc542a5f371
Summary: Adds the ability to create a local blob in the workspace even if the blob exists in the parent workspace. This is to support cases where a user wants to create a local copy of the blob and hide the blob from the parent workspace.
Reviewed By: akyrola
Differential Revision: D6194386
fbshipit-source-id: 92c064159ac635ee76c211abc013b72bd8752447
Summary:
We'd like to sparsely sample the net execution, but after the net is sampled for the first time, we'd like to densely sample the following few iterations so that we can have some meaningful data for a short period of time.
Change the observer sample rate to the following:
skipIter: skip the first few iterations.
netInitSampleRate: the sample rate for the first iteration after the skipIter or immediately after reset.
netFollowupSampleRate: the sample rate after the netInitSampleRate is hit.
netFollowupSampleCount: the number of iterations that use the netFollowupSampleRate. After this count is reached, go back to netInitSampleRate (reset)
operatorNetSampleRatio: whenever the net is sampled, if the random number also hits operatorNetSampleRatio, collect operator metrics instead.
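A rough Python sketch of how these knobs are meant to interact (illustrative only, not the Caffe2 implementation; the class and attribute names are made up):
import random
class NetSampler:
    def __init__(self, skip_iters, init_rate, followup_rate, followup_count):
        self.skip_iters = skip_iters          # skipIter
        self.init_rate = init_rate            # netInitSampleRate, as 1-in-N
        self.followup_rate = followup_rate    # netFollowupSampleRate, as 1-in-N
        self.followup_count = followup_count  # iterations of dense follow-up sampling
        self.iteration = 0
        self.dense_left = 0
    def should_sample(self):
        self.iteration += 1
        if self.iteration <= self.skip_iters:
            return False
        if self.dense_left > 0:
            self.dense_left -= 1
            return random.randrange(self.followup_rate) == 0
        if random.randrange(self.init_rate) == 0:
            # once the net is sampled, densely sample the next few iterations
            self.dense_left = self.followup_count
            return True
        return False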
Reviewed By: Maratyszcza
Differential Revision: D6205657
fbshipit-source-id: da0c048f77fc4dc64f3fb71b6072429a57e9d2f0
Summary:
If this variable is set to a ccache symlink then the NCCL build will
also use the cache. The NCCL build is the slowest component of a cached
build without this change
Closes https://github.com/caffe2/caffe2/pull/1416
Reviewed By: Yangqing
Differential Revision: D6214008
Pulled By: pietern
fbshipit-source-id: e0a90e27de9b1c5a1fdc0e5bad5fb61f9fa924c3
Summary: CAFFE2_ENFORCE accesses a global variable in a separate compilation unit.
Reviewed By: romain-intel
Differential Revision: D6200236
fbshipit-source-id: a501b05bd23afec2ef4a23dd482a4dc4cfc196f1
Summary:
My commit bab5bc broke things with fp16 compute, as I had tested it only with the null input, which actually produced fp32 data (even when dtype was given as float16). Also, I had confused the concepts of "float16 compute" and fp16 data. Issue #1408.
This fixes those issues, tested with both Volta and M40 GPUs. Basically restored much of the previous code and fixed the null input to do FloatToHalf.
Reviewed By: pietern
Differential Revision: D6211849
fbshipit-source-id: 5b41cffdd605f61a438a4c34c56972ede9eee28e
* enable size from ATen type
* temp commit aten thd
* port copy, math
* port random
* changes after rebase
* lapack bind
* thd and csrc compile
* fix min/max reductions in DataChannelTCP
* clean up changes
* re-enable tensor constructors
* port MPI to at::Tensor
* fix storage methods to not cast to thpp storage ptrs
Some knock on effects:
- at() is not supported on ArrayRef. I fixed this by adding a new
overload for input() to access a specific input. I also filed
https://github.com/zdevito/ATen/pull/152
- Need new overloads for fmap/filter, because template deduction won't
attempt an implicit constructor in an attempt to match the argument.
- New overload in ir.cpp for printing ArrayRef.
- When we pybind11 an ArrayRef, we convert it into an iterator.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
This breaks a lot of the onnx-pytorch tests because the abstraction
barriers are not respected. I'll spin up a patch for that separately.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
This started off as a minor fix based on Adam's question, "why is printing
a graph not const" and snowballed into a giant yak shaving exercise.
- The Graph and Node APIs now uniformly enforce deep constness; e.g., if you
get a const Node* or const Graph*, it is not possible to get a non-const
Node*/Graph* somewhere else in the graph (even though the member variables
of these are non-const. Hooray for private access specifier.)
- A big pile of functions got const versions, most notably the printing
functions, and functions for accessing inputs().
- REALLY IMPORTANT, BC-BREAKING CHANGE: inputs() now returns a COPY of the
inputs, rather than a reference to the underlying. I was forced to do this
because there is no way to portably turn a std::vector<Node*> into a
std::vector<const Node*>, which is necessary to provide a const-correct
version of inputs() that enforces deep const-correctness. I then justified
this choice to myself with the observation that outputs() returned a
copy (by necessity), so this makes the API more uniform.
But making this change uncovered two very subtle bugs:
1. If you change functions from returning a reference to returning a copy,
the idiom node->inputs().begin() is no longer valid, because the memory
the iterator points to immediately becomes invalid. THIS SUCKS.
Honestly, we should add a lint rule rejecting calling begin()/end() on
temporaries because this is very dangerous. To excise this pattern from
the codebase, I added begin() and end() methods to Graph, so that we got
rid of the graph->nodes().begin() idiom, which happens to be sound,
despite not returning a reference, because graph_node_list is a
non-owning reference.
2. pybind11 doesn't handle std::vector<Node*> cast out of the box.
Fortunately, I found a simple fix in the GitHub issues tracker
that involved adding an extra type converter. And yes, this
does mean that outputs() in Python never worked correctly.
- New const_graph_node_list, which is a graph_node_list that gives you const
Node*
There are some more miscellaneous improvements:
- Applied CR comment fixes on export.cpp; using replaceInput, and renaming
variables for clarity.
- assertValidInput helper method added, and applied to replaceInput
- Use an explicit function to print THPObjectPtr, otherwise we get
the wrong overload.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Prevent numerical issues with poisson_nll_loss when log_input=False
Evaluation of the logarithm of the input variable in the Poisson negative log likelihood leads to a NaN loss if the variable being evaluated is zero. A small epsilon is added to prevent this. See the equivalent Keras epsilon here: https://github.com/fchollet/keras/blob/master/keras/losses.py#L68
* PEP8 fix
* Add epsilon support to PoissonNLLLoss in nn.modules.loss
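A small usage sketch with today's functional API (the explicit eps value is illustrative; only the log_input=False path uses it):
import torch
import torch.nn.functional as F
rate = torch.zeros(3, requires_grad=True)   # predicted rate, exactly zero here
target = torch.tensor([0., 1., 2.])
# with log_input=False the loss evaluates log(rate + eps), so a zero rate no longer yields NaN
loss = F.poisson_nll_loss(rate, target, log_input=False, eps=1e-8)
loss.backward()
assert torch.isfinite(loss)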
* Add torch.take and Tensor.put_
These are similar to numpy.take and numpy.put. The take function allows
you to linearly index into a tensor without viewing it as a 1D tensor
first. The output has the same shape as the indices. The put function
copies value into a tensor also using linear indices.
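For example, a sketch against today's torch API:
import torch
t = torch.arange(12).view(3, 4)
idx = torch.tensor([0, 5, 11])
torch.take(t, idx)                                       # tensor([0, 5, 11]); output shape == idx shape
t.put_(torch.tensor([0, 1]), torch.tensor([100, 200]))   # writes through linear indices, in place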
Summary: This cleans up the _hack_get_slice_end() using the Conditional operator.
Reviewed By: jmp84
Differential Revision: D6177797
fbshipit-source-id: 5ce0b76b8472123415bba39488aa2c69aad96111
Summary:
Caffe2 fails to build with some old CMake versions because it doesn't figure out that the build implicitly depends on NNPACK build.
This commit adds this dependency explicitly.
Closes https://github.com/caffe2/caffe2/pull/1414
Differential Revision: D6203486
Pulled By: Maratyszcza
fbshipit-source-id: 86f6d9d88976656820f44e3416c57ddf22350362
Summary: Updating the documentation to clarify the behavior of negative end indices.
Reviewed By: jamesr66a
Differential Revision: D6169058
fbshipit-source-id: f14f7cb8b30c26b1cccce104eba8c957a444657f
* update fuser to match ATen-formatted JIT ops
* fix concat optimizations and add test
* allow onnx export to work with single-export functions
* fix onnx handling of multi-return nodes.
* nits, format, vision test update
* fix add constant
* fix driver init issues
* Add missing Neg symbolic.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary:
I (actually by mistake) included some premature optimization in D6155510 for the threaded RNN executor. Unfortunately, there was a subtle race condition when some ops were run out of order, because I had made the countdown only count down in the last timestep. Hard to explain.
Out of caution, revert D6155510's changes to recurrent_network_executor.cc, excluding one assertion and the setting of the debug flag.
Differential Revision: D6195544
fbshipit-source-id: 24a275e185e5a80835401a8cdcb162dbc2411789
Summary: Added a simple function to synchronize a blob across machines (but not across devices), i.e. blobs that are not synced over devices.
Reviewed By: yqwangustc
Differential Revision: D6192922
fbshipit-source-id: a4d653c9fb09f06b0c42330bdae07b42f5e6346c
Summary:
Implemented new CUDA class for operator SparseAdagrad. The param and moment inputs now can be float or float16.
The functions for mixed-precision add/mult/store are defined in a separate header file ("caffe2/core/float16_util.h") for reuse purposes.
Reviewed By: azzolini
Differential Revision: D5880200
fbshipit-source-id: dca227f38629a03a9d771f42efe2c0b673075c4d
Summary: Allow the GEMMs in the FC/FCGradient Op to do FP16 compute instead of FP32 if the appropriate op flag is set.
Reviewed By: asaadaldien
Differential Revision: D5839777
fbshipit-source-id: 8051daedadf72bf56c298c1cf830b019b7019f43
Summary: CAFFE2_ENFORCE(a == b) and CAFFE2_ENFORCE_EQ() are functionally equivalent, though the latter provides a more detailed failure message.
Reviewed By: salexspb
Differential Revision: D5991775
fbshipit-source-id: 52e4d6d559c933de5b33d791b20223effe9d4f66
Summary:
RNN executor had a disadvantage compared to plain nets when running in forward-only mode: for plain nets, we only create two workspaces and two nets and alternate between them. With the RNN executor, we had only four workspaces (4 > 2 because it was faster in some cases), but the nets (or rather the ops) were created for each of the timesteps. This has significant overhead. This diff changes this so that if the executor is in forward-only mode (i.e. has a limited parallelism setting), it will reuse the same operators as the t - 4'th net, excluding the ops that require the timestep blob. The latter exception is required because the RNN executor needs a different timestep blob for each timestep, since it cannot modify the value of the timestep blob as it would when running nets in a loop.
Also removed redundancy in the dependency computation and added a debug flag to the executor that outputs the description of the rnn contents.
Reviewed By: salexspb
Differential Revision: D6155510
fbshipit-source-id: c47f727d2128649b081270d15020a08d41e5748d
- Deleted Addmm/Concat Function class, as this is now native ATen operator
- Resurrected ONNX operator for Concat (now called 'cat')
- Add a "fake" Expand ONNX operator, which we now do the optimization on;
this helps prevent us from emitting a warning that 'expand' is not supported.
We still fail if any of these Expand operators make it to the final model,
until we actually formalize Expand in ONNX. This also simplifies the
fuseBroadcast code, because single-return ONNX nodes don't get select nodes.
- New error reporting strategy. If we fail to export an operator because of
something, we emit a warning, but otherwise keep going. At the very end,
in export.cpp, we now check if there are any ATen operators left over. If
there are, we bug out. This assumes that ATen is lower case and ONNX is upper
case. You're now supposed to 'return _unimplemented(msg)' in these cases.
- New toString() method on Graph, for getting the string graph (useful for
slapping it into error messages.)
- Some of the legacy symbolics (still in Python symbolic method of Function
subclass) have been cleaned up for clarity.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
The pieces:
- I improved the lint / asserts to catch some bugs which I
committed while working on my export. There are two new
properties which the linter checks now:
(1) "Anticipated uses". If a node says that is used by
M, M better appear later in the topsort. Previously,
we only checked if it was in all_nodes.
(2) If you are a select node, you better be a multi-type node;
if you're not a select node, you better not be! And you
should never have an input that is multi-type.
- There is a new peephole optimization pass, for simple, local
transformations to graphs. Right now, it implements a simple
optimization: remove 'expand' invocations that are no-ops
(the size before matches the size after), but we can add other
things to it later. I needed this for ONNX because no-op expands
show up in the left-hand argument, which we don't support.
- There is now a broadcast fuser, which fuses ATen expand ops
into broadcastable ONNX ops (Add, Div, Mul, Pow, Sub, Gemm.)
It only fuses when the original size is a suffix of the new
size, as per the ONNX spec.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary:
This is introduced in 8539a1e78b - vector<float> should not be used in Tensor shape inference.
Closes https://github.com/caffe2/caffe2/pull/1393
Reviewed By: akyrola
Differential Revision: D6181075
Pulled By: Yangqing
fbshipit-source-id: 002144a137148b5b16118d0c123132890e8d325a
.gitignore should have uninteresting files listed, so it acts as a good
.dockerignore. Reduces the build context sent to the docker daemon from
2.927GB (after building locally) to 66.66MB (:O).
Summary:
Just noticed while reading the code.
We can wait only on the tails of the DAG, not on every execution chain node.
Reviewed By: akyrola
Differential Revision: D5861078
fbshipit-source-id: f4f6296fed1ccc96b1ab99b4272b82c8bf764ca9
Summary:
Add CUDAContext::cudnn_handle() for easier integration of single
cudnn routines into operators without requiring the weight
of CuDNNWrapper or similar, or needing to spin out a separate CuDNN*Op
version of an operator.
It was necessary to split out the cuDNN wrapper code from the base cuDNN helpers in order to resolve a circular dependency between context_gpu.h and common_cudnn.h when handles and cuDNN `#define` were added.
Closes https://github.com/caffe2/caffe2/pull/1376
Reviewed By: pietern
Differential Revision: D6162034
Pulled By: akyrola
fbshipit-source-id: 95687e55b3e1e921e1f5e0f016f43b586f5f3350
Summary: Added an initializer which sets up the ParameterInfo object in the opposite format from the pFP16Initializer. This is needed for when the op requires the initialized blob to be FP32 but an FP16 copy of the weights is needed.
Reviewed By: wesolwsk
Differential Revision: D5840832
fbshipit-source-id: 439e87f41a1dbc58bf63a5c0e7f7fc4cb00b4d65
Summary: Given an additional tensor containing the values corresponding to the weighted samples, add tensor output that contains the values selected by the sampled indexes.
Reviewed By: akyrola
Differential Revision: D6050094
fbshipit-source-id: 1eccc641b99e30d36ae83d49f630b018a53e4147
Summary: Sigmoid + CrossEntropy has a numerical stability issue. The gradient of sigmoid is `dx = dy * y * (1-y)`. When `label=0` and `x` is large, `1-y` can be rounded to (near) 0 and we lose `dx`. Switching to `SigmoidCrossEntropyWithLogits` solves the issue because the gradient does not depend on `y`.
Reviewed By: chocjy
Differential Revision: D6086950
fbshipit-source-id: f990ae726802aa5c56fa62cf5e23f2e61ee047fa
Summary:
We need to use Cluster to isolate the definition of the nodes.
Otherwise, the contexts are polluted and the run becomes
stateful.
Reviewed By: Yangqing
Differential Revision: D6140404
fbshipit-source-id: 09d1c86ef12bb01eaa16b1dade4d2e1e93be287a
Summary:
This will help releasing models that are using Caffe2 but have their own operator implementations and extensions. More detailed docs to arrive later. Let's see what contbuild says.
Closes https://github.com/caffe2/caffe2/pull/1378
Differential Revision: D6155045
Pulled By: Yangqing
fbshipit-source-id: 657a4c8de2f8e095bad5ed5db5b3e476b2a877e1
Summary:
For some reason, having SHOULD_NOT_DO_GRADIENT in a .cu file (this is for a CUDA-only operator) causes a double-free error detected by ASAN. This is why the innocent-looking D5837837 caused automatic ASAN tests to fail (at least on Xray).
Removing these entries makes the error go away, and is ok because we don't really need these tags. But it would be nice to understand what causes the double-free. I don't have time to investigate myself now.
Reviewed By: Maratyszcza, salexspb
Differential Revision: D6161559
fbshipit-source-id: a52cb2a9cc62f2ec54ed866846f2bd1ccb0ae90f
* API changes
* Implement reduce for THNN ClassNLLCriterion
* Implement reduce keyword for THCUNN ClassNLLCriterion
* Implement reduce for THNN SpatialClassNLLCriterion
* Implement reduce for THCUNN SpatialClassNLLCriterion
* Make legacy NLLLoss work
* Docs for NLLLoss reduce
* reduce keyword for double backwards NLLLoss
* reduce=False tests
* Addressed comments
* Fix trailing whitespace
* Fix test failures in legacy nn
* Rebase: add reduce keyword to aten declarations of NLLLoss
* Add reference functions for all NLLLoss and NLLLoss2d test cases
* Replaced slow get/set fns. Don't use int64_t in kernels.
* Use TH_INDEX_BASE in NLLLoss for consistency
* Fix legacy ClassNLLCriterion tests
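A usage sketch of the new behavior, written with today's reduction= spelling (which later superseded the reduce= keyword added here):
import torch
import torch.nn.functional as F
log_probs = F.log_softmax(torch.randn(4, 5), dim=1)
target = torch.tensor([0, 1, 2, 3])
per_sample = F.nll_loss(log_probs, target, reduction='none')  # reduce=False: one loss per example
mean_loss = F.nll_loss(log_probs, target)                     # default: reduced to a single scalar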
Summary:
CMake scripts in NNPACK use enum34 polyfill for PeachPy to support pre-3.4 Python interpreters, which do not have built-in enum module. This polyfill was found to be conflicting with built-in enum module on Python 3.6, and I updated NNPACK CMake scripts to only use polyfill for Python < 3.4. This commit propagates this change to Caffe2, so Caffe2+NNPACK can be built on systems with Python 3.6.
Closes https://github.com/caffe2/caffe2/pull/1389
Reviewed By: bddppq
Differential Revision: D6161663
Pulled By: Maratyszcza
fbshipit-source-id: c8aa07def6abe252a0a2ab927f6c49ccd846ab93
Summary:
seq2seq/translate.py was running much slower on RNNExecutor. This was because RNNExecutor has significant init overhead (I have another diff to reduce it, though not completely eliminate it), and translate was calling the decoder with RunNetOnce -- thus always recreating the net and the ops. Changing this to RunNet() makes translate run faster than without the executor. RunNet uses the net name and reuses the already created net, while RunNetOnce passes the whole protobuf.
Noticed a similar bug in the seq2seq ensemble beam model, which also calls CreateNet() but uses RunNetOnce() instead of RunNet().
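A minimal sketch of the difference in the Caffe2 Python API (the net contents are placeholders):
from caffe2.python import core, workspace
net = core.Net("decoder_step")
net.ConstantFill([], "x", shape=[1], value=1.0)
workspace.RunNetOnce(net)               # ships the whole NetDef and recreates the ops on every call
workspace.CreateNet(net)                # create the net (and its ops) once...
for _ in range(10):
    workspace.RunNet(net.Proto().name)  # ...then run it by name, reusing the already created ops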
Reviewed By: jhcross
Differential Revision: D6156566
fbshipit-source-id: a933453e36a0d8fd163d0584186fda427a680687
Summary:
NNPACK now supports building with CMake, and its build scripts have advantages over the ones in Caffe2:
- They automatically download all dependencies, no need to keep them in submodules anymore
- They automatically download and setup PeachPy for x86-64 build
- The same scripts are used for server/desktop (Linux, macOS) and mobile (Android/iOS)
- They unblock Caffe2 build with Ninja
Closes https://github.com/caffe2/caffe2/pull/1382
Reviewed By: Yangqing
Differential Revision: D6150723
Pulled By: Maratyszcza
fbshipit-source-id: 7c3e4e3406f60d4cc059e1c8112cb10aa3d75ece
Summary:
In order to reproduce the StarSpace model using the architecture of the Two Tower model, we need to implement the ranking loss that is used in StarSpace as well as the Filament model. In both the StarSpace and Filament models, all negative samples come from random negative sampling, so the number of negative samples per positive record is fixed (say 64). To calculate the total loss, for each positive record, the hinge distance between the positive score and the negative scores (the 64 scores in the example) is calculated. This diff implements this loss in the Dper framework.
The main idea is to add an option so that negative_sampling.py can output random negative samples as an independent field rather than merged with the original input_record. In this way, we can calculate the positive score and negative score separately, which will eventually be used when calculating the ranking loss.
(Note: this ignores all push blocking failures!)
Reviewed By: kittipatv
Differential Revision: D5854486
fbshipit-source-id: f8a5b77be744a6cc8a2b86433282b3b5c7e1ab4a
This includes some changes to the dispatch code for torch.xxx functions:
- Since Variable.addmm is an instance-method, the self argument has to
come first. The dispatch code swaps the first two arguments if
necessary to support the deprecated signatures where 'alpha' or 'beta'
comes before the 'self' tensor.
- Delete IMPLEMENT_STATELESS_REVERSED. These functions require output
arguments to be passed in using the keyword 'out'. They were meant to
handle torch.gt(out, a, b), but we haven't allowed that for a while.
Summary: Made the assertion message clearer to let people know that rowwise is not supported for dense adagrad.
Differential Revision: D6135363
fbshipit-source-id: d706135a335305627310c69a2a6d7721b0a47f0e
* made it explicit in the docstring of Module.register_forward_hook() that the hook(s) will be called AFTER calling forward().
* added "every time" in docstring of Module.register_forward_pre_hook()
* Unify CUDA kernels for SoftMax and LogSoftMax
* Improve SoftMax and LogSoftMax kernels performance
Added a new instantiation of the spatial kernel for
low inner_size and larger dim_size.
* tensor: Ensure that the tensor is contiguous before pinning (#3266)
pin_memory() was producing out-of-order tensor when the given
tensor was transposed, i.e. in column-major order.
This commit fixes this by calling contiguous() before pinning.
* test: add contiguous test for pin_memory (#3266)
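A small sketch of the fixed behavior (assumes a CUDA-capable machine, since pinning requires one):
import torch
x = torch.randn(4, 5).t()   # transposed, hence non-contiguous (column-major order)
y = x.pin_memory()          # internally calls contiguous() first after this fix
assert torch.equal(x, y)    # values come out in the right order
assert y.is_pinned()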
Summary:
RNN executor has significant overhead from creating the timestep nets the first time, and this is especially bad with beam search, which is complex.
So disable RNN executor for now until perf regression is fixed (I have pending diff on it).
Reviewed By: salexspb
Differential Revision: D6138878
fbshipit-source-id: ce63ab9ce9cc1c0f67097aea1e370494ca98c680
* tensor.numpy() checks that no arguments are passed
* tensor.numpy() checks that no arguments are passed
* Improve .numpy() argument checking performance
Summary:
Added two new ops, FP16MomentumSGDUpdate and FP32MomentumSGDUpdate, which perform both the momentum sgd and weight decay updates to a given parameter in a single op -- thus being more efficient.
Also updated the standard momentum sgd test to test if nesterov momentum works.
Reviewed By: asaadaldien
Differential Revision: D5837837
fbshipit-source-id: 5ad487b9c59434491d3a4fcfdeed820db6083f57
Summary:
Added FP16SgdOptimizer to optimizers. The optimizer updates the params using the FP16MomentumSGDUpdate and FP32MomentumSGDUpdate ops. To determine which update op to call, the optimizer expects either the fp32_update flag to be set, or that the blobs are in a recognized format created by initializers.py.
These requirements can be loosened if the blob DataType can be queried in python, though I am unsure of how to do this.
It also forces FP32 updates for SpatialBN, as CuDNN only supports FP32 params for SpatialBN.
Reviewed By: asaadaldien
Differential Revision: D5840806
fbshipit-source-id: 84ab8dc11a6e91a198ed72c00287f4809607079d
* Fix clang-802.0.42 tuple overload bug, fixes #3234.
Originally, my plan for emit_record_trace was to keep it as
simple as possible, if at the expense of some somewhat ugly
overloads. So this meant we had a 'recordTrace' function
with overloads like this:
recordTrace(..., const Variable& out)
recordTrace(..., const std::tuple<Variable, Variable>& out)
Unfortunately, this triggers a bug in clang-802.0.42
(widely used in macOS Sierra 10.12.6) wherein a Variable is
implicitly convertible into a std::tuple<Variable, Variable>;
a minimal repro can be seen below here:
#include <tuple>
struct T {};
void f(const std::tuple<T, T>&) {}
void g(T& x) { f(x); }
To work around this bug, the code generator is a bit more
complicated, and is taught how to handle this situation.
Previously the generated code looked like:
jit::tracer::recordTrace( "min", { self }, ret );
Now it looks like:
jit::tracer::recordTrace( "min", { self }, { std::get<0>(ret), std::get<1>(ret) } );
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* CR comments
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary: Adding "dtype" parameter for the GivenTensorOp. Also, providing backwards compatibility for the existing code, byt supporting the templating if "dtype" is not provided.
Reviewed By: bddppq
Differential Revision: D6090049
fbshipit-source-id: f5deaa57b49f2280289975f4583aba5bc064a2bc
Summary:
Bumps the pybind version from v1.8.1 to v2.2.1, resolving all compile & runtime issues that arose.
The API upgrades used https://github.com/pybind/pybind11/blob/master/docs/upgrade.rst as the point of reference.
This also solves a long-standing bug we had, where a type would spontaneously and intermittently change in the C++ -> Python boundary.
\cc Yangqing
Closes https://github.com/caffe2/caffe2/pull/1308
Differential Revision: D6125152
Pulled By: pietern
fbshipit-source-id: 67839a9654c655d143820c6686c311beba64eff2
Py_InitModule returns a borrowed reference. PyModule_AddObject steals
the reference, so we need to incref the `_nn` object.
(The Python 3 function PyModule_Create returns a new reference.)
Don't create grad_fn if requires_grad=False
- Check that arguments without derivative definitions have
requires_grad=False
- Pass all tensor arguments to the tracer, including ones without
derivative definitions
Summary: CUDA version of weighted sampling operator; minor changes for CPU version
Reviewed By: asaadaldien
Differential Revision: D6106668
fbshipit-source-id: 42d7607bd845a4a39cf5b89d7476904cb5928431
Summary:
While waiting for the single threaded version to complete I noticed it
was doing an awful lot of waiting, so decided to make it multi
threaded. Creating a 150GB DB is now ~4x faster on an AWS EBS volume.
Closes https://github.com/caffe2/caffe2/pull/1334
Reviewed By: romain-intel
Differential Revision: D6045259
Pulled By: pietern
fbshipit-source-id: 43f9392a0a383355660a3ead217ab38939dd2bc2
Summary: Previously, the CPU version of the RowWiseSparseAdagrad operator was implemented. Here, the GPU version of the operator is implemented and tested.
Reviewed By: azzolini
Differential Revision: D6082828
fbshipit-source-id: 74befd495666c357d5ab425a698c5880cd8f927c
The general strategy is there is a new module, torch.onnx.symbolic, which
contains a function for every ATen method name with the ONNX translation.
While implementing this, I took the opportunity to expunge all references
of 'g' from the public API; instead, it is managed by a global variable in
torch.onnx which tracks the "current graph".
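A minimal sketch of the shape such entries take (illustrative; the two functions below stand in for the real ones in torch.onnx.symbolic):
def neg(g, self):
    # each symbolic is named after the ATen op, receives the current graph `g`
    # plus the op's inputs, and returns the ONNX translation built via g.op(...)
    return g.op("Neg", self)
def add(g, self, other):
    return g.op("Add", self, other)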
Other changes:
- If you pass a Tensor to op as an argument, it will now automatically be
converted into a Constant ONNX node. This lets us remove needing to
implement ONNX
- Rename value to other, wherever there is both a Scalar and Tensor overload.
This way, keyword dispatch can work uniformly in both cases.
- Deleted any autograd Function classes that both had a symbolic and were ported
to the new C++ autograd implementation. There may still be some straggling
classes that didn't have symbolic.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
The generated tracing code looks like this:
if (jit::tracer::isTracing({ self })) {
jit::Node *n = jit::tracer::recordTrace( "mean", { self }, ret );
n->rawSet(jit::stringToSymbol("dim"), dim);
n->rawSet(jit::stringToSymbol("keepdim"), keepdim);
}
A few design decisions I made:
- Instead of making the assignment of 'n' conditional on whether or not
attributes are present, I just add (void)n if it would not be used
otherwise. This modestly simplifies code generation.
- Tracing of operations that involve Generator or Storage is not supported.
This is fine because such ops don't take any Variable arguments anyway,
so they couldn't trigger tracing.
- Unfortunately, at::ArrayRef is not covariant, so there is some faffing about
to support conversions from at::ArrayRef<Tensor> (aka TensorList) to
at::ArrayRef<Variable>. In the case of 'recordTrace' (slow path), I just
allocated an intermediate std::vector to get the types correct; in the case
of isTracing (fast path) there's three overloads to avoid refcount bumping
when possible.
- Tracing is all in one place, rather than spattered between the beginning
and end of an ATen function, as Sam suggested.
- This commit doesn't actually enable ATen definitions.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
1) softmax, log_softmax backwards now have int64_t dim argument
2) chunk/split in autograd/functions/tensor.cpp conflict with new
ATen implementations, just delete them and use the ATen ones.
3) div/mul with Scalar now use "other" parameter rather than "value"
Summary:
(1) use the cmake files of the corresponding libs
(2) allow static linkage of gtest and gbenchmark.
(3) Helps removing the temp solution in #1112
We are yet to disable the installation of the benchmark library, and I have an open pull request at https://github.com/google/benchmark/pull/463 - once it is merged I will do submodule update.
cc lukeyeager pietern who had this issue before - hopefully this makes the solution cleaner.
Closes https://github.com/caffe2/caffe2/pull/1358
Differential Revision: D6111404
Pulled By: Yangqing
fbshipit-source-id: 17468d32cef27f96e9445d119eb869c9c7913118
* with the size=1 case, impossible to do single point check, replace with isContiguousRange
* fix stride in desc; fix undef scope
* add test for this case for cudnn
* assertTrue
In many "non-Python" headers, we include Python.h because we need
to declare a pointer to PyObject, and solely because of that. It
would be a lot better if we had a simpler version of Python.h that
just declared PyObject available for pointers, without anything
else. This is what torch/csrc/utils/python_stub.h does.
The good thing about not including Python.h is that it is easy to
be warning-less; no more ugly insertions of Python.h on headers
where it has no good reason to be.
This makes PyTorch warning clean again.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary: Before we fix it properly with 'type' argument.
Reviewed By: bddppq
Differential Revision: D6103973
fbshipit-source-id: 8c00a93c373dd0ad0bbfe59944495f6574223ab6
Summary:
a parameter can be initialized multiple times in init_net if parameter sharing is enabled. With the original implementation, only the first parameter init would be replaced by pre-trained parameters and the rest were left unchanged. This overwrites the initialization with pre-trained parameters.
This diff fixes this issue and also supports model init for the ads-intent project
Reviewed By: dragonxlwang
Differential Revision: D5991291
fbshipit-source-id: 36173f6239c56bd0d604a77bd94e36072f32faa7
Summary: include memory and map from observer.h
Reviewed By: ajtulloch
Differential Revision: D6094338
fbshipit-source-id: f39b27cb76dae3b06816bb9ae37c2c1f96eaa8ba
I've also made the version counter and the "live" reference count
atomics.
Note that it's not safe to set the version counter (operator=) from
multiple threads, because shared_ptr assignment isn't thread safe.
Currently, the only call sites to these functions are on newly created
variables before they can be accessed from other threads.
See #3111
Summary:
Currently, the type inference infers FLOAT as the type for all GivenTensor*Fill operators. However, the inferred type should match the actual operators.
Also, for the `Slice` operator, there is a corner case where type inference fails.
Reviewed By: azzolini
Differential Revision: D6096813
fbshipit-source-id: d65b7c0f42436138cbc49d8a5a62374fa5e927e1
This removes the StochasticFunctions for bernoulli, multinomial, and
normal and replaces them with classes in the torch.distributions
package. Each distribution supports the differentiable log_prob function
that returns the log of the pdf/pmf of the samples.
The current StochasticFunction implementation has a few problems: it can
be painful to use when there are multiple stochastic outputs which need
to be back-propagated through. It also requires that we store grad_fns
on Variables that have requires_grad=False in order to find stochastic
nodes.
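A small sketch of the replacement pattern, using today's torch.distributions API (the reward values are placeholders):
import torch
from torch.distributions import Bernoulli
probs = torch.tensor([0.3, 0.7], requires_grad=True)
dist = Bernoulli(probs)
action = dist.sample()                           # sampling itself is not differentiated
reward = torch.tensor([1.0, -1.0])
loss = -(dist.log_prob(action) * reward).sum()   # REINFORCE-style surrogate loss
loss.backward()                                  # gradients flow through log_prob into probs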
- Cleaned up THNN and THCUNN code and kernels
- Improved THCUNN kernel performance 5x, making it match cuDNN performance
- Added support for computing softmax over arbitrary dims
NOTE: The default dim for 3D inputs is now 1 (used to be 0)
- Both functions now accept inputs with arbitrarily many dimensions
- Autograd functions no longer save the input (it's unnecessary)
- Added cuDNN bindings for softmax, but they are unused as THCUNN
matches or even exceeds cuDNN performance
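For example, a sketch with today's functional API, passing dim explicitly rather than relying on the changed default:
import torch
import torch.nn.functional as F
x = torch.randn(2, 3, 4)
p = F.softmax(x, dim=1)          # normalize over dim 1 of a 3-d input
assert torch.allclose(p.sum(dim=1), torch.ones(2, 4))
logp = F.log_softmax(x, dim=-1)  # any dim is accepted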
Summary:
This introduces a few things:
- It enables us to create Caffe2Config.cmake that can be used down the road for building dependent libraries, so they do not need to explicitly write FindCaffe2.cmake.
- The config file will automatically figure out transitive dependency of Caffe2 as well as compiler flags.
- This diff also disables the RPATH setting since it is kind of a mess right now. In principle, we should figure out a clearer rpath setting following the typical rpath setting choices (https://cmake.org/Wiki/CMake_RPATH_handling) - I can send a follow up PR to clean this up.
- Minor: removed old gflags and glog files.
Closes https://github.com/caffe2/caffe2/pull/1354
Reviewed By: dzhulgakov
Differential Revision: D6098014
Pulled By: Yangqing
fbshipit-source-id: cb06c41a7ef60fddb78b24887b6b3e82684b7c6b
Summary: Model with rowwise RMSProp does not work in net-rewriting pipeline (fbl 29841194). This diff solves the issue by changing the way Slice op is used in the model and adds a rule to `parallelize.py` to cover for needed cases.
Reviewed By: azzolini
Differential Revision: D6096022
fbshipit-source-id: c4f615b2ba99da9f77a1d49c9fb898e0e59401f8
Summary: Allow the application of sequence-length masking to be replicated along one or more minor axes. See task for details.
Reviewed By: jamesr66a
Differential Revision: D6090835
fbshipit-source-id: 9064232aa9b93246c582b6e0bae73be5dbe09e98
* Fix docs for nn.Embedding and F.embedding.
- add description of 'sparse' argument (#3104)
- fix F.embedding example (resulted in RuntimeError)
* Make EmbeddingBag a New Style Function.
* Add a functional interface for EmbeddingBag
* Fix failing tests: add max_norm and norm_type to context,
and fix typo in backend call.
* Docfix: remove torch.manual_seed from example code.
* Add a note about using sparse keyword in Embedding function.
Apparently, the algorithm only guarantees the output is coalesced if
the inputs are coalesced.
I'm planning to do another PR that does much more stringent correctness
testing for the 'coalesced' bit shortly, but y'all should merge
this one first.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary: By default, do not log anything to reduce the runtime overhead
Reviewed By: Maratyszcza
Differential Revision: D6082490
fbshipit-source-id: 35fd09ea439925139d66b4623211e01af46e18f2
* THCUNN Skeleton for Depthwise Convolution port
* implement Depthwise Convolution CUDA Kernels (handles weight parameter only, not bias)
* working kernels and bindings for forward + backward for base conv, and integration
* add support for padding
* strides for weight kernel
* dilation for weight gradient, enable for others
* add support for depthwise multiplier
* remove old depthwise conv
* rename to SpatialDepthwiseConvolution
* clean up depthwise code, add shape asserts, more constrained thread count for accgradparams
* add bias for forward for depthwise conv
* add grad_bias, move bias for forward to CUDA
* fix eligibility test to guard against transposed, properly identify depth multiplier
* add basic unit test; make depthwise conv take priority over cudnn when appropriate
* add tests for depthwise permutations
* make cuda kernels calculate positions using mul instead of div
* remove unnecessary samegpu requirement
* use accreal, test for double type
* use THAssert instead of assert
* rename to is_depthwise
* half prec support for depthwise
* make certain computation more pythonic
* flake8
Previously, we created the Variable.data PyObject* in THPVariable_Wrap. For many
Variables, we don't access their data directly. Instead, they are passed
from one Variable computation to another.
This reduces the overhead of ATen-implemented Variable methods by
~200ns.
Summary:
Somehow we're observing mysterious test failures for some nnpack-related tests with gcc5 only on Travis: https://travis-ci.org/caffe2/caffe2/jobs/288804879
Marat suggested that maybe the machine doesn't have avx2 support.
Right now gating is happening for FB-internal only. I think it makes sense to make gating generic. Calling `nnp_initialize` seems like the right way to do so. It returns failure if the hardware is not supported and is a noop after the first call.
Reviewed By: Maratyszcza
Differential Revision: D6073808
fbshipit-source-id: e684668628b5c635368351114b6c502d2cc81fe4
Summary:
Op for computing SigmoidCrossEntropyWithLogits with per-label, per-sample weights. Can be used for addressing class or label imbalance.
Doc:
Given three matrices: logits, targets, weights, all of the same shape,
(batch_size, num_classes), computes the weighted sigmoid cross entropy between
logits and targets. Specifically, at each position r,c, this computes
weights[r, c] * crossentropy(sigmoid(logits[r, c]), targets[r, c]), and then
averages over each row.
Returns a tensor of shape (batch_size,) of losses for each example.
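An equivalent computation sketched in PyTorch for reference (the op itself is a Caffe2 operator; the tensor values are placeholders):
import torch
import torch.nn.functional as F
logits = torch.randn(8, 5)
targets = torch.rand(8, 5)
weights = torch.rand(8, 5)
per_elem = weights * F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
per_example = per_elem.mean(dim=1)   # shape (batch_size,), one loss per example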
Reviewed By: stephenyan1231
Differential Revision: D5997723
fbshipit-source-id: f3172325f1c98b6f26e1700131ef897b743a72fc
* Support MNIST in ONNX
* Add train mode check in FeatureDropout symbolic, add todo mark in logsoftmax_symbolic
* export FeatureDropout as a simple identity op
* turn x = x or y to if-checks.
Summary:
For distributed offline training, downloading parameters from trainer_0 is part of the epoch plan. However, for distributed realtime training, we publish the model at a specific time interval, so we need to run multiple iterations of the epoch plan before publishing the model.
In this diff, I split downloading parameters out of the epoch plan into a separate plan, so we can explicitly execute it before model publishing for distributed online training.
Reviewed By: boryiingsu
Differential Revision: D5995122
fbshipit-source-id: 47d61d7b8c57cfae156e79b7ec32068ef579d7c3
Summary: observer framework can now be used in python + a small writeup of how to use it. this is D6035393 with a fix for ct-scan
Reviewed By: salexspb
Differential Revision: D6066380
fbshipit-source-id: 896c4c580d4387240b81ac2dbbc43db51d4bfeb9
Summary: that's what made tests fail :)
Reviewed By: xianjiec
Differential Revision: D6067037
fbshipit-source-id: 0194f082feed87b0502170683c6773e07db3ff44
ATen has its own default CPU RNG. Use this as the default in PyTorch so
that random functions called through ATen have the same behavior as
random functions called through TensorMethods
Summary: Until we have an internal build test for this directory, we should not have it enabled by default in open source
Reviewed By: salexspb
Differential Revision: D6060577
fbshipit-source-id: 25f5c2d30adf274620cd8ec2e2db9565b98cfa7c
Summary:
makes the necessary changes to support Caffe2 OpenGL ES backend on NVIDIA Tegra devices
- Remove no_bounds global because Tegra GLES driver doesn't recognize it as a constant. Define BOUNDS_CHECK_MODE macro instead.
- Recognize "NVIDIA Tegra" as a supported GL_RENDERER
Reviewed By: hlu1
Differential Revision: D6030760
fbshipit-source-id: e3655467612469d69c70b3fee35edb2d6774a793
Summary: observer framework can now be used in python + a small writeup of how to use it
Reviewed By: sf-wind
Differential Revision: D6035393
fbshipit-source-id: 4563cf0203095fa979bb2160621cd16dd22ff830
It's pretty easy to accidentally fail to actually compile
a JITed region, which means that we have accidentally failed
to have test coverage for a number of features. This adds
a secret _assert_compiled kwarg, which will raise an error
if we don't actually hit the compiled codepath.
This is not intended to be user visible; we have some other
ideas for handling this case.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
We weren't doing gradient checks on these functions because the tests
were in-place only. We also incorrectly classified __magic__ functions
as inplace.
Summary: Turns out CuDNN's tensor transform only supports floats. Previous implementation pretended it would work with ints by casting to floats and indeed passed tests for some reason. But rgirdhar found a case where it returned nonsensical results. So rewire int-transposes to use non-cudnn version. Had to refactor a bit for that. Also added a test for the case.
Reviewed By: asaadaldien
Differential Revision: D6043284
fbshipit-source-id: cc3b14f9fbbdeff421b01da453a1d3c7c5ffd4ac
Summary:
input dimensions up to "axis" will be flattened to the outer dim of output and the remaining input dims will be the inner dim
Closes https://github.com/caffe2/caffe2/pull/1330
Reviewed By: dzhulgakov
Differential Revision: D6039560
Pulled By: bddppq
fbshipit-source-id: e92c30b49a9288feeefc4a639522406e97e149e1
Summary:
- hasattr is misbehaving in python 3
- python2: `This is implemented by calling getattr(object, name) and seeing whether it raises an exception or not`
- python3: `This is implemented by calling getattr(object, name) and seeing whether it raises an AttributeError or not.`
Reviewed By: azzolini
Differential Revision: D5973797
fbshipit-source-id: 0b6a413e6ebacd9bdd197c46feab256ab383ace2
Summary: memonger.cc's support for RNNs was broken in D5994548, because it changed a .n argument to .s argument. That made data_parallel_model_test fail (but tests were not run for the blame diff, so this was not noticed).
Reviewed By: kennyhorror
Differential Revision: D6043948
fbshipit-source-id: d29abd6927c519227a28b41c1ef70fb1756904bf
Summary:
I broke dpm.GetLearningRateBlobNames() when adding a new nodename param in optimizer.
Fixing it.
Reviewed By: asaadaldien
Differential Revision: D6043828
fbshipit-source-id: b3a79dd0dfae144187bcb359e2374eab6b32c485
Summary: Adding ability to reuse workspace in Do op and unit tests
Reviewed By: akyrola
Differential Revision: D6037992
fbshipit-source-id: 73d6a14001f667f7ca5e1e02ff39911dc65e4cd1
Summary:
The scripts/build_local.sh script would always build protoc from the
third_party protobuf tree and override the PROTOBUF_PROTOC_EXECUTABLE
CMake variable. This variable is used by the protobuf CMake files, so
it doesn't let us detect whether the protoc was specified by the user
or by the protobuf CMake files (e.g. an existing installation). This
in turn led to a problem where system installed headers would be
picked up while using protoc built from third_party. This only works
if the system installed version matches the version included in the
Caffe2 tree. Therefore, this commit changes the variable to specify a
custom protoc executable to CAFFE2_CUSTOM_PROTOC_EXECUTABLE, and
forces the use of the bundled libprotobuf when it is specified.
The result is that we now EITHER specify a custom protoc (as required
for cross-compilation where protoc must be compiled for the host and
libprotobuf for the target architecture) and use libprotobuf from the
Caffe2 tree, OR use system protobuf.
If system protobuf cannot be found, we fall back to building protoc
and libprotobuf in tree and packaging it as part of the Caffe2 build
artifacts.
Closes https://github.com/caffe2/caffe2/pull/1328
Differential Revision: D6032836
Pulled By: pietern
fbshipit-source-id: b75f8dd88412f02c947dc81ca43f7b2788da51e5
Summary:
Optionally return a blob of shape [batch size, max length] that is
false only in locations where the output tensor was padded.
One can separately convert lengths to segment ids and cast, but
this is more convenient, and possibly more efficient.
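For reference, the mask being described can be sketched in a few lines of numpy (illustrative, not the operator's implementation):
import numpy as np
lengths = np.array([3, 1, 2])
max_len = lengths.max()
mask = np.arange(max_len)[None, :] < lengths[:, None]
# mask[i, j] is True for real entries of row i and False where that row was padded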
Differential Revision: D6006073
fbshipit-source-id: af6c4ea31972566e7d059dcd3fdd8afba97a88e9
Summary:
I had a 30 sec timeout in the RNN executor to catch deadlock bugs, but it looks like people are occasionally bumping into it in the course of normal business -- perhaps when the CPU is heavily used, the threads don't get enough time and hit the timeout.
Removed the timeout but retained the warning logging.
Reviewed By: salexspb
Differential Revision: D6001960
fbshipit-source-id: 5b2293359ee68c1c24f0d9e0406d88391e531280
Summary:
Im2colNd GPU version was not correctly implemented due to 1) the lack of unit test 2) it is actually NOT used by any use case.
A little more background: We are working implementing a conv-deconv 3D operator, which takes 3D volume data (e.g. video) as input, do conv in spatial domain to reduce resolution and do deconv (a.k.a conv transpose) in temporal domain. We first implement a conv transpose 3D op in D6035108, and spot the buggy gpu implementation.
Reviewed By: asaadaldien
Differential Revision: D6035081
fbshipit-source-id: b76dea2e44bcb73d202441bb246249c4481973e1
* Fix the broadcast in Addmm's symbolic
* fix the non-matching dimension cases
* Add exception for non-supported case, remove onnx test cases (moved to onnx-pytorch repo)
* remove the test_onnx.py in run_test.sh
* lint the code
Summary:
This way, we can choose to include a file and the containing reporter is registered in the ObserverConfig. We can have different targets with different reporters without exposing the dependency to all clients.
Closes https://github.com/caffe2/caffe2/pull/1320
Reviewed By: bwasti
Differential Revision: D6024096
Pulled By: sf-wind
fbshipit-source-id: c6eabd7f9ca51b88ea4b268612355ca60809c0a2
Summary:
Since this is only a duplicate of CMAKE_CXX_FLAGS we should simplify the set of options.
Closes https://github.com/caffe2/caffe2/pull/1327
Differential Revision: D6031544
Pulled By: Yangqing
fbshipit-source-id: 5c610a70118089b4d96be30ab028ef1d5efdb019
Summary: Before this diff RNNOp was using TextFormat for representing steps. This diff is changing RNNOp to prefer NetDef argument instead. To be backward compatible it supports TextFormat for existing models, though we can compile RNNs without TextFormat as well.
Reviewed By: salexspb
Differential Revision: D5949330
fbshipit-source-id: 9336a8f5ccf30ad8d8e3a7067b9437e1704b1c9f
Summary: We have to use copy constructor in Concat when copying non-primitive types
Reviewed By: Yangqing
Differential Revision: D6002883
fbshipit-source-id: 0aebc955079975bb6423291589ed09ce0660acf3
Summary: Use only MLP model and re-enable test
Reviewed By: bddppq, Yangqing
Differential Revision: D6013471
fbshipit-source-id: 0cb4a9346c62a739ee6259832181f71e60eef311
Summary:
In the past we called our libraries libCaffe2_CPU.so and libCaffe2_GPU.so, which don't really match the usual Linux .so library naming conventions. This diff changes them to libcaffe2.so (old Caffe2_CPU) and libcaffe2_gpu.so (old Caffe2_GPU).
This might affect existing build scripts that explicitly use Caffe2_CPU and Caffe2_GPU: what do you guys think? pietern bwasti slayton58
Closes https://github.com/caffe2/caffe2/pull/1300
Differential Revision: D6025973
Pulled By: Yangqing
fbshipit-source-id: 6243de4e7af8924f737bb74f3936015f4c91fa26
Summary:
TSIA - this would allow us to auto-sync the up to date version with intel's repo.
Closes https://github.com/caffe2/caffe2/pull/1319
Reviewed By: pietern
Differential Revision: D6023739
Pulled By: Yangqing
fbshipit-source-id: 79bd91aa3a193c266acccdeb682519a49e028bae
Summary: observer framework can now be used in python + a small writeup of how to use it
Reviewed By: salexspb
Differential Revision: D5905002
fbshipit-source-id: e40ec24a55e08fb73beea9b4f3b68e71fc66ffb1
Summary:
parallel_workers supports calling a custom function "init_fun", passed in as an argument to init_workers, when WorkerCoordinators are started.
Adding an analogous argument "shutdown_fun" which gets passed in to init_workers, and gets called when a WorkerCoordinator is stopped.
This allows users of the parallel_workers to add custom cleanup logic before the workers are stopped.
Reviewed By: akyrola
Differential Revision: D6020788
fbshipit-source-id: 1e1d8536a304a35fc9553407727da36446c668a3
A few notes about the implementation:
- Need to plumb 'devices' through to the 'fork_rng' calls. You definitely
want these; it makes verify run A LOT faster
- New keyword argument for compiled model execution, '_force_trace', which
forces us to retrace a model.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary: it failed for the case when the `prod_prediction` is used as teacher label, which is double, instead of float.
Reviewed By: kittipatv
Differential Revision: D6018163
fbshipit-source-id: cd93fd46996e07c7f762eedbeb67331a4665d4c4
Summary:
Also fixes a dependency bug in the cmake file for the ATen Op.
Closes https://github.com/caffe2/caffe2/pull/1309
Differential Revision: D6017166
Pulled By: zdevito
fbshipit-source-id: 3f4d18772f9179367927d4e7a52e51a4580342e9
Summary: The layer should also apply to evaluation as it's needed for feature importance run.
Reviewed By: xianjiec
Differential Revision: D6016125
fbshipit-source-id: e1db1a2eb3d45515e3cdc71b4badaaf738a4afd8
Summary: A single negative index can crash the job today. We want to skip a few of them but not a lot. If we skip too many then we will force the job to crash.
Reviewed By: kennyhorror
Differential Revision: D6003461
fbshipit-source-id: 7881ed6c2cfa78c7bda90c7aa01e81ca00fd08a6
Summary: This prints the inner net of 'Do' op, for example.
Reviewed By: akyrola
Differential Revision: D6007278
fbshipit-source-id: 459583fe13191b0449982efb7be733c9c01ecf76
Summary:
RNNOp has been using TextFormat for representing nets. This has already caused
some incompatibilities and also pulls huge dependencies into RNN on mobile. This
diff adds support for using a NetDef arg instead and adds support for
compiling only this version.
Reviewed By: salexspb
Differential Revision: D5994548
fbshipit-source-id: 6c4ded97b80d7a57ad5a013b79ae917aac777c7d
Summary: 1. iteration and LR must be node-name specific in optimizer
Reviewed By: azzolini
Differential Revision: D6001124
fbshipit-source-id: 0fa53fb3347e89401f62125865166356ac56796b
Summary:
The Caffe2 benchmarking framework can now compare the output of a model with some golden output. In order to do that, and to reduce the dependency of the benchmarking framework on Caffe2, the output is dumped in text format without any schema.
The output is read in by the benchmarking framework, which performs the comparison.
Closes https://github.com/caffe2/caffe2/pull/1301
Reviewed By: bwasti
Differential Revision: D5992836
Pulled By: sf-wind
fbshipit-source-id: f6b403103949f4b9880c8372bbdc36966475a387
Summary: Added TensorMap input for run function in predictor.cc
Reviewed By: bwasti
Differential Revision: D5847103
fbshipit-source-id: cd9755a0491b50adc35177164ffe7a50e73ff80f
Summary:
Input is a matrix tensor. Its first dimension is the batch
size. For each column, bucketize it based on the boundary values and then do
one hot encoding. The `lengths` specifies the number of boundary values for each
column. The final number of buckets is this number plus 1. This would also be
the expanded feature size. `boundaries` specifies all the boundary values.
Note that each bucket is right-inclusive. That is, given boundary values
[b1, b2, b3], the buckets are defined as (-inf, b1], (b1, b2], (b2, b3], (b3, inf).
For example
If data = [[2, 3], [4, 1], [2, 5]], lengths = [2, 3],
and boundaries = [0.1, 2.5, 1, 3.1, 4.5], then
output = [[0, 1, 0, 0, 1, 0, 0], [0, 0, 1, 1, 0, 0, 0], [0, 1, 0, 0, 0, 0, 1]]
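A small NumPy sketch that reproduces the example above; it follows the described semantics (right-inclusive buckets), not the actual operator code.
```python
import numpy as np

def bucketize_one_hot(data, lengths, boundaries):
    data = np.asarray(data, dtype=float)
    out_cols = sum(l + 1 for l in lengths)              # each column expands to len + 1 buckets
    out = np.zeros((data.shape[0], out_cols), dtype=int)
    col_start, b_start = 0, 0
    for col, length in enumerate(lengths):
        bounds = boundaries[b_start:b_start + length]
        for row, v in enumerate(data[:, col]):
            bucket = sum(v > b for b in bounds)          # right-inclusive buckets
            out[row, col_start + bucket] = 1
        col_start += length + 1
        b_start += length
    return out

print(bucketize_one_hot([[2, 3], [4, 1], [2, 5]], [2, 3], [0.1, 2.5, 1, 3.1, 4.5]))
# [[0 1 0 0 1 0 0]
#  [0 0 1 1 0 0 0]
#  [0 1 0 0 0 0 1]]
```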
Reviewed By: xianjiec
Differential Revision: D5976030
fbshipit-source-id: fd746c20b19bcdf5f769451d804c219ad6463f28
* Improve Declarations.yaml:
- translate defaults to C++ values
- include names of returned values
- mark keyword-only arguments
* Add comment to translate_default
Summary:
This is a brief introduction to what this op is doing. In the multi-label case,
i.e., each example has more than one label, we want to find out which examples
have values for each label. That is, given a sparse representation in
len = (2,3), ind = (1, 2, 0, 1, 2), val = (10, 20, 5, 8, 15), we want to return
example_id_0 = [1], example_id_1 = [0,1], example_id_2 = [0,1],
value_0 = [5], value_1 = [10,8], value_2 = [20,15].
There are two special things here. 1. The size of each output tensor is unknown until runtime;
2. The ordering in each output tensor should be preserved, e.g., example_id_1 = [0,1] instead of [1,0].
What I am doing now is to get the output size and an offset map (see code) on CPU and then
launch a kernel to take care of the rest. This requires an O(N) copy, which is really not ideal.
Previously I had an implementation that computes the output size on GPU, but when filling values into
the output tensors it is hard to make sure the ordering is preserved unless I sort afterwards.
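A plain-Python sketch of the described transposition from per-example sparse (lengths, indices, values) to per-label (example_ids, values); illustrative only, not the CUDA implementation discussed above.
```python
def transpose_sparse(lengths, indices, values, num_labels):
    example_ids = [[] for _ in range(num_labels)]
    label_values = [[] for _ in range(num_labels)]
    offset = 0
    for example, length in enumerate(lengths):
        for k in range(offset, offset + length):
            example_ids[indices[k]].append(example)    # ordering by example is preserved
            label_values[indices[k]].append(values[k])
        offset += length
    return example_ids, label_values

ids, vals = transpose_sparse((2, 3), (1, 2, 0, 1, 2), (10, 20, 5, 8, 15), 3)
print(ids)   # [[1], [0, 1], [0, 1]]
print(vals)  # [[5], [10, 8], [20, 15]]
```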
Reviewed By: azzolini
Differential Revision: D5825104
fbshipit-source-id: 4d987cef0247746998ec1d2acc47fc5ed2302722
Summary:
The build_local.sh script is currently single-threaded, which is really slow. Use the same mechanism as in build_android.sh to parallelize the build.
Closes https://github.com/caffe2/caffe2/pull/1282
Differential Revision: D5992231
Pulled By: sf-wind
fbshipit-source-id: 01ba06b6efcb0f535f974a2dfffbae9ba385d27d
* Add reduce keyword to MSECriterion API
* Move gradOutput usage from py to backend
* Implement reduce keyword for THNN MSECriterion
* Implement reduce keyword for THCUNN MSECriterion
* Implement reduce keyword for MSE double backwards
* Tests for MSECriterion with reduce keyword
* Documentation for reduce for MSELoss
* Make legacy nn work with reduce keyword by ignoring it
* Apply linter suggestions
* Address comments (small changes)
* Revert "Tests for MSECriterion with reduce keyword"
This reverts commit 1c0be0defa49d336d023d7d9795db4037c92b6fe.
* Undo changes to legacy nn tests
* Reuse module test for MSELoss by creating a wrapper class for MSELoss
* Address comments: refactor MSECriterion.cu to be nicer
* Fix lint & build errors
Summary:
Separate class definition into header file
Remove uniform buffer initialization in the constructor because it's not necessary
Separate tiling and batching code
Reviewed By: jerryzh168
Differential Revision: D5960502
fbshipit-source-id: 5e3bce5192ce6dc69868be1722f490f690d87076
Summary:
Added an exported statistic that helps in computing
standard deviation. It uses an offset-based mode of computation
to avoid a common numerical pitfall.
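A short sketch of the offset-based computation, assuming the pitfall in question is the catastrophic cancellation of the naive E[x^2] - E[x]^2 formula (an assumption, not stated in the summary).
```python
import math

def std_with_offset(xs):
    k = xs[0]                                    # any value near the data works as the offset
    n = len(xs)
    s = sum(x - k for x in xs)
    s2 = sum((x - k) ** 2 for x in xs)
    return math.sqrt((s2 - s * s / n) / n)

print(std_with_offset([1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]))   # ~4.74
```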
Reviewed By: azzolini
Differential Revision: D5977811
fbshipit-source-id: e9f3b99a952e10fb3e3eb18a29b5bdca92f82f4c
Summary:
Latest version of Gloo takes care of MPI_Init/MPI_Finalize for us, so
this commit removes handling that from caffe2/contrib/gloo. It also
imports CMake NCCL module changes from Gloo to stay consistent and
allow setting NCCL_INCLUDE_DIR and NCCL_LIB_DIR separately.
Closes https://github.com/caffe2/caffe2/pull/1295
Reviewed By: dzhulgakov
Differential Revision: D5979364
Pulled By: pietern
fbshipit-source-id: 794b00b0a445317c30a13cc8f0f4dc38e590cc77
There is a bit of nuance to this function. If one blindly charges in
and initializes all GPUs, it is going to take a long time. 20sec for
8 GPUs on my dev machine. But to a user, it is non-obvious that fork_rng
is going to hit all the GPUs by default (which it does by default for
safety reasons.) So there is a nice warning when we notice we're
hitting more than one GPU. There is a bit of extra generality
which is going to be used by torch.jit in a subsequent commit.
The motivation is that I wanted to add some more general purpose
utility random functions, but not gunk up torch/__init__.py.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
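An illustrative sketch of the warn-on-many-devices behaviour described above; the names and signature below are assumptions, not the actual utility added here.
```python
import contextlib
import warnings

@contextlib.contextmanager
def fork_rng_sketch(devices, get_state, set_state):
    # Warn when we are about to touch many devices, since saving/restoring
    # the RNG state may force each of them to initialize.
    if len(devices) > 1:
        warnings.warn("forking RNG state on %d devices; pass an explicit devices "
                      "list to restrict this if it is slow" % len(devices))
    saved = {d: get_state(d) for d in devices}
    try:
        yield
    finally:
        for d, s in saved.items():
            set_state(d, s)

states = {0: 1, 1: 2}                    # toy stand-in for per-device RNG state
with fork_rng_sketch([0, 1], states.get, states.__setitem__):
    states[0] = 99                       # perturb state inside the block
print(states)                            # {0: 1, 1: 2} -- restored on exit
```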
Summary:
This would allow one to debug with asan. Known problems:
- only works with the new -fsanitize=address option.
- not tested on clang.
It's turned off in default so existing builds won't be affected.
Closes https://github.com/caffe2/caffe2/pull/1299
Differential Revision: D5987034
Pulled By: Yangqing
fbshipit-source-id: de29cd3b84edaed5db73e33f8f759c5c3271b5b7
Summary: Given a pair (init_net, train_net) where ops in sparse layers are tagged, this diff detects the components and rename the `node_name` (e.g. tag) to reflect the component name.
Reviewed By: azzolini
Differential Revision: D5948222
fbshipit-source-id: aeda9cfc88bb64922bf7a9942b969e3c5066718a
Summary:
Implement a framework to benchmark the Caffe2 inferencing time. It only contains the observer collecting the delay information for running the net and the operator. The driver of the benchmark is in a separate repository.
It does not interfere with the rest of the Caffe2.
Closes https://github.com/caffe2/caffe2/pull/1263
Reviewed By: bwasti
Differential Revision: D5956861
Pulled By: sf-wind
fbshipit-source-id: ba4f0226066f55d333b27d472e09137d7272d449
Summary:
In the instance norm implementation, the lambda function was causing a heap overflow,
so we move it explicitly into the function body itself.
accept2ship
Reviewed By: pietern
Differential Revision: D5981662
fbshipit-source-id: 6901c9cd738de048e3d0308a0a4c52f9c37e524a
Summary:
This is the first step on DPER side to use net transformation step (`parallelize_net`).
So far, it tags the sparse parameters (in init_net and train_net) once distributed trainer nets are built.
Next step is to merge the part that creates distributed trainer nets (`create_distributed_trainer_nets`) into the part that creates single-trainer, multi-reader nets (`create_distributed_reader_nets`). This step should get rid of parts of `MixtureStrategyModelBuilder`.
Reviewed By: azzolini
Differential Revision: D5902733
fbshipit-source-id: 85fbddbb6c2704badd82b237f1dd2c7c5790e43a
Summary: The cudnn version of the DropoutOp was taking a significant (and unwarranted) amount of time in our RNN training. Further investigation showed that setting the cudnn dropout descriptors was an extremely expensive operation (https://pxl.cl/99nT), much more so than the dropout operation itself. This diff adds to the DropoutCell the option to disable cudnn. The non-cudnn version uses a raw curand call that elides all of the expensive descriptor setting.
Reviewed By: jmp84, akyrola
Differential Revision: D5972022
fbshipit-source-id: 6325ec5d6569f8b94d776cbb2554cc8ddb28f699
Summary: Move common operation out of loop.
Reviewed By: dzhulgakov
Differential Revision: D5962894
fbshipit-source-id: e4f8a5406c870958215cbc1fd366fa87bc381471
Summary: adding an operator with behavior similar to fused GatherRanges and Split.
Reviewed By: kennyhorror
Differential Revision: D5961761
fbshipit-source-id: 616d4668b8901256418004def90d91a0b2041620
Summary:
Added support for batching to SequenceMaskOp.
Let b be the batch dim and k be the axis dim. (We enforce that b < k.) Write the dimensions of the input tensor as [a_1, ..., a_b, ..., a_k, ...]. We first collapse our tensor down to 3D, with dimensions [P, Q, D], where P = a_1 * ... * a_b, Q = a_{b+1} * ... * a_{k-1}, and D = a_k * a_{k+1} * ... * a_n. Then we mask each slice [i, :, :] of this 3D tensor (note that each slice is a Q x D tensor of rank 2).
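A NumPy sketch of the collapse step described above (an assumed helper for illustration, not the SequenceMask operator itself), using the same 1-indexed b and k convention.
```python
import numpy as np

def collapse_for_masking(x, b, k):
    # Collapse x to [P, Q, D]; b and k are 1-indexed dims with b < k.
    shape = x.shape
    P = int(np.prod(shape[:b]))         # a_1 * ... * a_b
    Q = int(np.prod(shape[b:k - 1]))    # a_{b+1} * ... * a_{k-1}
    D = int(np.prod(shape[k - 1:]))     # a_k * ... * a_n
    return x.reshape(P, Q, D)

x = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
y = collapse_for_masking(x, b=1, k=3)
print(y.shape)   # (2, 3, 20); each slice y[i, :, :] is then masked independently
```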
Reviewed By: jamesr66a
Differential Revision: D5733382
fbshipit-source-id: e7a314d9fe6e6691a75112edbee8ba6e8ea8e396
* skeleton commit for building and linking nnpack library in PyTorch
* first stab at conv forward binding + integration
* bind NNPACK gradient kernels
* move nnpack forward, input gradient calls deeper
* nnpack conv api mimics nn
* fix symbol error; use memory across calls
* clean up warnings, add shape checking, thread safety, configurable thread specification
* add batch size threshold, also bind for single-element batch for the future
3D modules apply padding on all three sides. "Both" doesn't make sense here.
I used the wording of the AvgPool3d docstring, where it was already correct.
Summary:
Useful for figuring out which version people built with. We can just ask for the --caffe2_version gflag or get core.build_options from Python.
Also adds CMAKE_INSTALL_RPATH_USE_LINK_PATH - without it, the build wasn't working on my Mac. How should it be tested?
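A quick usage note based on the summary (the module path is taken from the description above):
```python
# Query the build information from Python, as described above.
from caffe2.python import core

print(core.build_options)
# From the command line, the same information is exposed via the --caffe2_version gflag.
```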
Closes https://github.com/caffe2/caffe2/pull/1271
Reviewed By: bddppq
Differential Revision: D5940750
Pulled By: dzhulgakov
fbshipit-source-id: 45b4c94f67e79346a10a65b34f40fd258295dad1
Summary: This is the continuation of T20872698 Implement the gradient operator for element-wise Logit
Reviewed By: asaadaldien
Differential Revision: D5969487
fbshipit-source-id: c9bb4222529f9fd9085aa9048b90eb70a63f41f4
Summary:
Only works for len(offset) == 1 for now.
Also, Slice Op only supports slicing in one dimension,
can we extend it to support slicing multiple dimensions?
Reviewed By: bwasti
Differential Revision: D5967476
fbshipit-source-id: 6cf9ff510e752ddb3bc9673d47f6a577ae9ccc79
Summary: Clean up the metal remnants in BUCK now that the metal code has been removed
Reviewed By: bwasti
Differential Revision: D5966095
fbshipit-source-id: 6b022624fe91a6728549d93d2954328c6b4e059e
* Generate torch.cat autograd via ATen.
Most of the change is around supporting generation of:
1) TensorList arguments
2) Arguments to "size", "sizes", i.e. "sizes(dim)"
The alpha/beta naming in addmm was flipped; this commit fixes that
problem. It also fixes the ONNX export of alpha/beta parameters.
Finally, it supports executing matmul in the JIT.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary: This diff refactors the parameter initialization logic from model manipulation to layers
Reviewed By: azzolini
Differential Revision: D5920225
fbshipit-source-id: 50d230e406bc9ce0b00bdd164802c504cf32ea46
Summary: Include information of the engine for Caffe2 operators.
Reviewed By: salexspb
Differential Revision: D5876323
fbshipit-source-id: 3b1837ccff098109bdfb0865a4fa3f509496ffdb
Summary: only changes needing review are in proto_utils.cc and caffe2.proto
Reviewed By: jerryzh168
Differential Revision: D5956743
fbshipit-source-id: e03fffaf5bc8413f2320c20a89a421f1a69b2870
* commit '9f4accd5bb99900dfda9ffab110aeb7a4534d629':
Make all dim arguments int64_t
Converting dlpack tensor to aten tensor
adding a simple class for converting atensor to dlTensor
Test stub for dlconvertor
adding dlpack header
Fix build failure in MSVC
Mark all (non-static) Type methods as const.
Summary:
Executor benchmarks to measure QPS for different models (sparse nn hogwild and
dataparallel, resnet50 dataparallel)
Reviewed By: dzhulgakov
Differential Revision: D5950770
fbshipit-source-id: 9aa8e0480468a55a6a97b10589d785c682fae01e
Summary: Adjust test thresholds and number of examples
Reviewed By: salexspb
Differential Revision: D5945588
fbshipit-source-id: 7aecb8c642d8775f51dd3c296a28f1faf7ae0c81
* Fix detection of nccl.h when libnccl.so is in /usr/lib/x86_64-linux-gnu and similar paths
* full support for independent NCCL_LIB_DIR and NCCL_INCLUDE_DIR
* lint fix
* add back CUDA_HOME
Summary:
Executor test that checks on different models that model params are the same
when using a given executor and simple net
Reviewed By: akyrola
Differential Revision: D5908769
fbshipit-source-id: b6f5a2cf89c5c67b68e8b9be3264f38d5740d897
Summary:
Problem:
Without -DBLAS=MKL, conda-build won't include MKL library into Caffe2 build. And the BLAS performance is bad on CPU.
Solution:
Explicitly add the flag. Add mkl and mkl-include as dependencies.
ezyang Yangqing
Closes https://github.com/caffe2/caffe2/pull/1264
Reviewed By: bddppq
Differential Revision: D5919192
Pulled By: houseroad
fbshipit-source-id: bb51e4fc4015212694404180a610e06ec8ddb424
torch.jit now contains two user-facing functions: compile and trace
(corresponding to what was previously trace/traced and record_trace).
The non-curried versions of these functions have been eliminated, so
that there is only one function in the API (we *must* have the
curried versions, since these enable their use as decorators). There is
detailed usage documentation in the docblocks for these methods.
This comes with a complete rewrite of the internals of torch.jit, in the process
fixing a number of bugs. Key points of the new implementation:
- compile and trace both always return a Module representing the underlying
function/module wrapped with compilation/tracing. This makes handling
of the function/module cases more uniform, as we can think of the function
case as creating an on-the-fly module with the parameters explicitly
specified by the user. For technical reasons, we now *require* any parameters
in the function case to be honest-to-goodness Parameters (gory details:
you can't register a Variable as a Parameter to a Module, but you also can't
create a Parameter from a Variable while sharing the same underlying
identity.)
- Flattening and unflattening is done a lot more uniformly. We now have
a _flatten and _unflatten function which are inverses of each other:
_flatten always returns both the flat, tuple of Variables, *as well as*
the "proto" (now referred in the code as the "struct") from which we
can unflatten the variables. Low level functions like 'raw_trace'
always work with the flattened inputs/outputs, which keeps their logic
simple. (A toy sketch of the flatten/unflatten pairing appears after this note.)
- JIT trace keying now also includes the "struct" of the input arguments.
This is a step towards accepting non-Variable arguments in functions,
although flatten/unflatten don't currently support it.
- TraceForKey (previously TraceInfo) has had its API reworked to have
less degrees of freedom when you are interacting with it.
TODO: Verify, timing, and trace dumping have been temporarily excised. I
plan on adding them back.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
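Re the _flatten/_unflatten point above: a simplified toy sketch (hypothetical helpers, not the actual torch.jit internals) of a flatten function returning both the flat leaves and a "struct" that its inverse uses to rebuild the nesting.
```python
def _flatten(obj):
    if isinstance(obj, (list, tuple)):
        leaves, struct = [], []
        for item in obj:
            sub_leaves, sub_struct = _flatten(item)
            leaves.extend(sub_leaves)
            struct.append(sub_struct)
        return tuple(leaves), (type(obj), struct)
    return (obj,), None                      # a leaf (e.g. a Variable)

def _unflatten(leaves, struct):
    def build(struct, it):
        if struct is None:
            return next(it)
        typ, children = struct
        return typ(build(c, it) for c in children)
    return build(struct, iter(leaves))

leaves, struct = _flatten((1, [2, 3], (4,)))
print(leaves)                                # (1, 2, 3, 4)
print(_unflatten(leaves, struct))            # (1, [2, 3], (4,))
```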
Summary:
This is a prototype for a joint intents + slots modeling workflow; it has the following:
1- New data readers and data processors to process joint labels in parallel
2- New JointNN model
3- New Fblearner workflow (jointnn) for joint modeling experimentations
This is still work in progress, sending the diff to start the discussion about the interface and what we need to support in our joint modeling efforts.
P.S. The number of lines in this diff is multiplied by 3 since caffe2 is mirrored in both fbandroid and fbobjc. I will highlight the most important parts so that people are not confused.
Differential Revision: D5725243
fbshipit-source-id: ecc5322f937ad0fddaf200a9e090b3573a69f994
Summary: Fixed Caffe2Enforce in size_to_dim() so that it works even if k is same as the number of dimensions in the tensor.
Reviewed By: salexspb
Differential Revision: D5893264
fbshipit-source-id: 525ea263f5e21e197c7010e1c66501355b8027c8
Summary:
This diff implements deformable convolution operator. The idea behind it is that instead of using a fixed NxM kernel, we associate a set of learnable offsets (dx, dy) with each element of the kernel, and use bilinear interpolation to estimate weights in between the integer indices. For background see paper https://arxiv.org/abs/1703.06211 and mxnet implementation https://github.com/msracver/Deformable-ConvNets/tree/master/rfcn/operator_cxx
To simplify code review of the new files, the feature is stacked into 2 diffs. First diff duplicates the core convolution operator into a separate set of files prefixed with deform_. It also provides documentation on the operator but nothing else. Second diff contains the actual changes that make deformable convolution possible. Therefore, I recommend focusing your code review on changes between diffs 1 and 2.
Current limitations of the operator:
1. Only CUDA is supported. CPU version is not implemented.
2. Only NCHW layout is supported.
3. Only 2d convolution is supported.
CUDA code is ported from mxnet implementation with minimal changes.
See also inline comments in code for tricky parts.
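A tiny NumPy sketch of the bilinear sampling used for fractional offsets, for intuition only; the actual operator is the CUDA port described above.
```python
import numpy as np

def bilinear(img, y, x):
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = y0 + 1, x0 + 1
    wy, wx = y - y0, x - x0
    def at(i, j):
        if 0 <= i < img.shape[0] and 0 <= j < img.shape[1]:
            return img[i, j]
        return 0.0                         # out-of-bounds samples contribute zero
    return ((1 - wy) * (1 - wx) * at(y0, x0) + (1 - wy) * wx * at(y0, x1)
            + wy * (1 - wx) * at(y1, x0) + wy * wx * at(y1, x1))

img = np.arange(9, dtype=float).reshape(3, 3)
print(bilinear(img, 1.25, 0.5))            # samples between integer positions -> 4.25
```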
Reviewed By: akyrola
Differential Revision: D5702983
fbshipit-source-id: 4d1bf2c6c73135e6a70dbe87037b38915f4453f9
Summary:
D5772847 is breaking real time style transfer on android and conv unit tests on iPhone 7 upgraded to iOS 11.
The temporary fix in D5908415 only fixes android. iPhone 7 is still crashing.
I think these two diffs should be backed out before D5772847 is fully debugged
Reviewed By: fricc33
Differential Revision: D5913834
fbshipit-source-id: b8072c59c83adfed8a0b0ab0f42c39bc4398c7a0
Summary: Implementation of ReduceFront/Back/Max/Gradient for CPU and CUDA.
Reviewed By: asaadaldien
Differential Revision: D5905402
fbshipit-source-id: 6967ce41aa95ee5ea7a90065430892e81a6da477
Summary: Implemented logit gradient with eps as arg. Add the unit test for it and explored the optimal parameter to run the test.
Reviewed By: asaadaldien
Differential Revision: D5910655
fbshipit-source-id: 44898b784a57c7ad45519b202b1eaf95c1c4d460
This adds some generated autograd functions implemented in C++, which
are generated from derivatives.yaml. It also generates Python bindings
for the Variable methods. The generated files are:
Functions.cpp/h: subclasses of torch::autograd::Function
VariableType.cpp/h: The at::Type for autograd Variables
python_variable_methods.cpp: Python bindings to torch::autograd::Variable
python_variable_methods_dispatch.h: wrapper which releases GIL and sets the
CUDA device
python_functions.cpp/h: exposes generated autograd functions as Python
objects
The generated functions are mostly shadowed by the definitions in
variable.py. We'll remove the Python implementations in favor of the
generated C++ implementations in a subsequent commit.
Summary: Implemented a version of SparseAdagrad that only keeps track of an average sum of squared gradients term for each row of the parameter tensor, rather than a sum of squared gradients term for each individual parameter.
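A toy NumPy sketch of the row-wise idea (one accumulator value per embedding row, the mean of squared gradients across the row); illustrative only, not the operator implementation.
```python
import numpy as np

def rowwise_sparse_adagrad(param, h_row, indices, grads, lr=0.1, eps=1e-8):
    # h_row holds a single scalar per parameter row instead of one value per element.
    for idx, g in zip(indices, grads):
        h_row[idx] += np.mean(g * g)
        param[idx] -= lr * g / (np.sqrt(h_row[idx]) + eps)
    return param, h_row

param = np.zeros((4, 3))
h_row = np.zeros(4)
param, h_row = rowwise_sparse_adagrad(param, h_row, indices=[1, 3], grads=np.ones((2, 3)))
print(param)
print(h_row)
```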
Differential Revision: D5881918
fbshipit-source-id: bd96ccf25554b457baaaca9309fc8048adbb37f7
Summary: Equivalent to numpy.sign for CPU and CUDA.
Reviewed By: dzhulgakov
Differential Revision: D5906446
fbshipit-source-id: 389f994bccbb87a62df2c4aaacc327f9a6223cbd
Summary:
This brings proper versioning in Caffe2: instead of manual version macros, this puts the version information in CMake (replacing the TODO bwasti line) and uses macros.h.in to then generate the version in the C++ header.
A few misc updates:
- Removed the mac os rpath, verified on local macbook that it is no longer needed.
- Misc updates for caffe2 ready:
- Mapped cmake/Cuda.cmake with gloo's setting.
- upstreamed third_party/nccl so it builds with cuda 9.
- Separated the Caffe2 cpu dependencies and cuda dependencies
- now libCaffe2_CPU.so do not depend on any cuda libs.
- caffe2 python extensions now depend on cpu and gpu separately too.
- Reduced the number of unused functions in Utils.cmake
Closes https://github.com/caffe2/caffe2/pull/1256
Reviewed By: dzhulgakov
Differential Revision: D5899210
Pulled By: Yangqing
fbshipit-source-id: 36366e47366c3258374d646cf410b5f49f95767b
Summary:
The problem:
Building caffe2 fails because the installed directory contains "anaconda".
The cause:
Compiling Gloo will generate a new config.h file in the binary folder.
If we put the original config.h in front, the compiler will complain "Expected GLOO_USE_CUDA to be defined".
~~~Switching the positions of the include folders can solve the problem.~~~
Function caffe2_include_directories in cmake/Utils.cmake is a little bit hacky. If the directory contains "anaconda", it will append the new include directory after the existing include path. Otherwise it will insert the directory before the path. So in the first case, the directories are inserted in order, and in the latter one, they are inserted in reverse.
The solution:
See the commit.
pietern #1121
Closes https://github.com/caffe2/caffe2/pull/1258
Reviewed By: Yangqing
Differential Revision: D5907167
Pulled By: houseroad
fbshipit-source-id: 2cb3916e7e0313ebc3be3d1666bfa14bbf479607
Summary:
This operator allows the use of Torch's underlying TH libraries (TH, THC, THNN, and THCUNN)
through the ATen tensor library. Use of the operator is described in the README.
The operator itself is generated from ATen's Declarations.yaml file which describes its public API.
Closes https://github.com/caffe2/caffe2/pull/1235
Reviewed By: dzhulgakov
Differential Revision: D5876944
Pulled By: zdevito
fbshipit-source-id: b558e8563a5e82a0e6278705a4a359bd7df4e70a
Summary: Can be used to gather outputs of a sharded "Gather", or for the SparseLengthsSumGradient when we need the gradient on values.
Reviewed By: akyrola
Differential Revision: D5800901
fbshipit-source-id: 90835755d6d15be13fb0f538cfade980cf4a1cd2
Summary: If a blob is copy from device A to device B in the init_net, and then is used as an external_input in the train_net, we want the train_net to correctly use the blob already on device B instead of copying it over and over again.
Reviewed By: akyrola
Differential Revision: D5800870
fbshipit-source-id: d93f44bba80e4ed70eb03183d552496b54a966b5
Summary:
Exposed by UBSAN:
```lang=bash
caffe2/caffe2/core/qtensor.h:61:40: runtime error: load of value 190, which is not a valid value for type 'bool'
#0 0x7fb4fc09c289 in caffe2::QTensor<caffe2::CPUContext>::Resize(std::vector<int, std::allocator<int> >) caffe2/caffe2/core/qtensor.h:61
#1 0x7fb4fc090403 in caffe2::QuantizedFullyConnectedOp<float, caffe2::CPUContext, caffe2::DefaultEngine>::RunOnDevice() caffe2/caffe2/fb/operators/quantized_fully_connected_op.h:93
#2 0x7fb4fc08d5ee in caffe2::Operator<caffe2::CPUContext>::Run(int) caffe2/caffe2/core/operator.h:306
#3 0x426d8a in caffe2::QFCTest(float, float, float, int, int, int, int) caffe2/caffe2/fb/operators/quantized_fully_connected_op_test.cc:78
#4 0x4295f6 in caffe2::QuantizedFullyConnectedTest_Test_Test::TestBody() caffe2/caffe2/fb/operators/quantized_fully_connected_op_test.cc:110
#5 0x7fb4eee3b6a1 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:2458
#6 0x7fb4eee2cbe1 in testing::Test::Run() /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:2475
#7 0x7fb4eee2cd27 in testing::TestInfo::Run() /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:2656
#8 0x7fb4eee2ce34 in testing::TestCase::Run() /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:2774
#9 0x7fb4eee2eb8b in testing::internal::UnitTestImpl::RunAllTests() /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:4649
#10 0x7fb4eee2ef3c in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:2458
#11 0x7fb4eee2ef3c in testing::UnitTest::Run() /home/engshare/third-party2/googletest/master/src/googletest/googletest/src/gtest.cc:4257
#12 0x7fb4fbee2ed0 in RUN_ALL_TESTS() third-party-buck/gcc-5-glibc-2.23/build/googletest/include/gtest/gtest.h:2233
#13 0x7fb4fbee2d60 in main common/gtest/LightMain.cpp:12
#14 0x7fb4e0ef7857 in __libc_start_main /home/engshare/third-party2/glibc/2.23/src/glibc-2.23/csu/../csu/libc-start.c:289
#15 0x424e08 in _start /home/engshare/third-party2/glibc/2.23/src/glibc-2.23/csu/../sysdeps/x86_64/start.S:118
UndefinedBehaviorSanitizer: invalid-bool-load caffe2/caffe2/core/qtensor.h:61:40
```
Reviewed By: yfeldblum
Differential Revision: D5898877
fbshipit-source-id: e32b1732a1946fdafaec67b3fbc072dc93bcd917
Summary:
T22119644 showed that there is a potential illegal memory access in beam search with attention. Upon further inspection, we can see that there are multiple ops that write to the same old shape blob:
{"output0": "model0/attention_decoder/attention_weighted_encoder_context_reshaped", "output1": "state_old_shape_before_choosing_per_hypo", "input0": "model0/attention_decoder/attention_weighted_encoder_context" }},
{"output0": "model0/attention_decoder/hidden_t_external_reshaped", "output1": "state_old_shape_before_choosing_per_hypo", "input0": "model0/attention_decoder/hidden_t_external" }},
{"output0": "model0/decoder/layer0/cell_t_reshaped", "output1": "state_old_shape_before_choosing_per_hypo", "input0": "model0/decoder/layer0/cell_t" }},
This diff de-dupes these outputs
Reviewed By: akyrola
Differential Revision: D5899103
fbshipit-source-id: 8b6f3f113e764dfeb9262f6c442e1124559cd2d8
Summary:
Gloo was incorrectly updated in #1188 to the non-master version, so this brings back gloo to master.
Closes https://github.com/caffe2/caffe2/pull/1253
Differential Revision: D5899017
Pulled By: Yangqing
fbshipit-source-id: bdf6dbbc4402814e5bcf346cb8a610a448c53cef
Summary: We were keeping the offset in an int :(
Reviewed By: kennyhorror
Differential Revision: D5811955
fbshipit-source-id: 7d00833fa0d5847beed44b73ea74fcb5a8e24090
Summary: Previously, the RecurrentNetwork op used for our beam search did not have any of the input blobs listed as data dependencies. This was fine when we were using SimpleNet, since the ops were run in the order in which we added them to the graph, and thus the RecurrentNetwork op was run after all the other ops. However, when switching to DAG, the ops that produce input data for the beam search were being run in parallel with the RecurrentNetwork beam search op, which caused non-deterministic failures based on thread scheduling. This fixes that
Reviewed By: jmp84, jhcross
Differential Revision: D5879622
fbshipit-source-id: b622de1f6a24b2636b191096db92990e0535890c
Summary:
When using reshape, the speed_benchmark always reports an error.
When using resize, the speed_benchmark can run without any issue.
Reviewed By: salexspb
Differential Revision: D5847999
fbshipit-source-id: 1b9899534d514c779d1710008e239124fe3d2377
Summary: Make LastNWindowCollector optionally thread-safe. The main benefit is that the mutex can then be used to lock the buffer later, avoiding the need to copy the data.
Reviewed By: chocjy
Differential Revision: D5858335
fbshipit-source-id: 209b4374544661936af597f741726510355f7d8e
Summary: CheckpointManager already accepts a path_prefix override for init() and load(), but it assumes the same db_type passed in __init__(). This change adds an optional path_type for each call.
Reviewed By: boryiingsu
Differential Revision: D5888152
fbshipit-source-id: 21cd31a62a0188fe0e0b19b43c3b232c2342d0a8
Instead of initializing CUDA immediately and executing these calls,
we wait until CUDA is actually initialized before executing them.
To keep things debuggable, we also keep track of the original
backtrace when these functions are called, so we can inform
users where they actually called the seeding/state functions
(as opposed to the first time they actually initialized the
RNG).
Fixes #2517
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
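A minimal sketch of the "queue the call and remember where it came from" idea (hypothetical helpers, not the actual torch.cuda implementation).
```python
import traceback

_initialized = False
_queued_calls = []                       # (callable, captured backtrace) pairs

def lazy_call(fn):
    # Run fn now if "CUDA" is initialized, otherwise queue it with its call-site trace.
    if _initialized:
        fn()
    else:
        _queued_calls.append((fn, traceback.format_stack()))

def lazy_init():
    global _initialized
    _initialized = True                  # stand-in for the real CUDA initialization
    for fn, stack in _queued_calls:
        try:
            fn()
        except Exception:
            print("queued call failed; it was originally queued at:\n" + "".join(stack))
            raise

lazy_call(lambda: print("seeding RNG"))  # queued, nothing happens yet
lazy_init()                              # "CUDA" initializes; the queued seeding runs now
```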
Summary:
After this, windows should be all green.
Closes https://github.com/caffe2/caffe2/pull/1228
Reviewed By: bwasti
Differential Revision: D5888328
Pulled By: Yangqing
fbshipit-source-id: 98fd39a4424237f2910df69c8609455d7af3ca34
Summary: When num_elements is less than num_samples, a workflow should fail during net construction time. Currently, it fails at run time.
Reviewed By: kittipatv
Differential Revision: D5858085
fbshipit-source-id: e2ab3e59848bca58806eff00adefe7c30e9ad891
Summary:
Basically:
- more generator vs list changes.
- difference in the return type of bellman_ford(), see _get_path. 2.x returns list.
- nx 2 removed nbunch in topological_order, so we will need to manually use lexicographical_topological_sort with an explicit key derived from the source node order.
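To illustrate the last bullet, a small networkx 2.x example; the explicit source-node ordering used as the key here is an assumed stand-in, not the actual key used in the PR.
```python
import networkx as nx

g = nx.DiGraph()
g.add_edges_from([("a", "c"), ("b", "c"), ("c", "d")])
order = {n: i for i, n in enumerate(["b", "a", "c", "d"])}   # desired tie-break order
print(list(nx.lexicographical_topological_sort(g, key=lambda n: order[n])))
# ['b', 'a', 'c', 'd']
```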
Closes https://github.com/caffe2/caffe2/pull/1243
Reviewed By: ajtulloch
Differential Revision: D5883195
Pulled By: Yangqing
fbshipit-source-id: 215d01fdd026d3af1a11ff866bf835e104370e4c
Summary: This is a quick fix for image input op
Reviewed By: bddppq
Differential Revision: D5857147
fbshipit-source-id: 4b5102616fe295c7c21d394391af8030b79de992
Summary:
Here's what's happening:
C++ only guarantees that static initialization is thread safe there: https://fburl.com/40wdmf1q
So TypeNameRegisterer<bool> cannot be called concurrently with TypeNameRegisterer<bool> from another invocation.
But there are no guarantees about different template specializations, as
they declare separate variables. Thus TypeNameRegisterer<int> might
race with TypeNameRegisterer<bool>. And TypeNameRegisterer accesses
the global variable here: https://fburl.com/gv2mhi08
Thanks dzhulgakov for the investigation!
Reviewed By: Yangqing
Differential Revision: D5882913
fbshipit-source-id: 4db1080b11e6351ce8136373e2dfc52980642fbb
Summary:
If kernel sizes were specified via "kernel_w" and "kernel_h", tensor size
inference was incorrect in InferShapesAndTypes(): it was checking for
"helper_w" instead of "kernel_w".
Reviewed By: akyrola
Differential Revision: D5884280
fbshipit-source-id: 430cbedcedadbe3570384e706198a4ddc499504e
Summary:
Adding uint8 support to the code generator for high-performance embedding look-up kernels, supporting
Sum, WeightedSum, and Mean reducers. Added a number of unit tests for these operators.
Performance Results
===================
Performance results are below for old code, sparse_lengths_sum_benchmark.old.par, that uses
code in lengths_reducer_rowwise_8bit_ops.h, and our new code, optimized via code generator,
sparse_lengths_sum_benchmark.new.par. Block size was 128 in all cases.
[root@fblearner001.01.ftw1 /home/msmelyan]# ./sparse_lengths_sum_benchmark.old.par --iteration 10000 --dtype uint8
I0912 02:49:58.773259 2640913 net_simple.cc:162] Time per operator type:
I0912 02:49:58.773264 2640913 net_simple.cc:171] 0.75769 SparseLengthsSum8BitsRowwise
[root@fblearner001.01.ftw1 /home/msmelyan]# ./sparse_lengths_sum_benchmark.new.par --iteration 10000 --dtype uint8
I0912 02:50:33.981832 2642102 net_simple.cc:162] Time per operator type:
I0912 02:50:33.981837 2642102 net_simple.cc:171] 0.233322 SparseLengthsSum8BitsRowwise
[root@fblearner001.01.ftw1 /home/msmelyan]# ./sparse_lengths_sum_benchmark.new.par --iteration 10000 --dtype float16
I0912 02:51:26.748972 2643925 net_simple.cc:162] Time per operator type:
I0912 02:51:26.748977 2643925 net_simple.cc:171] 0.106591 SparseLengthsSum
[root@fblearner001.01.ftw1 /home/msmelyan]# ./sparse_lengths_sum_benchmark.new.par --iteration 10000 --dtype float
I0913 01:39:22.372238 1076874 net_simple.cc:162] Time per operator type:
I0913 01:39:22.372244 1076874 net_simple.cc:171] 0.211041 SparseLengthsSum
Analysis
========
Our optimized generated code is ~3.5x faster than original code in lengths_reducer_rowwise_8bit_ops.h
as shown below.
However, our uint8 is about 2x slower than float16 and is on par with float32. There are several reasons for that:
1. uint8 introduces extra instructions to apply the scale and bias factors
2. In addition to embedding blocks, we are now also reading scale_bias.
For every pair of scale and bias, we bring in an entire cache line of
64 bytes, while only using 8 bytes. A 128-wide uint8 input block only occupies 2 cache lines, and hence
reading a nearly entire extra cache line of useless data adds to bandwidth wastage.
3. In addition, the hardware prefetcher runs past the end of the input block and the scale_bias
cache line, trying to prefetch more useless lines. This effect was characterised in the Appendix section of
https://fb.facebook.com/notes/jason-lu/sparse-adagrad-performance-optimization-in-model-training/10214810437360961/
To get deeper insights into what is going on,
we isolated the SparseLengthsSum and SparseLengthsSum8BitsRowwise code, for float32, float16 and uint8,
into a microbenchmark, where we varied block size while keeping table size constant (256MB):
block_size time(uint8) time(float16) time(float32)
64 0.19 0.09 0.17
128 0.12 0.09 0.17
256 0.70 0.09 0.14
1024 0.50 0.06 0.10
The pattern for block size of 64 and 128 is similar to what we observed in sparse_lengths_sum_benchmark.
However, we see that as block_size increases (for a fixed table size),
time to perform embeddings decreases quite drastically. For block_size of 256 and beyond, uint8 starts achieving
speedup over float16. A longer block better amortizes the bandwidth wastage due to scale_bias and the hardware prefetcher
running past the end of the block.
Reviewed By: kennyhorror
Differential Revision: D5870907
fbshipit-source-id: 445321b96f1b5801ef91f296f6063c35673ee11b
Plus a test for Eval nodes in the IR, since we hadn't actually
covered this case now that some nodes are transparently traceable.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary:
This fixes the apparent discrepancy (list vs iterator). After this, there are still 3 failures regarding topological sort but that seems a bit involved. Someone shall look deeper.
Closes https://github.com/caffe2/caffe2/pull/1242
Reviewed By: akyrola
Differential Revision: D5881806
Pulled By: Yangqing
fbshipit-source-id: 5a200010724befde2fa8ce1b61a9c1ba42cad46a
- If you operate with TracingState, you MUST check if it is live.
Otherwise you will segfault if it is expired; it is VALID for
tracing states to become expired.
- Tracing states can expire if they request backward tracing
(which the tracer does by default). We don't want this to
happen for exports, which only look at forwards. So make
sure we set the correct num_derivatives.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
- Print some diagnostic information when accepting new test output.
- If it's the first time you ran an expect test, print out
the output you got so it's easier to decide if you want
to accept it.
- Add infrastructure for expect-testing against exceptions
(I'm going to use this in a later patch).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
- If a user accidentally attempts to export a model that is in training mode, the
tracer may perturb the parameters (since modules like batchnorm will update
their parameters.) To prevent this from happening, we temporarily turn
off training mode during the export. Temporariness is
important, since model export should not actually affect the model. (See the sketch at the end of this note.)
- If you have a buggy model which is changing the parameters,
it is much better for us to export the state_dict() *prior*
to executing the model, because that is what we actually
used as the inputs to the trace. The state_dict() afterwards
could be anything.
- kwargs support never worked, so it's been excised.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
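A sketch of the "temporarily leave training mode" idea from the first bullet, written as a generic context manager with a stand-in model class; not the actual exporter code.
```python
import contextlib

class TinyModel:
    def __init__(self):
        self.training = True
    def train(self, mode=True):
        self.training = mode

@contextlib.contextmanager
def training_disabled(model):
    was_training = model.training
    model.train(False)                 # modules like batchnorm stop updating stats
    try:
        yield model
    finally:
        model.train(was_training)      # restore: export must not change the model

m = TinyModel()
with training_disabled(m):
    print(m.training)                  # False while tracing/exporting
print(m.training)                      # True again afterwards
```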
Summary:
Implementation of MPSCNNMul that only supports multiplying a tensor with a scalar value for now.
Benchmark runtime for CPU, OpenGL and MPSCNN:
```
I0919 21:15:17.942468 3068398464 net_simple.cc:103] Main run finished. Milliseconds per iter: 527.795. Iters per second: 1.89467
I0919 21:15:21.043023 3068398464 opengl_test.cc:2293] Main run finished. Milliseconds per iter: 249.766. Iters per second: 4.00374
I0919 21:15:23.182369 3068398464 net_simple.cc:103] Main run finished. Milliseconds per iter: 175.548. Iters per second: 5.69644
```
Reviewed By: hlu1
Differential Revision: D5870100
fbshipit-source-id: 2aadd5d134f3b8b40a41f638040cbef35a0086df
Summary: When parameter sharing is used, the model may not own the parameters. Emptying out initializer ensures that the shared model doesn't overwrite initialization.
Reviewed By: chocjy
Differential Revision: D5870362
fbshipit-source-id: f8587b84c3a13f331a3251973e8206563939606a
Summary: This is not a very generic constant
Reviewed By: volkhin
Differential Revision: D5870378
fbshipit-source-id: 59509bb48cecb52ba4a3f26b290855374547fe7e
Summary:
Two implementations of max pool reducers had different semantics in case of equal indices. It matters less in real cases, but breaks tests. Choosing the behavior of LengthMax over SortedSegmentRangeMax as the former is more widely used.
Also some minor tweaks for the test code.
Reviewed By: Yangqing
Differential Revision: D5870386
fbshipit-source-id: 6488cbd5cacaf595ffc07c44084730dd44b3f9dd
To be honest, this was the whole point of this refactor set.
I noticed that in a lot of code, we were repeatedly copying lots of metadata
from old nodes to new nodes. This was quite concerning because I wanted to
add some more metadata (alias information) and I didn't want to have to
get it right in all cases. Plus, in a lot of cases we were forgetting
to set more optional properties like debug names when we "copied".
To solve this, I first made cloneFrom() copy all of this metadata. Then,
I searched for all occurrences of setType() (a proxy for "I'm cloning this
node), looked for cases where we really were morally doing a copy, and rewrote
the code to use cloneFrom() instead, allowing us to drop explicit setType()
(and getting more metadata preservation in the process.)
Finally, I refactored tryToMoveChunk. The code is modestly longer,
but the new version has the nice property that the initialization of
selects for input_chunk are next to the creation of the node (as opposed
to delayed for later.) I also added a lot more comments for invariants
I noticed when I was working on the code.
One minor extra change: TensorType grew a new constructor and a withSizesStride
"immutable setter" which returns a new copy of TensorType with different info.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Previously, there was a hidden, unchecked invariant that you were not allowed to
call create(kParam) or create(kReturn). Now that the logic for them is embedded
in create(), the create(kParam) case is valid, and the create(kReturn) case
will raise dynamically if you try it.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Since this code has been stable for a while, I think it's
a good opportunity to make it const correct. There is only
a slight increase in code size, which I hope will appease @zdevito.
- consts were added to all methods which are logically const. Most notably,
lint() is now declared const.
- I made extra const versions of Node::iterator(), Node::reverseIterator(),
Graph::nodes(), Attribute::find(), linked_list::begin(), linked_list::end(),
linked_list::rbegin(), linked_list::rend(); in all cases these were one-liners
except for find() (I spent a little time trying to make find() a one-liner
but didn't think of a way to do it.).
- graph_node_list got factored out into a new, templated type linked_list<T>
(perhaps we should call it intrusive_list<T>). I had to template the iterator
to define constant and non-constant iterators without duplicating code,
and once I was there, I decided to templatize everything else. The code
nicely factors out, although I wouldn't recommend using it for anything
else without more refactoring.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
These functions accept a scaling parameter like THTensor_(cadd)/(csub),
which will make it easier to have the same signature for tensor and
scalar addition in PyTorch and ATen. For example:
tensor.add(other, alpha=2)
Will work if other is a scalar or a tensor value.
See #2739
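For illustration, the unified signature in action; the Tensor/Variable distinctions of the time are glossed over here, and the values are arbitrary.
```python
import torch

t = torch.ones(3)
print(t.add(torch.ones(3), alpha=2))   # t + 2 * other (tensor)
print(t.add(5, alpha=2))               # t + 2 * 5     (scalar)
```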
This adds a concatenated Declarations.cwrap which is the result of
running ATen/extract_cwrap.py on TensorMethods.cwrap. This will let ATen
and the Variable bindings temporarily diverge from Tensor before the new
Variable class subsumes Tensor.
See #2739 and #2633
Summary:
Adding uint8 support to the code generator for high-performance embedding look-up kernels, supporting
Sum, WeightedSum, and Mean reducers. Added a number of unit tests for these operators.
Performance Results
===================
Performance results are below for old code, sparse_lengths_sum_benchmark.old.par, that uses
code in lengths_reducer_rowwise_8bit_ops.h, and our new code, optimized via code generator,
sparse_lengths_sum_benchmark.new.par. Block size was 128 in all cases.
[root@fblearner001.01.ftw1 /home/msmelyan]# ./sparse_lengths_sum_benchmark.old.par --iteration 10000 --dtype uint8
I0912 02:49:58.773259 2640913 net_simple.cc:162] Time per operator type:
I0912 02:49:58.773264 2640913 net_simple.cc:171] 0.75769 SparseLengthsSum8BitsRowwise
[root@fblearner001.01.ftw1 /home/msmelyan]# ./sparse_lengths_sum_benchmark.new.par --iteration 10000 --dtype uint8
I0912 02:50:33.981832 2642102 net_simple.cc:162] Time per operator type:
I0912 02:50:33.981837 2642102 net_simple.cc:171] 0.233322 SparseLengthsSum8BitsRowwise
[root@fblearner001.01.ftw1 /home/msmelyan]# ./sparse_lengths_sum_benchmark.new.par --iteration 10000 --dtype float16
I0912 02:51:26.748972 2643925 net_simple.cc:162] Time per operator type:
I0912 02:51:26.748977 2643925 net_simple.cc:171] 0.106591 SparseLengthsSum
[root@fblearner001.01.ftw1 /home/msmelyan]# ./sparse_lengths_sum_benchmark.new.par --iteration 10000 --dtype float
I0913 01:39:22.372238 1076874 net_simple.cc:162] Time per operator type:
I0913 01:39:22.372244 1076874 net_simple.cc:171] 0.211041 SparseLengthsSum
Analysis
========
Our optimized generated code is ~3.5x faster than original code in lengths_reducer_rowwise_8bit_ops.h
as shown below.
However, our uint8 is about 2x slower than float16 and is on par with float32. There are several reasons for that:
1. uint8 introduces extra instructions to apply the scale and bias factors
2. In addition to embedding blocks, we are now also reading scale_bias.
For every pair of scale and bias, we bring in an entire cache line of
64 bytes, while only using 8 bytes. A 128-wide uint8 input block only occupies 2 cache lines, and hence
reading a nearly entire extra cache line of useless data adds to bandwidth wastage.
3. In addition, the hardware prefetcher runs past the end of the input block and the scale_bias
cache line, trying to prefetch more useless lines. This effect was characterised in the Appendix section of
https://fb.facebook.com/notes/jason-lu/sparse-adagrad-performance-optimization-in-model-training/10214810437360961/
To get deeper insights into what is going on,
we isolated the SparseLengthsSum and SparseLengthsSum8BitsRowwise code, for float32, float16 and uint8,
into a microbenchmark, where we varied block size while keeping table size constant (256MB):
block_size time(uint8) time(float16) time(float32)
64 0.19 0.09 0.17
128 0.12 0.09 0.17
256 0.70 0.09 0.14
1024 0.50 0.06 0.10
The pattern for block size of 64 and 128 is similar to what we observed in sparse_lengths_sum_benchmark.
However, we see that as block_size increases (for a fixed table size),
time to perform embeddings decreases quite drastically. For block_size of 256 and beyond, uint8 starts achieving
speedup over float16. A longer block better amortizes the bandwidth wastage due to scale_bias and the hardware prefetcher
running past the end of the block.
Reviewed By: dzhulgakov
Differential Revision: D5824641
fbshipit-source-id: 3a5c020294d84874da78c6943e596423393473d6
Summary:
All other NCCL ops expect paired src, dst pointers for each
GPU. Reduce doesn't, and the old logic would always set dst for
rank = 0 regardless of whether that was the root or not.
This change takes into account that Reduce only has one output, and it
should assign dst only for the root rank. Also changes the schema to
allow inplace for any input and Output(0).
Closes https://github.com/caffe2/caffe2/pull/1214
Differential Revision: D5843177
Pulled By: pietern
fbshipit-source-id: 1e775e6a1ca052e29691b89c1429db03a0e6378b
Summary:
I hit a strange bug and found that the reason is that the macro uses a
temp variable named 'r'. This will cause a conflict when the macro's own
argument also expands to 'r' or something related (in my case, it expands to
'r.size()', where r is a tensor).
Reviewed By: pietern
Differential Revision: D5822833
fbshipit-source-id: 64a6c6b0fc5a1f8359d459d70644bb232ef40606
Summary:
Comments say experimental: don't use it. But these functions are used in the critical path from pipeline.py, so better to remove the comment?
Also changed if-else to first check for None. Although python does not crash with getattr(None, "x"), it is confusing.
Some lint issues.
Reviewed By: azzolini
Differential Revision: D5853639
fbshipit-source-id: 977de5ba0ea3ae26343ae5fcacac883faf892b0e
Summary:
Adding backward pass support for If operator:
- Implemented necessary changes to Do operator and generation of gradient Do operator to properly forward gradient blobs in and out of subnet
- Using WorkspaceManager to keep track of workspaces used by Do, in case we need to have access to local blobs to compute gradients (also important for loop's backprop)
- Update to Workspace to handle blob binding from multiple parent workspaces
- Implemented generation of gradient If operator
- Unit test to build and train a net with If control op
Reviewed By: azzolini
Differential Revision: D5745096
fbshipit-source-id: 1023c90a2113716254424d1e50b9e560fe9083e5
Summary:
For future reference - seems that at some point cub had a force push. If any already checked out branch has issues, try deleting the cub submodule and redo git submodule update --init.
Closes https://github.com/caffe2/caffe2/pull/1227
Differential Revision: D5856030
Pulled By: Yangqing
fbshipit-source-id: c192974246c27ce6bd739295c31c25fd75766a35
* Specifying the value used for padding
The "pad_packed_sequence" function fills padded elements with zeros, but sometimes it is not useful. For example, some previous papers on NLP, including my recent paper [1], use a max-pooling technique for RNN-based sentence representations. More specifically, the max-pooling technique selects the maximum value from all time steps (i.e., hidden states) for each dimension. In such a case, we do not want the padded zeros to be selected. To overcome this situation, we can simply use a very small value instead of zero.
An LSTM example is shown below:
input = embedding(Variable(batchInput))
packedInput = nn.utils.rnn.pack_padded_sequence(input, lengths, batch_first = True)
h, (hn, cn) = self.encoder(packedInput, (h0, c0))
h, _ = nn.utils.rnn.pad_packed_sequence(h, -1024.0, batch_first = True)
sentenceRep, _ = torch.max(h, 1, keepdim = True)
[1] A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks. Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, and Richard Socher. The 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017).
https://arxiv.org/abs/1611.01587 (Equation (4))
* Modified the order of the arguments
Following the suggestion, I modified the order of the arguments.
Summary: In some cases (e.g. CI), showing a progress bar will mess up the log.
Reviewed By: jerryzh168
Differential Revision: D5850918
fbshipit-source-id: 2da9d020832264cef977391dc2fd8d1e2677d159
Summary:
It is interesting that under Facebook fbcode this was not an issue -
but it definitely causes issues on OSS.
Closes https://github.com/caffe2/caffe2/pull/1225
Reviewed By: dzhulgakov
Differential Revision: D5851360
Pulled By: Yangqing
fbshipit-source-id: f8a8f15184092a888bdc909ba2323229d4485902
Summary:
This fixed a minor bug in D5690181.
Failing test observed in https://travis-ci.org/caffe2/caffe2/jobs/275603846
Reviewed By: jerryzh168
Differential Revision: D5850985
fbshipit-source-id: 02aefb8902878d6adf7686a94153823b92c0e7b7
* Win64 support for lib/THS
* Fix VS warnings(for lib/THS)
* Revert changes that prevent successful build
* use the type descriptors for int64_t
* Fix warnings in THS for MSVC
Summary: Introduced weights for labels in the multi-label setting. An extra weight blob is introduced and read in the operator in case the label setting is weighted sparse.
Reviewed By: kevinwilfong
Differential Revision: D5812467
fbshipit-source-id: efb209092e1e9effc915b0a753fa0c67b47a4fb6
Summary:
Now that Buck supports a way to opt-out external C/C++ libs from omnibus linking,
this diff removes the hack we previously relied on (and which got copy-pasta-d everywhere).
Reviewed By: pixelb
Differential Revision: D5832450
fbshipit-source-id: cc3d12488f8498be6fb12bce1fedb3ad1accb518
Summary: On CPU, no need to replicate parameters. So try using only one copy (cpu_0) for parameters. Made resnet50_trainer use shared model in cpu mode.
Reviewed By: wesolwsk
Differential Revision: D5812181
fbshipit-source-id: 93254733edbc4a62bd74a629a68f5fa23f7e96ea
Summary: following optimization in sparse lengths sum, translate it into weightedsum
Reviewed By: azzolini
Differential Revision: D5732859
fbshipit-source-id: 430ee077a1063f3c55806f6dbb5ea46f0fd5c486
Summary:
following wickedfoo's previous diff, I made SparseLengthsSum kernel a little
faster. I did:
- `__restrict__` note for ptrs
- `ExactBlock` optimization for kernels where post < Maxthreads. This is a general case
===Check Test Area Please, Are we looking at another 57% speed up here???===
Reviewed By: azzolini
Differential Revision: D5676351
fbshipit-source-id: 963f4712106b324fda488ec5c63b7e010b915814
Summary: This caused gradient generation problems. Output was made in-place in PR-1185, by mistake, I believe.
Differential Revision: D5844825
fbshipit-source-id: 4ad84d0fb468aafde9f78463b9acf89316e633ca
Summary: Ported existing adhoc test code to use python unittests. Small tweak to caffe2.python.hypothesis_test_util
Reviewed By: kmatzen
Differential Revision: D5837295
fbshipit-source-id: daa2360db3c18c7d4bda7785e7a0b9175f5858af
Summary:
This is useful for pure throughput tests where
we don't care about training a real model.
Reviewed By: akyrola
Differential Revision: D5834293
fbshipit-source-id: dab528c9269fb713e6f6b42457966219c06e0a35
Summary: When trained on billions of examples, the adagrad gradient square sum can become very big and create an issue of adding small numbers to big numbers. This diff allows decaying the adagrad gradient square sum.
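A toy sketch of the decayed accumulator; the decay value below is illustrative, not the default added by this diff.
```python
import numpy as np

def decayed_adagrad_step(w, g, h, lr=0.01, eps=1e-8, decay=0.999):
    h = decay * h + g * g          # decayed sum of squared gradients
    w = w - lr * g / (np.sqrt(h) + eps)
    return w, h

w, h = np.zeros(3), np.zeros(3)
for _ in range(5):
    w, h = decayed_adagrad_step(w, np.ones(3), h)
print(w, h)
```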
Reviewed By: queqichao
Differential Revision: D5825932
fbshipit-source-id: 570224483b77d42ae53410fa2f767af86de167eb
Summary: Added new counter to prof_dag which counts the number of times a particular op_type executed during an iteration, and prints the count per iter in the output.
Reviewed By: akyrola
Differential Revision: D5837444
fbshipit-source-id: 0f2571c6f85410dac21d4b627fe455ef7c1ab908
Summary: PR 1175 caused a build error because gemmBatched was only under a specific #ifdef. Now put it outside the #ifdef, and things work.
Reviewed By: asaadaldien
Differential Revision: D5834868
fbshipit-source-id: 072a64c8f4b259ff7504104121766115b46b8aa0
Summary: Remove the caffe2 namespace {} because all the code inside opengl_test.cc is wrapped inside the caffe2 namespace
Reviewed By: Maratyszcza
Differential Revision: D5829458
fbshipit-source-id: e68dde08a1c3dc4c41260f5f028ca7efe8d34fbd
Summary:
- All NCCL ops that were triggering a reallocation were deadlocking because I think cudaMalloc or something wants the lock that is being held by ncclRun, so I split the parts where potential allocation happens to a separate lambda. Thanks a lot akyrola and asaadaldien for the after-hours help on debugging this.
- Added support for NCCLReduceScatter.
- NCCLReduce is still deadlocking, but it happens somewhere else. We can debug it separately.
Reviewed By: akyrola
Differential Revision: D5800861
fbshipit-source-id: c963f93942a3ee3bb706fac52047b18c3f37831a
Summary: Otherwise weights, biases are not created and test creation fails
Reviewed By: gsethi523
Differential Revision: D5836438
fbshipit-source-id: 32a75313b6b9ebecbfaa43ebd39f19c8eaba8cd1
Summary: get and getheader are the same in Python 2
Reviewed By: akyrola
Differential Revision: D5836486
fbshipit-source-id: 3bacfccc872c44741d7f26c68ba967093fce45c2
Summary: RunAsync() called DagNetBase::Run(), which called ProfDag::RunAsync().
Reviewed By: Yangqing
Differential Revision: D5835852
fbshipit-source-id: 30618d517c7ee235143de6efaa2f40df3f1d372f
Summary:
* For forward: allow either 1 or 2 output.
* For gradient generator: always return a gradient operator that does not use scale.
* For cudnn gradient op: nothing to do, already like this
* For default CPU and CUDA gradient ops: put scale as a member variable, and always recompute scale.
Reviewed By: bddppq
Differential Revision: D5690181
fbshipit-source-id: a6353202dcaf7359298bc8f032ac0c651352e2bc
Also squash a warning about an implicit conversion that will never
occur (because the type being converted to is a superclass).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary:
To speed up deprecating legacy_pad, we added the option
to remove legacy pad in the caffe_translator
Reviewed By: bddppq
Differential Revision: D5724079
fbshipit-source-id: 25465d26f35bd009aa71667c7c523047de42e802
Summary:
This exhibits the problem in NMT training where some data seems to
have silently been written out of bounds, causing random segfaults elsewhere in the
code. This itself does not solve the problem, but will prompt us to then fix the out
of bound issues.
Differential Revision: D5832646
fbshipit-source-id: 5eb259e4584e5341ef3f19362f98f0a9554e9aec
Summary:
UBSan report:
```
UndefinedBehaviorSanitizer: dynamic-type-mismatch caffe2/caffe2/core/tensor.h:786:22 in
caffe2/caffe2/core/tensor.h:787:19: runtime error: member call on address 0x60c01f610440 which does not point to an object of type 'caffe2::Tensor<caffe2::Tensor<caffe2::CPUContext> >'
*** Aborted at 1505298367 (Unix time, try 'date -d 1505298367') ***
*** Signal 6 (SIGABRT) (0xf2) received by PID 242 (pthread TID 0x7fb376f06700) (linux TID 33215) (maybe from PID 242, UID 0), stack trace: ***
0x60c01f610440: note: object is of type 'N6caffe26TensorINS_10CPUContextEEE'
07 5e 81 60 c8 47 13 35 00 00 00 00 90 f3 73 80 20 60 00 00 98 f3 73 80 20 60 00 00 a0 f3 73 80
^~~~~~~~~~~~~~~~~~~~~~~
vptr for 'N6caffe26TensorINS_10CPUContextEEE'
#0 0x1f0d1c22 in std::vector<long, std::allocator<long> > caffe2::GetTensorInfo<caffe2::Tensor<caffe2::CPUContext> >(void const*, bool*, unsigned long*, caffe2::DeviceOption*) caffe2/caffe2/core/tensor.h:787:19
#1 0x9a5e0a1 in caffe2::FacebookOperatorObserver::log() caffe2/caffe2/fb/init/net_observer.cpp:300:15
#2 0x9a5b49d in caffe2::FacebookOperatorObserver::Stop() caffe2/caffe2/fb/init/net_observer.cpp:229:11
#3 0x447d046 in caffe2::Operator<caffe2::CPUContext>::Run(int) caffe2/caffe2/core/operator.h:308:20
#4 0x1ecedb2f in caffe2::SimpleNet::Run() caffe2/caffe2/core/net_simple.cc:51:14
#5 0x1f1ba169 in caffe2::Workspace::RunNet(std::basic_fbstring<char, std::char_traits<char>, std::allocator<char>, std::fbstring_core<char> > const&) caffe2/caffe2/core/workspace.cc:211:26
...
```
The bug is that `GetTensorType` and `GetTensorInfo` take the context as the template argument, not the tensor itself.
Reviewed By: bddppq
Differential Revision: D5826781
fbshipit-source-id: 9cfd2ca1aaef6f8ee8a556ce7b553c0a4f43a100
Summary: Fix comment on core.Net.RunAllOnMKL (the comment was actually for core.Net.RunAllOnGPU)
Reviewed By: zem7
Differential Revision: D5734309
fbshipit-source-id: 2cc40a99a2c0083c73ec1e4c8279f55f296a003c
Summary:
This enables opsnoop to work with simple net as opposed
to just dag net
Reviewed By: pietern
Differential Revision: D5721732
fbshipit-source-id: c38d0b51d3b0469ecb2883e7075eeee7acf81d75
Summary: If blob type switches between fp32, fp16 - for example - we should share the tensor buffer. This kind of switching can happen with memonger and in-place conversions.
Reviewed By: bddppq
Differential Revision: D5812333
fbshipit-source-id: 44d54bfe52cbda734db8c7f20d6970e4b51ee1e1
Summary:
Choose the number of cores for the thread pool as the number of fast cores.
Didn't do any benchmarks, so it's mostly an FYI diff.
Reviewed By: ajtulloch
Differential Revision: D5579797
fbshipit-source-id: 5ada001116c731780f38a62e9c0b500bd64a4bfe
Summary:
Also add the ability to mark an argument as required.
Added a string constant `OpSchema::Arg_IsTest` for `is_test` arg.
If users define the `is_test` argument with `ArgIsTest(...)`, it automatically becomes a required argument; meanwhile, users can still use `Arg("is_test", ...)` to define an optional `is_test` argument.
Reviewed By: akyrola
Differential Revision: D5812391
fbshipit-source-id: eaaba50d027813a8012389edc6c459de23c3c728
Summary: For data parallel training we need the batch size to be a multiple of the number of replicas. With this diff we do that via Dataset(rec).trim(multiple_of=num_replicas).
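A tiny pure-Python sketch of the trimming arithmetic (the real work is done by Dataset.trim; this only illustrates the rule):
```
# Illustrative only: the actual trimming is done by Dataset(rec).trim(...).
def trimmed_size(num_examples, num_replicas):
    # Drop the remainder so every replica sees the same number of examples.
    return (num_examples // num_replicas) * num_replicas

assert trimmed_size(103, 4) == 100
assert trimmed_size(100, 4) == 100
```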
Reviewed By: dzhulgakov, harouwu
Differential Revision: D5753861
fbshipit-source-id: c5d728b925707dbd3d1f500a93e67e185c223569
Summary: CuDNNDropout used to append the CUDNN states structure on top of the mask blob. This is a bit controversial, and also caused problems when the mask blob was released by dynamic memory management. This diff makes that states blob a separate blob managed outside the inputs/outputs (so that we don't need different signatures for the CUDNN and non-CUDNN ops). Since the gradient op needs to access the same states, it grabs the states blob based on the mask blob name. Perhaps not the cleanest way to pass information, but at least better than the previous model. Also could remove a fair amount of code.
Reviewed By: bddppq
Differential Revision: D5787039
fbshipit-source-id: d95f0ffafb5fb2a6a7ce46f4a855e9c1b9a47f52
Summary:
I would expect that tests marked "expected failure" mean that there is a known issue in the code which will be fixed later. Both of these tests are simply verifying proper error-checking - nothing needs fixing.
Before (looks like something is wrong):
```
======================================= 2 xfailed in 0.27 seconds =======================================
```
After:
```
======================================= 2 passed in 0.28 seconds ========================================
```
/cc akyrola gsethi523
Closes https://github.com/caffe2/caffe2/pull/1209
Differential Revision: D5825373
Pulled By: akyrola
fbshipit-source-id: 1b98f503e4e406f69567d02425532f43bd16a465
Summary:
Right now, each net implements 2 functions: Run() and RunAsync(). The (loose) abstraction is:
* Run(): run the network in a synchronous way. The call is synchronous.
* RunAsync(): run the network *still synchronously*, but potentially use asynchronous scheduling of the underlying operators.
As one can see, this is highly confusing: RunAsync() is actually a sync call, and the semantics it tries to implement should actually be done by a different net type. For example, DAGNet and AsyncDAGNet both implement the Run() function, and under the hood one uses sync scheduling and one uses async scheduling. Currently, the only user of the RunAsync() function is in SimpleNet::RunAsync(). The only call site is in recurrent_net_op.
Instead, the operator implements the two Run() and RunAsync() functions as follows:
* Run(): run the operator in a synchronous way. aka doing FinishDeviceComputation().
* RunAsync(): run the operator in an asynchronous way if possible (i.e. still sync in CPU, but async in cuda), records the action in the event_, and return immediately.
Semantically, Run() is equal to RunAsync() followed by event().Finish().
As a result, we propose in diff D5812854 to change the network interface to be similar to the operator interface, and to explicitly promote RunAsync() to a first-class citizen of the net interface. Specifically:
* Adding a SupportsAsync() function that determines if a net supports async execution or not.
* Run(): run the net in a synchronous way.
* RunAsync(): if SupportsAsync() is false, same as Run(). If SupportsAsync() is true, run the net in an asynchronous way, with the scheduling algorithm determined by the implementation itself. Then record all outstanding events in the events_ field and return immediately.
Semantically, Run() is equal to RunAsync() followed by event.Finish() for all the events. This is actually the implementation: Run() is no longer a virtual function, RunAsync() is, and all subclasses of NetBase shall now implement SupportsAsync() and RunAsync().
**Why SupportsAsync()?**
This is a design idea that probably needs iterating. Basically, the idea is that RunAsync() is the main entry for the net execution, and it's actually like RunAsyncIfTheNetSupportsIt().
In theory, Run() is basically a wrapper on top of RunAsync() to reduce code duplication: if a net type does not support RunAsync(), its RunAsync() implementation simply is sync (see e.g. SimpleNet) and the Run() to RunAsync() lowering is a no-op (with the only overhead being a nested function call).
I exposed the SupportsAsync() function just in case some caller wants to explicitly check whether an instantiated net supports async call or not - for example, a caller may want to make sure that it is actually running a net asynchronously, in which case SupportsAsync() is the place to query.
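For illustration, a minimal Python paraphrase of the proposed semantics (the real interface is the C++ NetBase class; the method spellings here only mirror the description above):
```
class NetBase(object):
    def SupportsAsync(self):
        # Subclasses say whether they can actually schedule work asynchronously.
        return False

    def RunAsync(self):
        # Subclasses implement execution here; if async is supported they
        # record outstanding events in self.events and return immediately.
        raise NotImplementedError

    def Run(self):
        # Run() is RunAsync() plus waiting on every recorded event.
        if not self.RunAsync():
            return False
        for event in getattr(self, "events", []):
            event.Finish()
        return True
```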
Reviewed By: dzhulgakov
Differential Revision: D5812854
fbshipit-source-id: 916b38fded0eb14439f340ab254a034ac5a9a465
Summary: Kernel data and other shader parameters are now cached directly into uniform buffer blocks, and the blocks are dynamically attached at run time.
Reviewed By: hlu1
Differential Revision: D5772847
fbshipit-source-id: 746448c2d5db12e38fb883874ede3acfccb9f6ef
Summary: The default value for timeout in CreateOrCloneCommonWorld does not work properly: if the value of dpm._DEFAULT_TIMEOUT is changed, the default still stays at the old 30s. Changed to use None as the default instead.
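The underlying Python pitfall, sketched with hypothetical names (not the actual data_parallel_model code): a default argument is evaluated once at function definition time, so a captured constant never reflects later changes, while a None sentinel resolved at call time does.
```
_DEFAULT_TIMEOUT = 30

def create_common_world_stale(timeout=_DEFAULT_TIMEOUT):
    return timeout  # default was captured at def time

def create_common_world_fresh(timeout=None):
    if timeout is None:
        timeout = _DEFAULT_TIMEOUT  # resolved at call time
    return timeout

_DEFAULT_TIMEOUT = 60
assert create_common_world_stale() == 30   # still the old 30s
assert create_common_world_fresh() == 60   # picks up the new value
```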
Reviewed By: pietern
Differential Revision: D5813228
fbshipit-source-id: f617ceec40a03893c27d3e13c426e1ca6b2114e2
Summary:
Computes a fixed grid of RMAC region coordinates for a given 4D feature tensor
(NCHW) as described in https://arxiv.org/abs/1511.05879. The output is the
`roi` format expected by RoIPoolOp. To compute the actual RMAC itself, the
output of this op should be passed to RoIPoolOp.
Reviewed By: wickedfoo
Differential Revision: D5594994
fbshipit-source-id: 5edac98a18137b53555f9a16354419b424679c99
Summary: Explicit function to sync blobs. Notice that this must be called before CreateNet(), and syncs the blobs every run.
Reviewed By: asaadaldien, jay-mahadeokar
Differential Revision: D5805891
fbshipit-source-id: 58a1bb47805d75d5cbead136e2e0e9fe663ea954
Variable is now a subclass of at::Tensor backed by a VariableImpl* pImpl. The implementation of the ATen functions is defined in the auto-generated VariableType.h/cpp file.
Currently, only functions which fall through to the base type, such as sizes() and isCuda() are implemented. Differentiable ops like add() and mul() will be added in a subsequent PR.
When you call repr() on a long in Python 2, it prints a long suffix.
This is annoying for tests which assert on the exact output. Use str()
instead.
But then there is a problem with Python 2's default tuple str() implementation,
where it calls repr() on its arguments rather than str(). This means that
if you have a tuple of longs, it will render as "(1L, 2L)" in Python 2.
To solve this problem, we just reimplement tuple printing in C++.
This is not a very robust fix (nested tuples, dictionaries, all these situations
will fail) but in practice it hits the cases that matter.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
In Python 2, the non-generator map will always perform the indexing
even when it is not used in the end. Using the generator can let
us avoid indexing when it is not used.
As an added bonus, it makes the ordering of operations deterministic
between Python 2 and Python 3 in LSTM.
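A small illustration of the laziness point, with a hypothetical list and accessor (in Python 3, map() is already lazy, so the distinction only matters on Python 2):
```
hiddens = ["h0", "h1", "h2"]

def get_hidden(i):
    # In Python 2, map(get_hidden, range(3)) would call this for every index
    # up front; the generator below only touches what is actually consumed.
    return hiddens[i]

lazy = (get_hidden(i) for i in range(len(hiddens)))  # nothing indexed yet
first = next(lazy)                                    # only index 0 touched
```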
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary: This is will allow the same decoder to handle different go tokens.
Differential Revision: D5801811
fbshipit-source-id: ddd309963c97e32c728b15d2ccd4ba0c4ad5ebbe
Summary: The android segmentation net was failing with MPSCNN because some fused MPSCNNConvRelu ops become in-place after fusion.
Reviewed By: fricc33
Differential Revision: D5803245
fbshipit-source-id: 6808e9c3504389c113c7a16504d6554e83bdcc3e
Summary:
If the Gloo InfiniBand transport is used, the Gloo algorithms can use
GPUDirect to DMA directly from/to GPU memory. This is done through the
CudaDeviceWorkspace. This change adds a "gpu_direct" option to the
Allreduce operator that makes it use GPUDirect if the transport
supports it.
Closes https://github.com/caffe2/caffe2/pull/1203
Reviewed By: wesolwsk
Differential Revision: D5806366
Pulled By: pietern
fbshipit-source-id: 9e9a78f059f2b5c6e4fbf6574b7db4776a94696c
Summary:
Implement atomic add operation for zeus kv store.
All nodes now use zeus as the KVStore instead of relying on the master hosting a KVServer.
Code cleanup.
Reviewed By: andrewwdye
Differential Revision: D5581697
fbshipit-source-id: ba7d99215fb478a30942ff593f13dad65aa48d36
Summary: A bit safer, and also suppresses compiler warning.
Reviewed By: bddppq
Differential Revision: D5803080
fbshipit-source-id: d8c782c936a8fdaded4ae209b212378e78606ffb
Summary:
During the team meeting today Dima and Alex mentioned that the current lambda
function causes slowdown in performance when a large number of alloc and
dealloc happen. My observation is that most of the Delete are actually direct
Delete() function pointers, so I gave it a shot to see if we can reduce
the overhead.
RawAllocDealloc is much faster already, and we observe another 5ns reduction
(12.5%). For TensorAllocDealloc of 32x32 tensors, we are observing 57ns saving
(26%). This is measured on Xeon(R) CPU E5-2660.
Also cleaned up the function interfaces of ShareExternalPointer so we have 2
functions only.
Reviewed By: salexspb, dzhulgakov
Differential Revision: D5801013
fbshipit-source-id: 7068207a43400fa3902bbb3689b3c729e839456c
* Added support for nInputDim parameter in Padding class
* moved nInputDim to the end so as to not break backwards compatibility
* hasattr to check if nInputDim is actually set
* check if nInputDim is positive before checking against input dim
Summary:
Support int64 data type in protobuffer tensor in image input op.
This is useful when fbid, which is usually of data type BIGINT, is stored in tensor proto.
Reviewed By: panshen1
Differential Revision: D5792697
fbshipit-source-id: 0bc3da4fd31120b0582fb32dd7c2d09fe591a6de
Summary: CPU gradient is correct. CUDA gradient was wrong.
Reviewed By: asaadaldien
Differential Revision: D5801595
fbshipit-source-id: 7e529ed751b92137e49a0517120ddfae7a30ec28
Summary: Stress tests for recurrent_net_executor_test failed sporadically when the executor got stuck in forward-only mode. In forward-only mode we apply a limit to the number of parallel timesteps (because we recycle workspaces cyclically). There was a race condition where the finished_timesteps_ variable was set to 0 after jobs had already been executed by threads. So set the variable to 0 before putting any jobs into the queue.
Reviewed By: azzolini, Yangqing
Differential Revision: D5801599
fbshipit-source-id: 8443c67f4ae8af3ae08c6f0cd4575ef729ffa3af
Summary: RNN executor previously relied on getting the mapping from x to x_prev (and gradients) from recurrent.py, but we can just infer them from links. This makes all models compatible with rnn executor, given enable_rnn_executor=1 argument.
Reviewed By: jamesr66a
Differential Revision: D5801436
fbshipit-source-id: 14d0e26dfbad6347f645d907da493187c98e9b17
Summary:
Before this change there were two ways for machines to rendezvous for a
distributed run: shared file system or Redis. If you're using an MPI
cluster it is much more convenient to simply execute mpirun and expect
the "right thing (tm)" to happen. This change adds the "mpi_rendezvous"
option to the CreateCommonWorld operator. If this is set, the common
world size and rank will be pulled from the MPI context and Gloo
rendezvous takes place using MPI. Note that this does NOT mean the MPI
BTL is used; MPI is only used for rendezvous.
Closes https://github.com/caffe2/caffe2/pull/1190
Reviewed By: akyrola
Differential Revision: D5796060
Pulled By: pietern
fbshipit-source-id: f8276908d3f3afef2ac88594ad377e38c17d0226
Summary: As title. Made the configurations op-specific since many models run multiple RNNs.
Reviewed By: jamesr66a
Differential Revision: D5796208
fbshipit-source-id: 88173879dfff9f3f7bf583ccc4f4c6385cca5aca
Summary: Allow context to be passed into piper function
Reviewed By: volkhin
Differential Revision: D5684716
fbshipit-source-id: 693f0464fe28f8692d75901705a85a0a413a7bed
Summary: The convolution should not run with input texture slices > 1 with tiling
Differential Revision: D5774187
fbshipit-source-id: 5e94f82cd65e0d4425a7a0090a61a33bef2a14fc
Summary:
`ModifierContext` is the base class for `OptimizerContext` and `RegularizationContext`.
`UseModifierBase` is the base class for `UseRegularizer` and `UseOptimizer`.
Most of the code in `OptimizerContext`, `RegularizationContext`, and other potential Context classes in the future could be shared. We thus implemented a new base class, `ModifierContext`, to support this.
The same happens to be true for `UseRegularizer` and `UseOptimizer`, so we implemented a new base class called `UseModifierBase`.
In this way, users only need to provide the API for the **get** and **has** operations, and to specify the **context class**.
**Note**
Mirrored code in fbandroid and fbobj will be added when this is finally checked in.
Reviewed By: kittipatv, xianjiec
Differential Revision: D5724613
fbshipit-source-id: de19bb822dcd41ec5c459d65065603a0abe2fd20
Summary:
Regularization added for caffe2 and dper.
This regularization is intended for dense features only. Sparse features are handled via individual optimizers; see D5618405 and D5534579 for details.
The implementation of dense regularization is similar to the one in the optimizer. We now support `l1 norm` and `l2 norm` in the regularizer. In dper, we call different regularizations based on the regularization type defined in model_definition.thrift.
Reviewed By: xianjiec
Differential Revision: D5724851
fbshipit-source-id: 0fbee698cfeff1ac477fc9d07785406069f8d9c8
Summary:
These arguments control which Gloo transport (TCP or IB) and which
network interface is used for the common world. If not specified, it
defaults to using TCP and the network interface for the IP that the
machine's hostname resolves to.
The valid values for the transport argument are "tcp" and "ibverbs".
For ibverbs to work, Gloo must have been compiled with ibverbs
support. If Gloo is built as part of Caffe2 (sourced from the
third_party directory), then you can pass -DUSE_IBVERBS=ON to CMake to
enable ibverbs support in Gloo.
Closes https://github.com/caffe2/caffe2/pull/1177
Reviewed By: akyrola
Differential Revision: D5789729
Pulled By: pietern
fbshipit-source-id: 0dea1a115c729e54c5c1f9fdd5fb29c14a834a82
Summary:
The predictor export functions allowed a way to specify a net type, but no way to specify num_workers for when you use net type 'dag'. This adds that option to the PredictorExportMeta named tuple and populates the field in the exported protobuf. Also added parameters to callsites in NMT ensemble model class and model repackager to populate net_type and num_workers.
Using DAGNet for our base predictor net (not the recurrent stepnets) speeds up our inference by 1.15x, since we can now run the encoder forward and backward RecurrentNets for each model in the ensemble in parallel.
Reviewed By: salexspb
Differential Revision: D5792203
fbshipit-source-id: cb9a8237a0cbe1a09645d4de051dfbb23f06dcfa
Summary: RNN executor did not consider the race-condition type of dependency where an op A reads blob X and a following op writes blob X. This happened in beam search with an in-place Reshape following an FC op.
Reviewed By: jamesr66a
Differential Revision: D5792018
fbshipit-source-id: a5590d80e1b7b127abcdf2b1c2854ea56018e12f
Summary: This dot_product layer was added before functional layer was added. Now we have functional layer, this dot_product layer is no longer needed. This diff removes dot_product layer.
Reviewed By: kittipatv
Differential Revision: D5783303
fbshipit-source-id: 5d13f729918148ee57836fb47c48e6f24773654b
Summary: The shape inference of distance_op has issues (it only works when inputs are 1D tensors). This diff fixes the shape inference and the unit test.
Reviewed By: kittipatv
Differential Revision: D5788744
fbshipit-source-id: cb1b7facf7b9ccd64b54edca156325eceef50f33
Summary: We could be a bit more helpful.
Reviewed By: jamesr66a
Differential Revision: D5778789
fbshipit-source-id: 570095196b07d593cfed8318477b296e47c5d43d
When the size given is incorrect for the number of elements, the current error message is:
`size '[1 x 1 x 5]' is invalid for input of with 1 elements at /pytorch/torch/lib/TH/THStorage.c:41`
This replaces it by
`size '[1 x 1 x 5]' is invalid for input with 1 elements at /pytorch/torch/lib/TH/THStorage.c:41`
which is grammatically better
Proper broadcasting in ATen uncovered a bug in our fusion
compiler where it outputs the wrong shaped tensor. We're
tracking the issue in https://github.com/ezyang/pytorch/issues/206
but for now, rewrite the code so it does an "old style" comparison,
which works fine.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary: Data parallel model failed with device numbers 10, 11, ... because it used string sorting of the blob names. Changed to make sorting happen based on device number and then blob name. Also added reduction for 16 devices.
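A sketch of the fixed sort key, using hypothetical blob names: sort by (device number, blob name) rather than by the raw string.
```
import re

blobs = ["gpu_10/fc_w", "gpu_2/fc_w", "gpu_11/fc_b", "gpu_2/fc_b"]

def device_then_name(blob):
    device, name = blob.split("/", 1)
    return (int(re.sub(r"\D", "", device)), name)

# A plain string sort would order gpu_10 and gpu_11 before gpu_2.
ordered = sorted(blobs, key=device_then_name)
# -> ['gpu_2/fc_b', 'gpu_2/fc_w', 'gpu_10/fc_w', 'gpu_11/fc_b']
```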
Reviewed By: wesolwsk
Differential Revision: D5781521
fbshipit-source-id: 16be0984ecb55340604c82893be366c0528e822c
* Variables now hold a list of ValueTracingStates and can participate
in multiple traces.
* Refactored Traceable to maintain a list of traces, and only stop
tracing once it records all stages
Summary: Filling in the gap in tensor inference
Reviewed By: sunnieshang, akyrola
Differential Revision: D5779550
fbshipit-source-id: 9ec68c9dad566183d7d0fc2819829c2b91430dda
Summary:
If Caffe2 used the packaged NCCL version then the Gloo build will try
to use it as well. To make sure the NCCL build has completed we need
to add an explicit dependency between the two.
Another subtle change here is that we add the PROJECT_BINARY_DIR to
the include path, since that is where the generated <gloo/config.h>
resides. Without this path Caffe2 includes the empty config.h from the
source tree.
Closes https://github.com/caffe2/caffe2/pull/1170
Differential Revision: D5779002
Pulled By: pietern
fbshipit-source-id: 9bc0d41f01a9b0f023d71bc4dee128a77eec1712
Summary: As title. I wonder why this had not been encountered before. It only affects cases where the states are copied over, though.
Reviewed By: Yangqing
Differential Revision: D5777314
fbshipit-source-id: 8aef435c832e4ead5bb3d3e35bb065c734a2af5f
Summary: According to GitHub issue #1168, the agreement between the Caffe2 and NumPy YellowFin models in the tests is not good enough in some environments. Results were very close on my machine. GitHub's Travis failed on some tests, which I later disabled. Therefore the difference doesn't come from logical differences but from loss of precision on some machines. It is safe to disable the equivalency test if equivalency was already tested once.
Reviewed By: akyrola
Differential Revision: D5777049
fbshipit-source-id: c249a205d94b52c3928c37481f15227d500aafd0
Summary:
Add type inference for EnsureDense operator so that the output tensor
has the same data_type and shape of the input tensor
Reviewed By: kittipatv
Differential Revision: D5763117
fbshipit-source-id: e507e8d928c1515bd01063e2af595eb0daf1e768
Summary:
Special executor for RNNs which can exploit parallelism over timesteps. For CPU we use multi-threading, achieving a 3x or so improvement on 4-layer LSTMs.
With CUDA, perf improvements are more modest, but the structure allows for optimizing it further. For CUDA, we use multiple streams and events if there is parallelism
over timesteps. In my experiments, it was not good to use more than 2 streams, though.
Flag --caffe2_rnn_executor can be used to switch the executor off.
Reviewed By: salexspb
Differential Revision: D5749304
fbshipit-source-id: d6f76b3e16598be5b4e8188aff031671ebafaa4c
- kernels -> kernel_shape
- Use the new hybrid dict/tuple result object from Toffee
- Write g and t as singulars, not plural
- nanopb generated files update
- Bugfix for msg() micropb helper
- Start recording producer_version/producer_tag
- Use ir_version from proto description
- Value -> value (Constant)
- Remove special-casing for transposed convolution; we now rely
on the Caffe2 Toffee backend to do something reasonable
- Batchnorm order is no more
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
- Conv no longer supports bias, so we create an explicit broadcasted
addition afterwards. There is one minor problem, however, which is that
ConvTranspose in Caffe2 has mandatory bias. So there's a hack.
See Note [Caffe2ConvTranspose] for the details.
- Squeeze: dims -> axes
- Transpose: axes -> perm
- Reshape lost its extra output (yay!)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
This was a doozy!
- 'namespace' is a C++ reserved keyword, so if you have a field named
this, nanopb will blithely export some malformed C++. I submitted
a PR for this: https://github.com/ProjectToffee/ToffeeIR/pull/88
- Zach added support for singular tensor and graph. While attempting
to add support for these, I realized that it was actually impossible
to support them under the default protobuf translation. The gory
details are in Note [Callback for nested messages]. The singular
callbacks needed a new helper which I dubbed msg; it's just
the singular version of list.
- While I was working on the API, I braino'd with the tensor()
method. It turns out this is totally not the right way to think
about it; it's more string_from_tensor(). So I renamed it.
I also renamed add_tensor to set_raw_data; add_tensor is a misnomer
since it implies you can add multiple tensors, which is not true.
- version turned into producer_version. Actually, this is a bit
questionable and might change soon.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
This is a case of two wrongs making a right. There were a pair of
related bugs:
- We incorrectly translated Transpose as if it were a Permute;
but Torch transpose actually is a *swap* between dimensions.
- Why didn't we ever notice it? In all of our tests, a transpose
was *solely* done to get a weight matrix into the correct form.
But Caffe2's FC operator *implicitly* does a transpose on
the weight matrix.
This commit fixes both of these problems.
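For reference, the distinction at play, in modern PyTorch spelling (illustrative only):
```
import torch

x = torch.zeros(2, 3, 4)
swapped = x.transpose(0, 2)    # shape (4, 3, 2): only dims 0 and 2 are swapped
permuted = x.permute(2, 0, 1)  # shape (4, 2, 3): a full reordering of all dims
```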
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
This adds the PyTorch API user documentation for Toffee.
To make the example work, I also converted all "inplace"
ops to export out-of-place in Toffee.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
- BC BREAKING: export now also takes a mandatory file-ish argument, specifying
the file to export the protobuf to. I rewrote the tests to use BytesIO to
get out the string so they could parse it again.
- BC BREAKING: export no longer returns the tensors that were computed. To
get these, use the internal _export function.
- Multiple inputs to models are now supported by passing a tuple to input.
(Old API of a single Variable still works.)
- Keyword arguments to models are now supported via kwargs keyword arg.
- Renamed embed_params to export_params, and it now defaults to True.
- Toffee tests now live in their own test_toffee.py file. I had to
rename a pile of expect files for this.
- Removed defunct torch.toffee imports from autograd to solve module import
cycle.
- Helper function _with_file_like to abstract over opening file-ish arguments,
taken from torch.save()
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Rather than reuse input as output names in ToffeeIR, mark places where
inputs are consumed. In C2 conversion these annotations will be used
to create the corresponding graph.
Toffee submodule update.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
- Reduce setup.py diff.
- Expunge WITH_TOFFEE from codebase.
- Elaborate on a comment.
- Move gen_toffee.sh to tools
- Delete densenet test.
- Use 'using' to inherit a constructor.
- Delete outdated comment.
- Comment about why primspecs can return fewer outputs.
- Remove dead, commented out includes.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Along the way I added converters for Variable and TracingInput. Variable should
probably be moved to a more widely known spot.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Instead of dynamically allocating a float for each element of the tensor
(lol!) save the tensor itself, and directly read out the data.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
"Unused" nodes are mapped to nullptr, and we distinguish
on lookup nodes which were never mapped versus nodes that
were mapped but supposed to be unused. This case
should never happen, but a little extra safety never hurt.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
I realized we weren't running the linter after ToToffeeIR, so
I added a lint call. It thus emerged that the current implementation
was using "Unused" nodes that were not added to the graph,
which was tripping the lint. I fixed this a few ways:
- BatchNorm and Conv primspecs were returning dead "unused" nodes
for their (implicit) handle parameters. I removed them because
setOutputs handles this already, and a dead unused node which
is not attached to the graph violates the "no dead nodes"
invariant.
- OK, but MaxPool actually needs to return an unused node for
the output which is supported by PyTorch but not Toffee; we need
to error if this output is subsequently used in the trace.
The new strategy is to have MaxPool's primspec return a None
at the unused position, and then immediately *check* if there
are any uses of that output. If there are, that's an error!
- I needed to adjust the Select invariant in the exporter loop:
only if a Select node has *uses* is it mandatory for it to be
defined in env.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Basic idea:
- Pass buffers (marked as non-Variable tensors) as input variables to
the trace. Every buffer gets represented as an input variable
to the trace, and we remember a correspondence of the underlying
TH pointer and an input variable in the trace.
- When we initially trace a function, we DO NOT record the buffers
as edges. This is so autograd doesn't have to know anything about buffers.
If we ever turn buffers into requires_grad=False parameters, then
this problem goes away.
- When we primspec the buffer, NOW we reach into the cached buffers
(now appropriately named) and gin up the buffer information we need.
Other things:
- CppOp execution is now supported (but lightly tested) using
SimpleEval (thanks @apaszke!)
Todo:
- E2E tests need to have their hacks removed.
- Figure out what is going on with backwards
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
If it's not set, CMAKE_DEBUG_POSTFIX sets it to 'd' which means the
static library gets named something different when built in debug mode.
This is annoying because it means if you build in debug mode, the
library is in a different place. Rather than teach the build system
to find the correct name, just set this POSTFIX so names don't change.
Also, update setup.py to look for the non-debug archive.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
General strategy:
- nanopb is statically linked into PyTorch. It must be built
with -fPIC.
- Generated nanopb files for toffee.proto are checked into
our repo.
- Because nanopb generated protobufs are C only, we wrote a
wrapper around it to give a Google C++ style interface.
More on this shortly.
How does the wrapper work?
- It's called "micropb" becaues it is less small than nanopb :)
- nanopb requires all variable-length fields to be written out
using a "callbacks" mechanism.
- We wrote pre-canned callbacks for all of the types ToffeeIR
writes out and lists; these are micropb_callback and
micropb_callback_list. These operate simply by dynamically
allocating and storing the data to be written out in
data (this defeats the purpose of the callback mechanism,
but it's easy to implement)
- Finally some boilerplate to actually implement the wrapper
classes and have owning pointers to the actual data.
Testing strategy:
- Take the serialized protobuf from nanopb, parse it again
with ToffeeIR and print it. Worked with all of test_jit.py!
These tests don't run without 'toffee' being installed.
TODO:
- Update CI to install ToffeeIR, so we can run the Toffee tests
in CI
- Update E2E with Caffe2 tests so that they work with new stuff.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
previously:
PythonOp/CppOp Graph -> ToffeeIR, primspecs worked with protobufs
now:
PythonOp/CppOp --ToToffeeIR--> jit::Graph of in-memory ToffeeIR -> protobufs of ToffeeIR
This commit lets primspec functions work directly with JIT IR nodes,
which makes it possible to do a lot more stuff in those functions.
Let's say I write alpha=2 in my PyTorch code. Is alpha a float
or an int? This problem is resolved when we actually pass
it to the underlying kernel, which knows what type it expects
it as.
When serializing to Toffee IR, the Toffee NodeProto also needs
to dictate the correct type; otherwise, we may guess wrong.
We get this information from the OpSchema in the ToffeeIR library.
With this, we can avoid explicitly casting in dropout.py and
auto_primspec.py
WARNING: You will need to update torch/lib/ToffeeIR when you pull
this patch, as attribute schemas were added recently to ToffeeIR.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
This addresses when bias is disabled, which occurs in torchvision's
alexnet and densenet.
The general strategy is this:
- When we encounter a null variable, we turn this into a Constant
node with an undefined at::Tensor
- Toffee exports for BatchNorm and Conv have special cases for bias,
checking if they are provided by a Constant node with undefined
value, and just omit the input if so.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
The general strategy:
- We put all the toffee files in torch/csrc/toffee; they will only be
added when toffee is enabled
- Toffee is enabled if torch/lib/ToffeeIR is present (since we
don't have a submodule/subtree thing going on)
- The most prevalent place you will need to use WITH_TOFFEE is for
primspec definitions on C++ autograd functions. There is a
macro HAS_PRIMSPEC to ameliorate optionally defining primspec()
virtual overrides on Function classes. HasPrimspec is always
available but will be a zero field class when Toffee is disabled.
NB: We might revert this commit in the future if we figure out a way
to unconditionally enable Toffee that everyone likes.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
We want all the conversion code to live in one place. Away it goes!
This means that alexnet protobuf no longer works. It will start working
again when we port changes.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
This commit adds a new exporter pass which takes a graph and returns
a string of the human-readable protobuf representation of a model.
We have two strategies for how conversions are implemented:
- If a Python autograd function has a primspec static method, we invoke
it to get the Toffee conversion. Use torch.toffee.op to generate the
format expected to be returned. The particular data representation is opaque
and subject to change in the future.
- Otherwise, there's a giant if statement in the exporter, which manually
uses the JIT IR C++ API and Toffee IR C++ protobuf API to convert.
You must check out a copy of the ToffeeIR repo
https://github.com/ProjectToffee/ToffeeIR at torch/lib; at the moment
we don't have a subtree/submodule set up.
Technical debt in this commit:
- To get protobuf headers in scope, we unconditionally add $CONDA_PREFIX/include
to the include path. This needs to be replaced with a more robust mechanism.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
The API works on either functions or models, taking an extra parameter argument
so that functions can pass in additional variables to trace.
Other behavior is folded into boolean options:
time - collect stats for our own perf debugging
verify - run the original code, and check it is within threshold
optimize - run optimization (currently off until fusiongroups pr is accepted).
enabled - flag to turn off tracing so you can check timing of stuff that cannot be traced.
Fixes #48.
I had to shave some yaks:
- I needed switch on Type, so I wrote a new macro set TYPE_IF,
and abstracted the IR_IF into a GENERIC_IF. The parametrization
is on const-ness and the type kind; also there is a minor annoyance
where type kinds (ugh, hate the name; it means the wrong thing
in Haskell land) don't match the class names, so there needs some
suffix munging. There's still some extra funny business, see
https://github.com/ezyang/pytorch/issues/51
- A lot of functions on types weren't declared const when they could
have been. I added const qualifiers as necessary.
- setType now takes an honest to goodness Type* rather than TypeKind.
- init_pass now preserves types when it does transformations.
There are still some places we're losing types, most notably fusion.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
The approach is based on THC's pointwiseApply{1,2,3} family of kernels,
but doesn't have any dependencies on that code.
Adjacent contiguous dimensions of input tensors are compressed to reduce the complexity of indexing math.
For the completely contiguous case, the indexing logic simplifies to just the linear index.
In simple tests, this code matched or beat the equivalent from THC.
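A small Python sketch of the dimension-compression idea (not the CUDA kernel): adjacent dimensions that are mutually contiguous are merged, so a fully contiguous tensor collapses to a single flat dimension indexed by the linear index.
```
def collapse_contiguous(sizes, strides):
    out_sizes, out_strides = [sizes[-1]], [strides[-1]]
    for size, stride in zip(reversed(sizes[:-1]), reversed(strides[:-1])):
        if stride == out_sizes[0] * out_strides[0]:
            out_sizes[0] *= size  # merge with the dimension to its right
        else:
            out_sizes.insert(0, size)
            out_strides.insert(0, stride)
    return out_sizes, out_strides

# Fully contiguous 2x3x4 tensor: indexing reduces to the linear index.
assert collapse_contiguous([2, 3, 4], [12, 4, 1]) == ([24], [1])
# A strided innermost dimension blocks merging into it; outer dims still merge.
assert collapse_contiguous([2, 3, 4], [12, 4, 2]) == ([6, 4], [4, 2])
```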
- To test whether or not a multiline string matches some expected
value, you can use assertExpected. This tests that the string
matches the content stored at a file based on the name of the
test (and an optional subname parameter you can pass if you
want to assertExpected multiple times).
- Suppose you make a change that modifies the output in a big way.
Instead of manually going through and updating each test, you instead
run python test/test_jit.py --accept. This updates all of the expected
outputs. You can now review them one-by-one and make sure your
changes make sense.
We can add more features later (e.g., munging the output to make it
more stable, more sanity checking) but this is just to get us started
testing. One thing to watch out for is that accept tests on intermediate
representation can be a bit wobbly: it is *extremely* important that
people be able to read the IR. It may be worth introducing niceties
to the printer in order to ensure this is the case.
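A minimal sketch of the mechanism (an assumed helper, not the actual test harness): compare against a stored expect file, or rewrite the file when --accept is passed.
```
import os
import sys

def assert_expected(actual, expect_file):
    if "--accept" in sys.argv or not os.path.exists(expect_file):
        with open(expect_file, "w") as f:
            f.write(actual)  # record (or re-record) the expected output
    else:
        with open(expect_file) as f:
            expected = f.read()
        assert expected == actual, "output differs from %s" % expect_file
```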
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Now it gets initialized during the constructor. This results
in more boilerplate but is conceptually more correct, and solves
an assert failure.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
It is not an /expression/ we trace, but it is a /graph/: that is,
a closed expression which knows its parameters. Knowing the list
of parameters is useful and removes a hack when interpreting.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
This prevents nested lets, which are not allowed in ANF. We
basically have SSA now.
There's some niftiness with the visitor returning a lambda which
then gets fed the actual argument. I like it.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Although ANF style developments traditionally stratify syntactic
classes into atomic (Arg) and complex (Expr) expressions, where
atomic expressions could be variables, constants or lambdas, Zach has
successfully convinced me that we should do away with the variant here and
always require arguments to be variables. There are a few reasons for
this:
1) Tensor constants, not currently supported, could be modeled using a
"Constant" instruction, removing the need for them to be representable
directly inline. An inline constant is marginally more convenient
for peephole optimizations, but since we have gone full ANF, we are going
to need to be able to see across def-uses in any case, and it is not
too much worse to need to handle constants this way. By the way,
Swift Intermediate Language also made a similar choice, see
the slide on "Literal Instructions" in
http://llvm.org/devmtg/2015-10/slides/GroffLattner-SILHighLevelIR.pdf
2) Scalar constants, which are quite important for passing non-tensor
arguments to Python operators, are now stored out-of-band as NON
first-class values. This more closely matches the ToffeeIR design,
and makes it clear what parameters are "first class" (tensors only)
and which ones are not. However, we need to be able to unswizzle
the separate scalar/tensor lists into a unified list in the correct
format; this is what PyFunctionCConv is for.
Also, Locals got renamed into Tuple.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Previously, our AST was a DAG, where shared Nodes indicated a computation
should be reused. This commit rewrites the IR into a new functional
representation which represents sharing explicitly using variable
bindings.
We offer a few justifications for this new style:
1. The new representation is not all that different from the
old one; it is about as easy to construct, and the lack of an
explicit graph doesn't negatively impact our ability to interpret
the graph, since we've chosen, as a matter of design, to NOT have
the IR participate in the actual execution of a graph.
2. The new let-binding representation has an implicit ordering,
which we can use to conveniently keep track of the original order
the trace showed up as. This automatically gives us a topsort,
and gives us an easier to read textual representation of our
IR:
%14 = Embedding %11, %0, -1, None, 2, False, False
%15 = Dropout %14, 0.2, True, False
%16 = Index %12, 0
%17 = Index %12, 1
%18 = Index %13, 0
%19 = Index %13, 1
%20 = Index %15, 0
%21 = Linear %20, %1, %3
%22 = Linear %16, %2, %4
3. It moves us closer to a Futhark style language
(http://futhark-lang.org/publications/pldi17.pdf).
Major aspects of the diff
- Node is replaced with Expr and Arg, a pair of mutually recursive
structures which represent our new language. In BNF, the language
looks like this:
a ::= c | %i
e ::= %i, ... = e
| PyOp e, ...
| Ret %i, ...
Technically, Ret is not actually a return (no control flow is involved),
it just tuples up a series of tensors (identified by variables).
One important invariant is that locals are always tensors; they
are never constants (this is asymmetric with Args.)
- Arguments support Python constants. This is an important piece because
many operators take extra Python literals like integers and tuples in
order to specify extra parameters about how an operator operates. Adding
this was essential to getting word_language_model to work.
- As both Expr and Arg have multiple variants, there is new infrastructure
for doing case on the variants using ExprVisitor and ArgVisitor. The
strategy here is adapted from WebAssembly's visitors, although we have
generalized to permit arbitrary argument forwarding, which is necessary
to support tail-recursive visitor calls. TCO is important because our
interpreter may recurse arbitrarily deep into a stack of nested lets.
If users wish, they can also manually case on the type tag.
- Tracing is now turned on and off using _tracer_enter/_tracer_exit in
torch._C. _tracer_enter accepts a list of variables which are to be
treated as arguments; _tracer_exit accepts the list of traced variables
which should be returned when you reexecute the trace, and returns
the trace expression which can be reexecuted. GlobalTracingState
is a global variable which tracks whether or not we are tracing or not.
- You use run_forward to execute a trace on some set of parameters.
- When under tracing, variables keep track, via trace_local, what the
name of their variables in the IR are.
Here is a simple runner which leaks memory but can be used to JIT models:
import torch.autograd.function as F
import torch._C
def jit(model):
import types
real_forward = model.forward
def forward(self, *args):
def flatten(x):
return tuple(F._iter_variables(x))
if not hasattr(self, "saved_trace"):
torch._C._tracer_enter(tuple(self.parameters()) + flatten(args))
out = real_forward(*args)
self.saved_trace = torch._C._tracer_exit(flatten(out))
self.saved_outs = out
return out
else:
flat_out = Variable._execution_engine.run_forward(self.saved_trace, tuple(self.parameters()) + flatten(args))
return F._unflatten(flat_out, self.saved_outs)
Major problems:
- Sanity checking is spotty at best, especially when users pass in variables.
- The interpreter leaks tensor memory from the store. When we add back def-use
we should be able to deallocate tensors as soon as we know they are no longer
necessary.
- The interpreter needs to reach feature parity with the old execution engine.
From there, we need to see if backwards can be subsumed as well.
- I still have no confidence in having memory managed everything correctly.
This requires a close look.
- Rather than return an *open* expression as a trace, we should return a
*lambda* instead, which knows about how many formal parameters it
requires.
- The IR is not introspectable from Python at the moment, but this is simply a
matter of implementing all the binding code.
- The tracer is NOT reentrant (you can't trace while you're inside a trace.)
Furthermore, no sanity checking is done if you try to incorrectly reuse
things from one trace in another.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Simple test:
import torch
from torch.autograd import Variable
import torch._C as _C
x = Variable(torch.Tensor([4]), requires_grad=True)
y = Variable(torch.Tensor([7]), requires_grad=True)
z = x * y
z.sum().backward()
print(x.grad)
print(y.grad)
x.data[0] = 2
y.data[0] = 3
(z,) = z._execution_engine.run_forward((x, y), (z,))
z.sum().backward()
print(x.grad)
print(y.grad)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary:
Reader checkpointing was disabled due to a bug captured in T21143272.
Now that we have resolved that issue, we are re-enabling reader checkpointing.
Reviewed By: boryiingsu, rayleichen
Differential Revision: D5730545
fbshipit-source-id: 7fae48b03e07eaf530bfc9e8e8b6683d8ed4e206
Summary:
release_blobs_when_used() analyzes when a blob is output for the last time, and inserts a Free op after that, unless the blob was aliased.
memonger.estimate_memory_usage() does a static memory analysis based on shape inference. See experimental/akyrola/test.py for example use.
Reviewed By: asaadaldien
Differential Revision: D5729199
fbshipit-source-id: 527a5152dbd4ef3bbe28b776c29163fff25f700a
Summary:
As described in task T21337239, NormalizeOp currently normalizes over only the last dimension.
In this commit, the following changes have been made:
(1) Added an axis-parameter to NormalizeOp in both the CPU and CUDA context.
(2) Added the same axis parameter to NormalizeGradient in both the CPU and CUDA context
(3) Removed the limit that the original NormalizeOp operator requires the input dimension to be 2
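A small NumPy sketch of normalization along a chosen axis, mirroring the new axis argument (illustrative only, not the operator implementation):
```
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)

x = np.arange(6, dtype=np.float32).reshape(2, 3)
rows = l2_normalize(x, axis=1)  # each row has unit L2 norm
cols = l2_normalize(x, axis=0)  # each column has unit L2 norm
```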
Reviewed By: akyrola
Differential Revision: D5745162
fbshipit-source-id: 69e04f59ac4d954b0062c3b2a53c8ca465a1027b
Summary: Add tiling support to GLAdd, GLPool, and GLResizeNearest
Differential Revision: D5733208
fbshipit-source-id: b73113326b96d421787d4695ccf7d2d919ee2ed8
Summary:
It looks like this operator is missing some enforces that it should have (since
it's working on the user inputs). This diff is added enforces to ids to be in a
valid range.
Reviewed By: dzhulgakov
Differential Revision: D5488336
fbshipit-source-id: e045c3b71b92e443edd23c95aa75d144877f1334
Summary:
Here is the buggy behavior which this change fixes:
* On the first configure with CMake, a system-wide benchmark installation is not found, so we use the version in `third_party/` ([see here](https://github.com/caffe2/caffe2/blob/v0.8.1/cmake/Dependencies.cmake#L98-L100))
* On installation, the benchmark sub-project installs its headers to `CMAKE_INSTALL_PREFIX` ([see here](https://github.com/google/benchmark/blob/4bf28e611b/src/CMakeLists.txt#L41-L44))
* On a rebuild, CMake searches the system again for a benchmark installation (see https://github.com/caffe2/caffe2/issues/916 for details on why the first search is not cached)
* CMake includes `CMAKE_INSTALL_PREFIX` when searching the system ([docs](https://cmake.org/cmake/help/v3.0/variable/CMAKE_SYSTEM_PREFIX_PATH.html))
* Voila, a "system" installation of benchmark is found at `CMAKE_INSTALL_PREFIX`
* On a rebuild, `-isystem $CMAKE_INSTALL_PREFIX/include` is added to every build target ([see here](https://github.com/caffe2/caffe2/blob/v0.8.1/cmake/Dependencies.cmake#L97)). e.g:
cd /caffe2/build/caffe2/binaries && ccache /usr/bin/c++ -I/caffe2/build -isystem /caffe2/third_party/googletest/googletest/include -isystem /caffe2/install/include -isystem /usr/include/opencv -isystem /caffe2/third_party/eigen -isystem /usr/include/python2.7 -isystem /usr/lib/python2.7/dist-packages/numpy/core/include -isystem /caffe2/third_party/pybind11/include -isystem /usr/local/cuda/include -isystem /caffe2/third_party/cub -I/caffe2 -I/caffe2/build_host_protoc/include -fopenmp -std=c++11 -O2 -fPIC -Wno-narrowing -O3 -DNDEBUG -o CMakeFiles/split_db.dir/split_db.cc.o -c /caffe2/caffe2/binaries/split_db.cc
This causes two issues:
1. Since the headers and libraries at `CMAKE_INSTALL_PREFIX` have a later timestamp than the built files, an unnecessary rebuild is triggered
2. Out-dated headers from the install directory are used during compilation, which can lead to strange build errors (which can usually be fixed by `rm -rf`'ing the install directory)
Possible solutions:
* Stop searching the system for an install of benchmark, and always use the version in `third_party/`
* Cache the initial result of the system-wide search for benchmark, so we don't accidentally pick up the installed version later
* Hack CMake to stop looking for headers and libraries in the installation directory
This PR is an implementation of the first solution. Feel free to close this and fix the issue in another way if you like.
Closes https://github.com/caffe2/caffe2/pull/1112
Differential Revision: D5761750
Pulled By: Yangqing
fbshipit-source-id: 2240088994ffafdb6eedb3626d898b505a4ba564
Summary:
**Description**
Provide DeepText model with the functionality to load a secondary index (pre-trained char-ngram embedding, e.g. FastText) during training/test. Embeddings of out-of-vocabulary words will be computed on-the-fly during training/test by averaging the char-ngram embeddings.
**Approach**
This diff provides two custom operators to accomplish this task – ConditionalOp and IndexCharNgramGetOp. We first use IndexCharNgramGetOp to perform char-ngram index lookup and return a sparse tensor segmented by lengths for each token. The sparse tensor is then used to compute the average embedding provided by the char-ngram index. Finally, we use a ConditionalOp to replace those whose embeddings were not found in the original index during the feature apply stage. Please refer to documentations of the code for more details.
Reviewed By: jamesr66a
Differential Revision: D5666924
fbshipit-source-id: f76605d093154a014d5b9ebf9510de9d79874eee
Summary:
CuDNNWrapper's inline_cudnn_handle() should set the stream every time, since it can change. This caused problems in RNN scenarios. Also, this bug rendered singlethread_async_net incorrect / slow!
I found out the problem by using nvprof --print-gpu-trace and noticing that some kernels were run in a different stream than I expected.
Reviewed By: ajtulloch, Yangqing
Differential Revision: D5758426
fbshipit-source-id: 651c62fe28eaf09e1675d4adf3f1fac8b4c8e75b
This respects all the broadcast cwrap specifications except for 'fallback';
i.e. pointwise functions operating on tensors where the number of elements
match but the sizes are different and not broadcastable. This behavior is
currently deprecated in PyTorch. Note that this is a breaking change in ATen,
because ATen just passes through to TH/THC, where the fallback behavior is
actually implemented.
This also changes expand semantics wrt Scalars (as tensors). Previously,
one could 'expand' a 1-dimensional tensor with size 1 to a 'scalar' (i.e.
empty size initializer list).
Summary:
Replaced std::copysign(x) with (x > 0 ? 1 : -1).
std::copysign is not available on some Android platforms which was detected in GitHub's Travis tests:
"/home/travis/build/caffe2/caffe2/caffe2/sgd/yellowfin_op.cc:57:23: error: 'copysign' is not a member of 'std'"
Reviewed By: akyrola
Differential Revision: D5756384
fbshipit-source-id: 56bc220d2c6216ff45b9cc47ed02aebf6ad439a5
Summary: Disabling a YellowFin test that does not pass in Travis. The difference comes from numerical reasons; the test passes on my CPU / math libraries. Decide whether to merge it.
Reviewed By: Yangqing
Differential Revision: D5754144
fbshipit-source-id: b6ed6628f962d6904a8d522f0cf4080d7878acad
Summary: Make a CUDA version of SparseToDense, and register EnsureDense (which is trivial) on CUDA. Need to use atomics because indices can be duplicated. We can later add an option to indicate whether the indices are unique, and use a faster path then.
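An illustrative NumPy sketch of why duplicate indices force atomic accumulation (not the operator code):
```
import numpy as np

indices = np.array([0, 2, 0])                      # index 0 appears twice
values = np.array([[1., 1.], [2., 2.], [3., 3.]])
dense = np.zeros((4, 2))
np.add.at(dense, indices, values)                  # rows with equal indices accumulate
# dense[0] == [4., 4.], dense[2] == [2., 2.]
```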
Reviewed By: jhcross
Differential Revision: D5750893
fbshipit-source-id: 005d1675b127a571aac8474fca62d9633f0c7bff
Summary:
Implementation of a new variant of attention module, which contains a recurrent decoder state with vectors corresponding to each source-side word and strictly increasing values, thus enabling it to model the degree to which source words have been translated.
The approach is a variant of the approaches described in https://arxiv.org/pdf/1601.04811.pdf. We simply include the sum of all previous attention weights for encoder words as a new recurrent state (coverage_t). A new linear transform on encoder_outputs is used to produce coverage_weights, which has the same dimensionality as encoder_outputs, and implicitly models the fertility of source-side words (and putting this extra information strain on the encoder network).
Thus the encoder output, the decoder state, and the coverage weights have the same dimensionality for a given source word, and attention logits are calculated as v * tanh(coverage * coverage_weights + encoder_output + decoder_state).
Note: the entire coverage state for each translation instance is of shape (encoder_length, coverage_units), but the states for the RecurrentNetwork operator, used to train the decoder, must be flat in the data dimension. This state is therefore initialized with shape (encoder_length * coverage_units) [not shown in the open-source library] and reshaped appropriately within the apply_soft_coverage_attention() function.
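A shape-level NumPy sketch of the logit computation described above (dimension names and sizes are assumptions, not the training code):
```
import numpy as np

src_len, dim = 5, 8
encoder_output = np.random.randn(src_len, dim)
coverage_weights = np.random.randn(src_len, dim)  # implicit per-word fertility
decoder_state = np.random.randn(dim)
v = np.random.randn(dim)
coverage = np.zeros((src_len, 1))                 # sum of past attention weights

logits = np.tanh(coverage * coverage_weights + encoder_output + decoder_state) @ v
attention = np.exp(logits) / np.exp(logits).sum()
coverage += attention[:, None]                    # strictly increasing per source word
```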
Differential Revision: D5593617
fbshipit-source-id: 7d0522b5eb0b26f22e8429e4461a459f2f16ed46
Summary: basic little op benchmark generator -- outputs init_net.pb and predict_net.pb for use with speed_benchmark or mobile_speed_benchmark
Reviewed By: Maratyszcza
Differential Revision: D5728534
fbshipit-source-id: 3e912fa63548497ca65ab34c8bb967694c46815b
Summary: Turns out NCCL can deadlock with cudnnSetDropoutDescriptor, so we need a lock.
Reviewed By: pietern
Differential Revision: D5748325
fbshipit-source-id: b3828c50f6acfc4b5323008ec04f571f6d0d5586
Summary: Added super rough conv cost inference that takes into account very few params
Reviewed By: Maratyszcza
Differential Revision: D5412611
fbshipit-source-id: f662822fd5a532eacb525fbc361e8a62f32430a8
Summary: TEST_benchmark will print out gflops if it can infer them
Reviewed By: Maratyszcza
Differential Revision: D5412644
fbshipit-source-id: 3af7bb42cda4684e30db6d8ae5484d441898479c
Summary:
It looks like one of the rebases that I have been doing on this op has
completely messed up my code and I accidentally removed the
TensorInferenceFunction for SliceOp. This diff adds it back.
Reviewed By: akyrola
Differential Revision: D5745305
fbshipit-source-id: 5266c9e14c7d55be5a9cc96688e128db79547b1a
Summary: Adding support to use kernels, strides, pads etc. as arguments.
Reviewed By: houseroad
Differential Revision: D5710699
fbshipit-source-id: 8b63af4c4a76cd06b637a376aeb29a34c659be2e
Summary: This will allow doing data reading in small batches and concatenating the batches later on.
Reviewed By: kennyhorror
Differential Revision: D5739129
fbshipit-source-id: 66a8087e5f9d10d654e367c6111ac90cbf54224e
Summary: Check for nullptr before closing a common world.
Reviewed By: pietern
Differential Revision: D5746256
fbshipit-source-id: d395bf60d3b7f2c2629761d2b6fd46085683390c
Summary: Both D5695197 & D5691262 implement the tensor inference function for Gather. Keeping only one.
Reviewed By: akyrola
Differential Revision: D5742331
fbshipit-source-id: 1c31427fbfbc87bfec84b8c04851275f45154fcf
Summary:
Added YellowFin optimizer to Caffe2.
This implementation is different from the original: It has separate alpha and mu for each parameter and it uses a different version of Momentum SGD.
Tests / benchmarks for the optimizer are to be done. Some refactor of the code is to be done before pushing. This is still a working version.
Reviewed By: akyrola
Differential Revision: D5652689
fbshipit-source-id: c10dc0424f47c3051b454aede1d121902cb759a8
Summary:
1) Adds monitoring of CPU utilization in trainers and PS's, and reports the utilization to global statistics
2) Adds the plan execution time to global stats
3) Uses the CPU utilization and network utilization observed from the performance estimation job to calculate the optimal number of parameter servers needed for the actual job. The optimal number of parameter servers is the minimum number of servers needed such that the parameter servers are not the bottleneck in execution.
Note: The calculation assumes that parameter shards are assigned to PS's in a uniform way and accesses to the shards follow a uniform access pattern. In reality, shard access patterns may be skewed. As a next step, we should monitor shard access patterns in the performance estimation job and distribute the shards in the optimal way.
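A back-of-envelope sketch of the sizing rule; the numbers and the exact formula below are assumptions for illustration only.
```
import math

trainer_demand_mbps = 8 * 1200   # e.g. 8 trainers pushing ~1200 Mbps each (assumed)
ps_capacity_mbps = 9000          # usable network bandwidth per parameter server (assumed)

# Smallest PS count for which the parameter servers are not the bottleneck.
num_ps = max(1, math.ceil(trainer_demand_mbps / ps_capacity_mbps))
assert num_ps == 2
```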
Reviewed By: sf-wind
Differential Revision: D5674398
fbshipit-source-id: 67a07cb9ed4e4d61ff5e81a0ecfe519b8feb2352
Summary: .h and .c files with YellowFinOp. .cu and test files will be included in next commits.
Reviewed By: akyrola
Differential Revision: D5724198
fbshipit-source-id: b05b9c047af25f9081641a0fe0cdba2ee74cb04b
Summary:
Currently the loss ops are still not on GPU even though ALL strategy is selected.
This diff is to enable it.
Reviewed By: xianjiec
Differential Revision: D5671255
fbshipit-source-id: 033863f171e1f89c8d75430d3af6a1e6d0d2eff2
Summary: Layer by layer comparison between CPU and GPU verified within 1% scale precision
Differential Revision: D5714594
fbshipit-source-id: f4ddee60c317aeeae4c7f3f9ac299fddf9057761
Summary:
Use HINTS instead of PATHS for find_library so that you can specify
-DNCCL_ROOT_DIR and it will use this NCCL installation regardless of
what else is installed on your system. Also add a path hint to include
the default base path for NCCL 2 libraries.
Closes https://github.com/caffe2/caffe2/pull/1152
Reviewed By: Yangqing
Differential Revision: D5740053
Pulled By: pietern
fbshipit-source-id: 43f0908a63e8a9b90320dece0bbb558827433b48
Summary: The GPU op was broken. Copy over the scalar data so that it can be used to construct the output tensor.
Reviewed By: akyrola
Differential Revision: D5733170
fbshipit-source-id: dfc800b9a408eaeb7f9abefbb640e10074204add
Summary:
This was a tricky one to debug. After pulling from master, my build
was complaining that certain identifiers in updated source files were
undefined. After building with VERBOSE=1, extracting the compilation
commands, and adding -M, I saw that CMake had included the Caffe2
installation directory as include path. Worse yet, this path had
precedence over the path to the actual source code. The compiler
included older headers when compiling newer source files.
This change forces the path to the Caffe2 source code to take
precedence over all other include paths. The only path that takes
precedence over *that* path is PROJECT_BINARY_DIR, which holds the
headers that are generated at compile time.
Closes https://github.com/caffe2/caffe2/pull/1140
Reviewed By: Yangqing
Differential Revision: D5727133
Pulled By: pietern
fbshipit-source-id: c60c89e82e8b1ab1cfca0907d31b84417788d79b
Summary: arxiv link to batch-norm paper was broken because dot(.) was included at the end
Reviewed By: zem7
Differential Revision: D5734405
fbshipit-source-id: e037c14091e7f9e415c2f7a3008cbf2bf066e699
Summary: Adding support for integer textures and thus the Galaxy S6 among other devices
Differential Revision: D5695151
fbshipit-source-id: 46514e5aa931f98f8c7c82ec923e7803bcaa9bc0
Summary: The default CUB settings led to very slow execution in practice when using "dynamic" memory allocation with C2 (i.e. freeing blobs after their use). After some tinkering, I arrived at these numbers, which work with resnet-50 and an NVIDIA M40 GPU much better than the original defaults. Also made the maximum allocated memory configurable.
Reviewed By: Yangqing
Differential Revision: D5732930
fbshipit-source-id: 9ff34f49d5a3eb138bc6f44c82918731a35325a6
Summary:
Reshape op's gradient op will have the original shape stored in a blob. Shape inference won't work directly because the shape inference function does not have access to blob contents.
In this case, I think making a special exception in the shape inference system is justified: we store the output of reshape in a reshape-cache, and pass that in a backward pass.
Also include my experimental test script that I used for NeuralMT CNN model.
Reviewed By: asaadaldien
Differential Revision: D5721502
fbshipit-source-id: fdc8ab901d3bee2c4621ee5140a5435e49f4471d
Summary:
This diff adds control flow operators in Caffe2 (starting with If, While):
- Added If operator that executes then/else subnet
- Branch subnet is executed in a separate isolated workspace, with some of the blobs transparently forwarded from the outer workspace
- Adding a new NetBuilder subclass to construct nets using new operator
- NetBuilder also keeps track of outer blob names and automatically sets blob bindings between outer and inner workspace, implementing generic convention on handling local/global variables in blocks
Reviewed By: volkhin
Differential Revision: D5720644
fbshipit-source-id: a674cde0c789f6a6ffdcd9d80159d1e42e49133f
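For readers unfamiliar with the new control flow operators, here is a rough Python sketch of what using the If operator could look like. The then_net/else_net argument names and the exact blob forwarding are assumptions based on the description above, not a verbatim API reference.
from caffe2.python import core, workspace
import numpy as np

# Tiny then/else subnets; each one writes the blob "y".
then_net = core.Net("then_branch")
then_net.ConstantFill([], "y", shape=[1], value=1.0)
else_net = core.Net("else_branch")
else_net.ConstantFill([], "y", shape=[1], value=-1.0)

workspace.FeedBlob("cond", np.array([True]))
# The condition blob lives in the outer workspace; the chosen branch runs in an
# isolated child workspace with selected blobs forwarded, as described above.
if_op = core.CreateOperator(
    "If", ["cond"], ["y"],
    then_net=then_net.Proto(),  # assumed argument names
    else_net=else_net.Proto(),
)
workspace.RunOperatorOnce(if_op)
print(workspace.FetchBlob("y"))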
Summary: As titled. Direct adaptation of the operator code.
Reviewed By: azzolini
Differential Revision: D5721174
fbshipit-source-id: cc9d4c916d7d79d202a344f29ef384ddc68f4988
Summary:
Add tiled vs. batched comparison for models
Add more logging to GLPadImage
Differential Revision: D5718546
fbshipit-source-id: fdd4f0aabc41cb3b86b6f0ccf8e618a15170ceae
Summary: While there is currently support for scaling the base learning rate when loading the model, there is no support for scaling the base learning rate during training. This is needed for LATTE's seq2seq translation models, as the learning schedule is not predefined and is modified at runtime.
Reviewed By: jhcross
Differential Revision: D5701391
fbshipit-source-id: ae3bec45f238db1a2be7af9c04d720067e9095d5
Summary:
These are wrapper functions so that if we run in a Caffe2-only mode, we can
turn the flag on and get some small speedup on cuda device switches.
The purpose of the diff is to allow us to quickly assess the overhead of cuda
device switch functions. Ideally, the caching behavior shall live in the cuda
driver, which is the only safe place to ensure correctness.
If other code is running alongside Caffe2 and does not properly do device guard,
this functionality will fail as separate cudaSetDevice() calls will not update
Caffe2's thread local device id. As a result, the functionality is only enabled
when/if one explicitly sets the flag.
This might not be safe, so use with caution.
- cudaGetDevice can go from 90ns to 2ns
- when setting the same device, we can go from 100ns to 2 ns
- when setting a different device, things are the same (1ns overhead on top of 143ns)
Reviewed By: azzolini
Differential Revision: D5709398
fbshipit-source-id: 6255f17a3d41f59a30327436383f306a2287896e
Summary: When we ported to memonger to C++ in D5544219, we forgot to include the special handling of RecurrentNetwork ops. This fixes that and adds a test.
Reviewed By: asaadaldien
Differential Revision: D5692407
fbshipit-source-id: 4e739b5dd6c7298303eee9bfa1aa4d19359eb7b5
Summary:
Before this diff, we were not respecting in-place blobs. E.g. if we had:
with DeviceOption(CPU):
    blob = net.MyOpA([])
with DeviceOption(CUDA):
    net.MyOpB([blob], [blob])
After the InjectCrossDevicesCopies we would have:
blob = net.MyOpA([], device=CPU)
blob_cuda0 = net.Copy([blob], [blob_cuda0], device=CUDA)
net.MyOpB([blob_cuda0], [blob], device=CUDA)
As the example shows, we were not respecting in-place blobs. After this diff, we keep the in-place blob.
Reviewed By: harouwu
Differential Revision: D5671867
fbshipit-source-id: 6ad68c612dae19d7e1f45f4988d929644100b4d5
Summary:
Turns out that due to the cmake improvement by lukeyeager, we no longer rely on compiler flags but on the macros.h file to obtain CAFFE2_USE_MKL. This requires some minor changes in the MKL implementation to properly capture the macro before testing it.
Closes https://github.com/caffe2/caffe2/pull/1124
Reviewed By: jerryzh168
Differential Revision: D5705134
Pulled By: Yangqing
fbshipit-source-id: 6f6ad820cdd826818c12cf5aa344533a9324dbe2
Summary: Add an op to explicitly close common world connections, thus helping propagate closures when errors happen. Requires D5661477.
Reviewed By: pietern
Differential Revision: D5660476
fbshipit-source-id: 85791686691305abd96b082a6f68e4427ba14fbb
Summary:
This diff adds control flow operators in Caffe2 (starting with If, While):
- Added If operator that executes then/else subnet
- Branch subnet is executed in a separate isolated workspace, with some of the
blobs transparently forwarded from the outer workspace
- Adding a new NetBuilder subclass to construct nets using new operator
- NetBuilder also keeps track of outer blob names and automatically sets
blob bindings between outer and inner workspace, implementing generic
convention on handling local/global variables in blocks
Reviewed By: azzolini
Differential Revision: D5641588
fbshipit-source-id: f9e04429961c3da7da4ebca3e8163bfcc2a09ec9
Summary:
_LSTM helper is a legacy piece we had before all the RNNCell awesomeness landed. Now we need to pull it apart and create separate building blocks that people can use for any RNNs.
Please note changes to a test with double scoping. That should go away once we change RNNCell scoping logic in such a way that each cell adds its own name to the scope for all of its outputs (see another diff: D5613139).
Reviewed By: jhcross
Differential Revision: D5632276
fbshipit-source-id: 1cb568ab995c4c0b3dd1b4bad2d028e34bded9c1
Summary:
This includes the commit that adds `close()` to gloo::transport::Pair.
Closes https://github.com/caffe2/caffe2/pull/1127
Reviewed By: akyrola
Differential Revision: D5708513
Pulled By: pietern
fbshipit-source-id: 8ef505d48b3bfa1576c068c4e4a29c9a8ed5efc7
Summary: These were missing and required for some seq2seq models. Unit tested. The previous implementation of ReduceBackMean shape inference was incorrect, so removed it.
Reviewed By: asaadaldien
Differential Revision: D5691262
fbshipit-source-id: 76f868b298440f988635966a410f0232301ca6c4
Summary:
I ran into an issue where a subset of packages were found in the
Anaconda path. This path also contained includes for other packages
and the Anaconda path inadvertently took precedence over the intended
include path. The new `caffe2_include_directories` helper is a hacky
attempt to "fix" this by deprioritizing Anaconda paths in the hope
that intended include paths are searched before Anaconda.
Closes https://github.com/caffe2/caffe2/pull/1121
Reviewed By: Yangqing
Differential Revision: D5701819
Pulled By: pietern
fbshipit-source-id: 908284cd4ea6c8167774e4e3fcc4dc0ca8a23110
* Support double backwards for AdaptiveAvgPool1d and AdaptiveAvgPool2d.
* Support double backwards for ReplicationPad2d, ReplicationPad3d, and ReflectionPad2d.
* Support double backwards for FractionalMaxPool2d.
* Support double backwards for MaxUnpool1d and MaxUnpool2d.
* Circular recursive imports not supported in python 2.
* Address review comments.
* Add examples in functional.py
Added examples for F.cross_entropy, F.binary_cross_entropy and F.binary_cross_entropy_with_logits.
* Add ` for PyTorch docs
Added ` for PyTorch docs.
* Add examples in loss.py
Added examples for nn.BCELoss and nn.BCEWithLogitsLoss.
Summary:
Split the first dimension of a tensor into 2, the first of which is fixed and given in the argument.
This is then used to split a batch into smaller batches and distribute them across workers.
Reviewed By: harouwu
Differential Revision: D5702175
fbshipit-source-id: 02bb93e49bf9db411b516e149c8e647301dd2ca5
Summary:
CNMEM was deprecated by commit c59f291 and is not used anymore by
Caffe2. It was superseded by CUB.
The git submodule can now be removed.
Closes https://github.com/caffe2/caffe2/pull/1118
Reviewed By: Yangqing
Differential Revision: D5699492
Pulled By: pietern
fbshipit-source-id: 44627ed038f37c12312889bb27691db426ad122f
Summary:
The PATHS suggestion to find_library is searched after everything
else. By using HINTS, it searches CUDNN_ROOT_DIR much earlier, avoiding
potential conflicts with other paths that have the CuDNN header.
Closes https://github.com/caffe2/caffe2/pull/1122
Reviewed By: Yangqing
Differential Revision: D5701822
Pulled By: pietern
fbshipit-source-id: 3f15757701aff167e7ae2a3e8a4ccf5d96763a0c
Summary: This test was failing on non-GPU builds because it refers to operator CopyGPUToCPU. Thanks pietern for catching this.
Reviewed By: asaadaldien
Differential Revision: D5698763
fbshipit-source-id: 0bde0f3e99c58647dba2ea6da4d51938e763d10c
Summary: Moved code for global norm-based gradient clipping from fb specific workflows (seq2seq) to the open-source caffe2 optimizer library
Reviewed By: jhcross
Differential Revision: D5637453
fbshipit-source-id: 7e73c9a1c97c28a152c188467b27a6449f79242e
Summary: I was assuming left padding == right padding and top padding == bottom padding, but actually they could be different, which results in different output size.
Differential Revision: D5693719
fbshipit-source-id: 32595652231da0cf1ec269dc34fa87df23732328
Summary: Currently, it's not easy to track down which tensor is missing type and shape info. Print it out for easier debugging.
Reviewed By: volkhin, xianjiec
Differential Revision: D5695223
fbshipit-source-id: 7f0be0be777a35bb5a71b3799b29b91f0763c159
Summary: Make Gather more convenient to use in layer model
Reviewed By: xianjiec
Differential Revision: D5695197
fbshipit-source-id: aa0406ea39af5b6980ee6fd3bb11250732caac00
Summary:
Today, the PS's weirdly store the entire embedding and not just their
subsection of it. This was simply an oversight on the part of the original
author and this diff fixes that.
1. The sparse params are sharded to the PS's and the PS's just store their section
of the embedding. The trainer requests the id's as is from the PS. But the PS
divides the id by the num_of_shards before looking it up in the embedding table
blob. This happens on the backward and the forward pass. However, during the
model download part, the PS multiplies the embeddings with the num_of_shards
before returning them to the trainer. The upshot is that the trainer does not
know anything about how the embeddings are scaled on the PS. The PS adds extra
divide and multiply steps to achieve that.
2. During estimation time, we allocate just one PS for estimation. So in order
to make all of the embeddings fit on the single PS: We simply additionally
scale the hash table sizes (proportionally and equally for all the sparse
params) such that it fits. This scaling is handled analogously to (1).
Reviewed By: boryiingsu
Differential Revision: D5664093
fbshipit-source-id: 92f501f61566f939c41ce0b614a1b499669f978a
Summary: The operators were lacking some float16 stuff: Extend ScatterAssign for float16. In addition, introduce a constant fill for float16. This needs to be a separate operator instead of ConstantFill, since the latter is in OSS and hence cannot use the Float16 stuff that is fb specific.
Reviewed By: azzolini
Differential Revision: D5664071
fbshipit-source-id: 5b84f625693b6ddddd8b7a35f1541ae40df49fbe
Summary:
This adds a fast path for global max pooling with NCHW. Compared to equivalent ReduceBackMean, this is about 3.5x faster.
Based on D5533059.
Reviewed By: akyrola
Differential Revision: D5681122
fbshipit-source-id: 7a4df934044c7dd01888f095f7dd46654aaf4eae
Summary: Also enforce the "from_type" argument is supplied when getting gradient
Reviewed By: Yangqing
Differential Revision: D5684399
fbshipit-source-id: bee955d44a04c44142b2212cff548cea6e08b22f
When working on PyTorch dependencies we often want to rebuild only that
dependency and the Python extension. You can now do that by running:
python setup.py build_thc
to only re-build THC
Summary: extend pairwise dot product for different number of embeddings on x & y dimensions
Differential Revision: D5663553
fbshipit-source-id: 1743a2c101cb8c0fc1f0f3d89c19530802400ec6
Summary:
The original diff was unlanded because the fbcode-target-determinator tests were not run; recreating a new diff with the same change to trigger the tests.
CUDNN should be almost always faster than the default implementation
Reviewed By: salexspb
Differential Revision: D5637156
fbshipit-source-id: 413a08acba7a83502be6199fcb524ab46f1fd4ce
Summary:
Better isolation for workspaces to allow forwarding selected blobs
from parent to child workspace, possibly under new names. Used for proper
isolation of subnets (loops, then/else branches, etc.) from the outer workspace.
Reviewed By: azzolini
Differential Revision: D5681667
fbshipit-source-id: e61a2c7c98ee2abf1f0761905f4bfae47c201c32
Summary: With these changes, Conv, ConvTranspose, PRelu, and Relu work with tiling now. The default is still batching.
Differential Revision: D5623321
fbshipit-source-id: 07aa378d24165ec19e751cd79c70dea995003be9
Summary: Making it more convenient to wrap code in a context
Reviewed By: boryiingsu
Differential Revision: D5680991
fbshipit-source-id: 07b7e4d5aa657184039a7d18192b68fe11c1a570
Summary:
Using file(WRITE) caused the file to be rewritten for every CMake
reconfigure, which was causing unnecessary full rebuilds of the project
even when no source files changed.
The new strategy has the added benefit of enforcing that the macros.h file
is always generated correctly. When the main project relies on this
header for macro definitions (instead of relying on add_definitions()),
we can be more confident that the project will build correctly when used
as a library (which is the whole point of the macros.h file).
Upsides:
* No more unnecessary rebuilds
* Higher confidence that the project will compile properly as a third-party library
Downsides:
* Developers need to add an entry to `macros.h.in` whenever they would have added a new definition with `add_definitions()`
Closes https://github.com/caffe2/caffe2/pull/1103
Differential Revision: D5680367
Pulled By: Yangqing
fbshipit-source-id: 4db29c28589efda1b6a3f5f88752e3984260a0f2
Summary: In case the whole function should be wrapped in a certain context, this makes it less ugly.
Reviewed By: xianjiec
Differential Revision: D5665253
fbshipit-source-id: ecdc6b1a08e91bae6a4352341f97ee37f3aa677a
Summary:
I discovered this while investigating more build-caching issues like https://github.com/caffe2/caffe2/pull/1103.
> If a relative path is given it is interpreted relative to the value of the CMAKE_INSTALL_PREFIX variable.
https://cmake.org/cmake/help/v3.0/command/install.html
This is a non-functional change - it just makes the code a bit easier to read. I verified locally that the resulting install directories are identical.
Closes https://github.com/caffe2/caffe2/pull/1111
Differential Revision: D5677328
Pulled By: Yangqing
fbshipit-source-id: 9bb1bfe85fc0bc54a9b7ce33cc31e45ea061d21e
Summary:
Optimizations for SinusoidPositionEncodingOp to make sinusoid position embeddings
more competitive against table-based embeddings.
- Removed most calls to std::pow
- Replaced division with multiplication by the reciprocal
- Reused computation across examples within a batch
Current speedup with batch size of 16, sequence length of 128 and embedding
size of 512 is about 270x (17k embeddings per second -> 4.7M embeddings per
second). The speedup is very dependent on the batch size; at a batch size of 4
this only gets 1.7M embeddings per second.
Profile: https://pxl.cl/8zf0
Annotated DoRunWithType: P57925031
Reviewed By: jamesr66a
Differential Revision: D5634766
fbshipit-source-id: 0f35bb176164ea547c91de242a0205c5d7adf7cf
Summary: Not sure it is correct in general, but it works as long as we have one blob per GPU.
Reviewed By: harouwu
Differential Revision: D5671891
fbshipit-source-id: 739475101e9b509bc521e268c5b308faa36800e7
Summary:
This adds Event as a new member object to OperatorBase, hence allowing us to do
async computation more easily. Will send a fix for proper RunAsync() for
SimpleNet.
In principle this should have no functionality change yet - the only difference
is that async_dag net now delegates to the operators for holding the event
objects.
Reviewed By: harouwu
Differential Revision: D5668627
fbshipit-source-id: 55f994074be6b85d6c66f09795dcbe2b93aba300
Summary:
https://arxiv.org/abs/1704.04374 is a simple, stateless library that
implements a high performance tensor transposition abstraction - it's
substantially faster than what we have. I think instead of going through an
engine specialization on the CPU side, we can just add this path, since there's
no value (in terms of state management, etc) for having it separate?
We could cache the plan, but it's so cheap to create in these tests.
Reviewed By: jonmorton
Differential Revision: D5534519
fbshipit-source-id: de2fd64fee11be259656b0f02f42a62b7035e3d3
Summary: Disable mpscnn for 10.0.2 temporarily since I can't reproduce the crash
Reviewed By: ajtulloch
Differential Revision: D5665269
fbshipit-source-id: 2f95ba591099078a0347f7ea7bfa82dc37005228
Summary: This is a patch for the recent change for Events. ajtulloch caught this one.
Reviewed By: harouwu
Differential Revision: D5663317
fbshipit-source-id: 471a24f594583669bcd5bbf2fabaeb5664bd0bb7
Summary:
Add more data augmentation to ImageInputOp
1) Inception-style random sized cropping
2) color jittering
3) color lighting
Reviewed By: panshen1
Differential Revision: D5637726
fbshipit-source-id: 45d9cc69eec9f4d48c1607d80ccd89e325961b1a
Summary:
1. Uses the upload_builder in the offline training.
2. Adds the checkpoint taskgroups to the online trainer.
3. Changes the naming rules so that the model checkpoint has the format of
<directory>/<entity_id>_<snapshot_id>.<node_name>.<snapshot_id>
Reviewed By: rayleichen
Differential Revision: D5665068
fbshipit-source-id: a8103aed2ca195a506174d2a1d50611d2f1d9c35
Summary:
A new transform, which combines common subexpressions (where an "expression" is one operator), reducing repeated work.
This version is shippable, but one problem:
This transform will also combine operators which write to external_output, which will make behavior incorrect.
Reviewed By: bwasti
Differential Revision: D5629886
fbshipit-source-id: 2bf9f459e2ca633fddc57de85c9fc75845783099
Summary:
There are ad-hoc efforts on avoiding excessive device synchronizations, such as
async_dag, singlethread_async, etc. This diff aims to provide an early design
for a general Event class, that can achieve the following:
(1) It is device agnostic, essentially using a vtable to do cross device record,
wait and synchronization.
(2) Created new functions WaitEvent and Record in the Context class for
interacting with Events.
(3) Exposed the corresponding WaitEvent and Record functions in the OperatorBase
class as well.
An example use case is that, after potential future refactoring, one can achieve
a real async execution per operator by running
op.WaitEvent(previous_event);
op.RunAsync();
op.RecordEvent(this_op_event);
and the next op can do
next_op.WaitEvent(this_op_event);
Right now, I changed async_dag net implementation so that it uses the general
event design. The old Event class is assimilated to the general Event class and
the old Stream class is now essentially taken over by the Context class itself.
Reviewed By: harouwu
Differential Revision: D5648463
fbshipit-source-id: 58bd84d06e4a9977b0b835110ddb2f18be3b7cbc
Summary:
Adding a range operator in the spirit of np.arange. It is an imporant building block for a lot of manipulation functions.
This accepts parameters with the same meaning in the same order as python's range or np.arange (e.g. `(stop)`, `(start, stop)` or `(start, stop, step)`)
Differential Revision: D5616861
fbshipit-source-id: 02622b8bd85ebca125cc881c06fae5b54b7c602a
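As a rough illustration, a call mirroring np.arange(start, stop, step) could look like the sketch below; the operator name Range and the use of scalar input blobs for the bounds are assumptions based on the summary above.
from caffe2.python import core, workspace
import numpy as np

# Scalar bound blobs; with three inputs this mirrors np.arange(start, stop, step).
workspace.FeedBlob("start", np.array(2, dtype=np.int32))
workspace.FeedBlob("stop", np.array(10, dtype=np.int32))
workspace.FeedBlob("step", np.array(2, dtype=np.int32))
op = core.CreateOperator("Range", ["start", "stop", "step"], ["out"])
workspace.RunOperatorOnce(op)
print(workspace.FetchBlob("out"))  # expected [2 4 6 8], as with np.arange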
Summary: The new test ensures the 'add_axis' and 'split' arguments work as intended for tensors of various dimensions. Hypothesis should check various edge cases like zeroes in 'split_info' and 1D input with axis=0, add_axis=1.
Reviewed By: hoangmit
Differential Revision: D5645778
fbshipit-source-id: 061f9511a082da54e5c1bbe53a0e7096af4b8d1b
Summary:
Add ability to specify a range for randomly scaling to a new shortest side. For example, for Resnet50 training, one would set `random_scale=[256,480]` in the `ImageInput` operator to resize to a random shortest side in the range [256, 480]
Closes https://github.com/caffe2/caffe2/pull/1106
Differential Revision: D5653336
Pulled By: harouwu
fbshipit-source-id: 9c353fbe2bf2207e01bc51d14487de323c68af7b
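A schematic sketch of how the new argument would be used follows; the DB path and the other arguments are illustrative placeholders, and a real ImageInput call may need additional required arguments from the op schema.
from caffe2.python import model_helper

model = model_helper.ModelHelper(name="resnet50_input")
reader = model.CreateDB("train_reader", db="/path/to/train_lmdb", db_type="lmdb")
data, label = model.net.ImageInput(
    [reader], ["data", "label"],
    batch_size=32,
    crop=224,
    random_scale=[256, 480],  # resize the shortest side to a random value in [256, 480]
)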
Summary:
Tests shouldn't rely on operators defined in other tests, because there is no guarantee that they will build together.
transform_test and graph_test did this, and this fixes it.
Reviewed By: jerryzh168
Differential Revision: D5657635
fbshipit-source-id: e628fe1791a64bb124cdd8c59e80c0d915bfb281
Summary:
Use cub DeviceReduce, improving the speed from 23k to 26k, but still far
from the 100k reached without dedup.
The bottleneck is the UniqueOp.
Reviewed By: harouwu
Differential Revision: D5633828
fbshipit-source-id: e96b8f7317d01c5388c072e7dcfe987abcb01b67
Summary: So far we format the epoch name with 6 digits, but this is constraining. In order to have consistent naming, we can simply append the epoch to the suffix. Then we will have consistent naming rules for small and for large epoch numbers.
Reviewed By: azzolini
Differential Revision: D5653871
fbshipit-source-id: acdf26a14b731347bb85fe2f33c1b89e2ba83bdd
Summary:
This does not change any existing code behavior - as part of the event
abstractions, this is a cautious step to reduce the interfaces exposed
from contexts. Nothing else is changing.
Reviewed By: harouwu
Differential Revision: D5656597
fbshipit-source-id: 53c5caf278613e610daf6ad3ca4bb6da73367cfc
Summary: In forward-only mode, we need only 2 workspaces. Erroneously, we resized the workspace vector to length 2 if it was different from 2. But if it was longer (because the step workspaces were shared by a non-forward-only op), we ended up deleting the workspaces. With the RNN executor, this is a problem, because it held a reference to the deleted workspaces. Without the RNN executor, we just ended up recreating the nets.
Reviewed By: jhcross
Differential Revision: D5654534
fbshipit-source-id: 1e6276e63453831747fee6a85c5057f01b89fde5
Summary:
Travis CI is complaining about test_load_model_from_checkpoints in recent PRs.
E: AssertionError: 'trainer:1/task/GivenTensorInt64Fill:0, a C++ native class of type nullptr (uninitialized).' != array([103])
See for example https://travis-ci.org/caffe2/caffe2/jobs/265665119
Reason unknown yet. First disable this, then try to fix it.
Reviewed By: Yangqing
Differential Revision: D5655068
fbshipit-source-id: 10949339ec92b0a4c2f0e59246040f1b0510be12
Summary: Add a small fix so that the divisor won't be 0.
Reviewed By: kittipatv
Differential Revision: D5650240
fbshipit-source-id: fe17bdf0595c4ff113428d2bc18bf7c455e85302
Summary:
Before this fix, a functional layer name can appear several times in a
blob and cause confusion. This diff fixes this issue.
Reviewed By: kittipatv
Differential Revision: D5641354
fbshipit-source-id: d19349b313aab927e6cb82c5504f89dbab60c2f2
Summary:
Implemented ApplyTransformIfFaster
Determine if a transform is faster, then return whichever net is better.
Reviewed By: bwasti
Differential Revision: D5534535
fbshipit-source-id: 509943205b0c454bf30fb01343ac4e88d1441c39
Summary:
The cuda_fp16.h header in CUDA 9 RC triggers this diagnostic.
It is included by cusparse.h as well, so guarding the
inclusion of only cuda_fp16.h is not enough.
Reviewed By: Yangqing
Differential Revision: D5651995
fbshipit-source-id: 4778a8a793761e7a1dbebf3792b85b33a3e26219
Summary: These layers were not codemoded
Reviewed By: chocjy
Differential Revision: D5645982
fbshipit-source-id: 4325f77a0f8152dfe6dfdeee59697b25ecb1de35
Summary:
Enforce that blobs don't mix between operators on different GPUs or CPU/GPU. Add test.
+ Fix memonger when no namescope is provided.
Reviewed By: asaadaldien
Differential Revision: D5644708
fbshipit-source-id: 0cb361efd6361b6e2138462584bab6b4de039b5d
Summary: when adding a new axis to concatenate along, allow it to be the last axis. For example, concated 1D columns into a 2D matrix with axis=1, add_axis=1.
Reviewed By: hoangmit
Differential Revision: D5622495
fbshipit-source-id: 8d7c8650c198450ccd4f9e1c98e4ea9f40162be0
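A small sketch of the example mentioned above (stacking 1D columns into a 2D matrix), assuming the standard Concat interface with its two outputs:
from caffe2.python import core, workspace
import numpy as np

workspace.FeedBlob("col0", np.array([1., 2., 3.], dtype=np.float32))
workspace.FeedBlob("col1", np.array([4., 5., 6.], dtype=np.float32))
op = core.CreateOperator(
    "Concat", ["col0", "col1"], ["matrix", "split_info"],
    axis=1, add_axis=1,  # add the new (last) axis and concatenate along it
)
workspace.RunOperatorOnce(op)
print(workspace.FetchBlob("matrix").shape)  # expected (3, 2)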
Summary: Implement a brew wrapper for the LayerNorm op. This adds the scalar weight and bias terms to the op.
Reviewed By: jmp84
Differential Revision: D5595836
fbshipit-source-id: 467b2e1158b0c454a149d4b26c47719826e98752
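A minimal sketch of what calling the new brew wrapper could look like; the helper name layer_norm and its arguments are assumptions based on the summary above.
from caffe2.python import brew, model_helper

model = model_helper.ModelHelper(name="ln_example")
x = model.net.AddExternalInput("x")
# The wrapper is expected to emit the LayerNorm op plus the learnable
# scale and bias blobs mentioned above.
h = brew.layer_norm(model, x, "x_ln", dim_in=64)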
Summary:
Forward-only mode had broken at some point. Two things: RNNCell did not pass the parameter to recurrent.py and also recurrent.py was broken if forward_only=True after python3 codemod.
Added test to rnn_cell_test to actually check the forward only parameter is passed to prevent future breakage.
Reviewed By: jmp84
Differential Revision: D5639306
fbshipit-source-id: b1bbc39d59c3f3734b2f40a1c2f3740c733e0bd4
Summary:
As an alternative to sharing embeddings, we want to explore merging the ID_LISTs in the net.
This commit adds an operator to merge many ID_LIST features into a single one.
Differential Revision: D5481523
fbshipit-source-id: 446121122a32de5682d5d75a165370bc8d776d03
Summary:
The current scripts live at `.travis/`. These files at `caffe2/.travis/` were apparently added by accident in fbe2393cc2.
Closes https://github.com/caffe2/caffe2/pull/1102
Differential Revision: D5648563
Pulled By: Yangqing
fbshipit-source-id: 8a071f78f466a1c0bbe62b720b50bacc425287bc
Summary:
As part of the cuda 9 move we have decided to deprecate the cnmem path
as it seems to be superseded by cub if one needs a memory pool.
Closes https://github.com/caffe2/caffe2/pull/1104
Differential Revision: D5647672
Pulled By: Yangqing
fbshipit-source-id: 988af5bf63e24efa1b631fd91ddb58e798ffc5c6
Summary: This can be used for local attention to mask elements outside of a window
Reviewed By: jamesr66a
Differential Revision: D5643677
fbshipit-source-id: 92b33866258ccc7307d5bcf08234610aa3fb152d
Summary: This diff replaces the main of the memonger for dag algorithm _compute_blob_recycling_for_dag with a c++ implementation.
Reviewed By: akyrola
Differential Revision: D5544219
fbshipit-source-id: 9f868880c8d0eb997ad3dd39433f9d0b9216d303
Summary:
Seems to be required for CUDA 9 compilation
Closes https://github.com/caffe2/caffe2/pull/1100
Differential Revision: D5642986
Pulled By: harouwu
fbshipit-source-id: 5f934d580152d3d66f7baa71695fb8847ee2c029
Summary: the old gpu single benchmark mode is lost in recent changes. We still need this mode to benchmark some operators. I also removed some unused ancient code
Reviewed By: azzolini
Differential Revision: D5628501
fbshipit-source-id: c5d2c6c99af18c41bead5d86c46a42f05821e2ff
* Add ability to specify init_method for test_distributed.
* Move init_method specification to test run line.
* Run for gloo tests as well.
* Better status message for gloo test.
Summary:
Since we temporarily disable checkpointing the readers, we need to
rename all the node names in the test to make it pass.
Reviewed By: azzolini
Differential Revision: D5640930
fbshipit-source-id: 1e61be31ddf9b6e28efd2eb8e6e91e63dcd83154
Summary:
Convert from PlanDef ProtoBuf into python Plan object by recursively creating
Nets and ExecutionSteps.
Also support running Plan object directly in Session.
Reviewed By: azzolini
Differential Revision: D5608393
fbshipit-source-id: c0ae3b6da743a759af6db3b614a5a3935fe0b34c
Summary:
This diff adds dependency-aware concurrent/parallel execution of operators in stepnets. For CPU, we use multi-threaded execution. For CUDA, we use multiple streams and cuda events for parallelism and dependency tracking.
Much of the diff is about computing dependency graph, which was quite tricky because we need to also avoid write-races of multiple operators running in multiple timesteps in parallel. Also, recurrent blobs "change name" when passing over timestep ("_prev"), so that needs to be handled as well.
This diff also restores the link-ops that I unlanded earlier.
The performance gain of this diff is very good for CPU (same perf as with static_dag, even better on forward-only). On CUDA, the gains are modest, at least with the sizes i was testing with.
Reviewed By: salexspb
Differential Revision: D5001637
fbshipit-source-id: 3d0a71593d73a9ff22f4c1a5c9abf2a4a0c633c8
Summary:
The hive reader checkpoints are broken because of D5582328.
This breaks our offline simulator test as well.
This is a temporary fix that disables the checkpoints for readers.
Reviewed By: azzolini
Differential Revision: D5637719
fbshipit-source-id: 4f31ae534cb7e981fcacbb721cbb2420249fad91
Summary:
After this, we should have test going back to all green.
Closes https://github.com/caffe2/caffe2/pull/1058
Reviewed By: harouwu
Differential Revision: D5637495
Pulled By: Yangqing
fbshipit-source-id: ac3ab5a27bc56e3bb08fa81aa8ed186cb7e8832b
Summary:
Pattern match currently only supports one type of pattern matching: connected components.
It will be useful to sometimes use different algorithms to pattern match, either a subset of the operators in order, or general non-connected subgraphs. While generalized pattern matching can match for all types, it is inefficient to use it when sorted order or connected component suffice.
You can set the PatternMatchType to be one of the three options (it is connected by default), and Transform will use the associated algorithm.
We will need this for common subexpression elimination - specifically, sorted order matching.
Reviewed By: bwasti
Differential Revision: D5629321
fbshipit-source-id: 2104f2d4384fe4aba06a386881a08ca324f290a6
Summary: CUDNN should be almost always faster than the default implementation
Reviewed By: Yangqing
Differential Revision: D5633240
fbshipit-source-id: 99c45c04bf6a3c19f3f7eb27be1bb89344bc03d4
Summary:
Adds a benchmark comparing two methods used to generate positional embeddings,
table-based and sinusoid (as in the Transformer paper).
Reviewed By: jamesr66a
Differential Revision: D5625633
fbshipit-source-id: faee2d20ea0c3d9c41479c5114fa010ac49fab24
Summary:
Here is my example:
For a static RNN, the timestep blob is created as part of param_init_net. Before, DPM assumed it was a CUDA blob by default and it participated in broadcasting, causing the Copy on line 798 to fail. No device mapping is correct for this blob.
Reviewed By: akyrola
Differential Revision: D5631716
fbshipit-source-id: 28c3eb17ecc3080c95c41d69a60bf7262d3907d4
Basically, it's easy to confuse the dimensions of the index tensor.
This adds some more text which should hopefully clarify the situation.
Fixes #2416.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary:
Memonger had a subtle bug which caused it to recycle "splitinfo" outputs of Concat/Split. That is bad since they are on the CPU device, and it would cause them to be reallocated. This caused a big slowdown with Kaiming's trainer.
The bug was that we checked for gradients as containing "_grad" in the name, although we should only allow it as a suffix. Admittedly, doing string checking is not elegant anyway, but that is how Caffe2 works now.
Reviewed By: asaadaldien
Differential Revision: D5627251
fbshipit-source-id: c12be2323109bf81c3725d8884c7ef024e010bd5
Summary:
Enable the new convolution group functionality in cuDNN v7
Closes https://github.com/caffe2/caffe2/pull/1079
Differential Revision: D5625074
Pulled By: Yangqing
fbshipit-source-id: 00be025b50161a3bae7e7f09712e4b1adeaffd9f
Summary:
This was updated in 707aed36e89ab9e2041de25166a4930fc4e24ee7 but a
force push into https://github.com/NVlabs/cub made the commit Caffe2
was pointing to unreachable.
cc slayton58 lukeyeager
Closes https://github.com/caffe2/caffe2/pull/1089
Differential Revision: D5621958
Pulled By: pietern
fbshipit-source-id: b1242dc6303a38d3ac9adb37e190084a40a66aa2
Summary: Use the new SequenceMask op to mask out invalid positions in the attention mechanism rather than using PackSegments and UnpackSegments. This should help us on several fronts, including elision of host<>device copies and using fewer intermediate blobs
Differential Revision: D5619156
fbshipit-source-id: e59c644236cee02f853d8743f9a938fb10adc73b
Summary:
Implement forward pass for a SequenceMaskOp to replace https://github.com/caffe2/caffe2/blob/master/caffe2/python/attention.py#L54-L72.
This implements two modes: a sequence-length based mode and a matrix triangle mode.
Reviewed By: akyrola
Differential Revision: D5615493
fbshipit-source-id: a2ce4a8e655d9b720049010a7856be052c5567eb
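A rough sketch of the sequence-length mode follows; the mode and fill_val argument names are assumptions based on the two modes described above.
from caffe2.python import core, workspace
import numpy as np

scores = np.random.randn(2, 4).astype(np.float32)
lengths = np.array([2, 3], dtype=np.int32)
workspace.FeedBlob("scores", scores)
workspace.FeedBlob("lengths", lengths)
# Positions beyond each row's length get a large negative fill value, so a
# following softmax effectively ignores them.
op = core.CreateOperator(
    "SequenceMask", ["scores", "lengths"], ["masked"],
    mode="sequence", fill_val=-1e9,
)
workspace.RunOperatorOnce(op)
print(workspace.FetchBlob("masked"))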
Summary:
The LocalSession does not work with the multi-node definitions.
The test becomes flaky because of that. The fix is to create
a different LocalSession for each Node(), and run each node
sequentially.
Differential Revision: D5617857
fbshipit-source-id: a8079a90291b4c8b5aa6b471c33c06d18e59976c
Summary:
1. Adds one more step in the JobRunner class to upload checkpoints.
2. Adds one function to return the name of the checkpoint given
the name of the node.
Reviewed By: andrewwdye
Differential Revision: D5597130
fbshipit-source-id: 570a55785e6227859e1115326d6cab077f0e7f72
Summary: Added Nesterov momentum as an option for BMUF and corresponding tests
Reviewed By: asaadaldien
Differential Revision: D5599888
fbshipit-source-id: 30819c9e689347c8b75daddc7444bea9f54193ae
Summary: ##select()##, used previously by the ELU implementation, is not vectorized for vector maps in Eigen. This change switches the ELU cpu implementation to use ##cwiseMin## and ##cwiseMax##, which increases the perf by about 4x.
Reviewed By: Maratyszcza
Differential Revision: D5609370
fbshipit-source-id: 99560a25e0ea2cd35e34aa50c65e53788a6be6b0
Summary:
Add support for TensorCore convolution and gemm on Volta hardware.
Currently built on top of #1055
Closes https://github.com/caffe2/caffe2/pull/1056
Differential Revision: D5604068
Pulled By: Yangqing
fbshipit-source-id: 100f67e26ed5fabb1dbb31dcd77f7ecb84de4ee7
Summary: Guarding reservoir sampling with mutex & fix the bug in counting number of new entries.
Reviewed By: chocjy
Differential Revision: D5503300
fbshipit-source-id: fd6b0bacb71fbab99d6d5df2c72da523fba02847
Summary: Adding the option to dedup by object ID so that more frequent objects are not present more than once in the reservoir
Reviewed By: chocjy
Differential Revision: D5503109
fbshipit-source-id: e36c3ad8eea134d6c10a4c875fceadc0f843c976
Summary: Make the candidate pool less localized
Reviewed By: chocjy
Differential Revision: D5453289
fbshipit-source-id: 848cb7551d7112f6f47f2cf647bb0daca6eff341
Summary: Instead of printing the exception using print() use traceback.print_exc() This way you get a stack trace
Reviewed By: jay-mahadeokar
Differential Revision: D5604642
fbshipit-source-id: f8cb67e554305cd2fbed384a4a2040fa2b16e7c0
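For illustration, the pattern simply replaces print(e) with traceback.print_exc() inside the except block; risky_operation below is a placeholder.
import traceback

try:
    risky_operation()  # placeholder for the code that may raise
except Exception:
    traceback.print_exc()  # prints the full stack trace, not just the message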
Summary: Avoid labelling objects similar to true positive (according to raw ID features) as negative.
Reviewed By: chocjy
Differential Revision: D5336506
fbshipit-source-id: 05f68f5d0af2a6eb907963d38702f0d6e9b2f99b
Summary: Make the command-line arguments pertaining to model architecture the same between train.py and translate.py. Also use the s() scoping function for all intermediate blobs in attention.py (this is for compatibility with multi-headed attention).
Differential Revision: D5594312
fbshipit-source-id: cadf51d854b5a9174ec913f32c655be2abf111e5
Summary: In order to control the absolute scale/magnitude of the output of this op, added a tuning parameter: amplitude
Reviewed By: jamesr66a
Differential Revision: D5596574
fbshipit-source-id: 3b7e316de55cce6fd686da70aa5658ec3e99b070
Summary: Turned a number of uniform shader variables into constants
Differential Revision: D5596760
fbshipit-source-id: 68004c081c6b9ba2e55f7f74e48a673489c927b1
Summary:
Bringing over selected dockerfiles from the documentation branch and updating the GPU Dockerfiles to use some of the docker configurations provided by lukeyeager. The latest docker with CUDA 8.0 and cuDNN 6 can be pulled via `docker pull caffe2ai/caffe2` or built with `ubuntu-16.04-cuda8-cudnn6-all-options/Dockerfile`.
**You must use nvidia-docker instead of docker to run the GPU-enabled dockers.** Tutorial files can be overlaid by building `ubuntu-16.04-gpu-tutorial/Dockerfile`. Supersedes #911. Closes #876. Closes #923.
Closes https://github.com/caffe2/caffe2/pull/949
Reviewed By: Yangqing
Differential Revision: D5510872
Pulled By: aaronmarkham
fbshipit-source-id: 390f5eea1d9ec1a3edda828470b12386ab8a1775
Summary: GRU differs from LSTM in that it only has hidden states but no cell states. So in this case, reusing the code of _LSTM is problematic, as we need to delete the part that creates the cell state and change many other places that use a hard-coded 4 (hidden_all, hidden, cell_all, cell) into 2 (hidden_all, hidden). Otherwise GRU will break during the backward pass, when the optimizer tries to apply gradients to each of the parameters, because the cell state is never used, so it does not have gradients for the corresponding parameters (i.e., cell_state_w, cell_state_b).
Differential Revision: D5589309
fbshipit-source-id: f5af67dfe0842acd68223f6da3e96a81639e8049
Summary:
Model downloader was broken after the move on s3 to the vanity url, download.caffe2.ai. Using this as the url base hits a redirect, and will result in the script throwing a 403 error. Rather than upgrading to urllib2 or putting in a bunch of code to handle a redirect on urllib, we can just use the non-vanity base url.
Closes https://github.com/caffe2/caffe2/pull/1020
Reviewed By: Yangqing
Differential Revision: D5568686
Pulled By: aaronmarkham
fbshipit-source-id: d88a6b3e1b7955835fc03b036dc54dec48316e7f
Summary:
Basic NCCL 2 API support - the same as applied to gloo [here](49586d9556)
/cc Yangqing pietern
Closes https://github.com/caffe2/caffe2/pull/1055
Reviewed By: Yangqing
Differential Revision: D5583234
Pulled By: bwasti
fbshipit-source-id: 3a9ce302649fdab9ce897613b94788c1843262e2
Summary:
This is needed for metal build.
Note that for older xcode (7.3), right now ios build fails due to not having metal headers. We will require xcode 8.0 onwards now.
Closes https://github.com/caffe2/caffe2/pull/1062
Differential Revision: D5591536
Pulled By: Yangqing
fbshipit-source-id: 57fbb9e052629ce6ecc16f1ea5179e3303a10907
Summary:
After sudo make install, it is quite cumbersome to remove the installed files manually. This change allows the user to simply type sudo make uninstall to remove all installed files.
Closes https://github.com/caffe2/caffe2/pull/748
Differential Revision: D5590971
Pulled By: Yangqing
fbshipit-source-id: b354640056c88b9975dd0cf195a6a4d8cad8d0ab
Summary:
This PR replaces PR #464. It requires C++11 support using the
new CMake variables (`CMAKE_CXX_STANDARD`, `CMAKE_CXX_STANDARD_REQUIRED`,
etc.) when CMake is version 3.1 or above. Otherwise, if CMake is older
(e.g. Ubuntu 14.04) it falls back to using the -std=c++11 flag and
issues a warning.
This PR is based on the comment from Yangqing:
https://github.com/caffe2/caffe2/pull/464#issuecomment-305376923
The corresponding line in cmake/MiscCheck.cmake is removed in order to
reduce redundancy. Another option would be to move the C++11 logic to MiscCheck.cmake.
Closes https://github.com/caffe2/caffe2/pull/1027
Differential Revision: D5590646
Pulled By: Yangqing
fbshipit-source-id: 11ac63fbeaab7a1da02115549e214f9c529f1873
Summary: as promised, a separate diff for dpm changes I made in experimental code
Reviewed By: pietern
Differential Revision: D5551304
fbshipit-source-id: 9013aeab6c388b1c415ffb2e36fb8dd6b8cf90b0
Summary: This diff implements CUDA version of OneHot operator.
Reviewed By: bddppq
Differential Revision: D5578543
fbshipit-source-id: 55b70e8ec6ee34b647b9140fecbba31b6968f403
Summary: Add CUDA version of GRU operator
Reviewed By: jamesr66a
Differential Revision: D5571043
fbshipit-source-id: 332aa64fc8a9116cc33382f2b2907080e58c13b3
Summary:
While I was trying to make a quick oss cmakefile, I found that some of the
ios source files are out of sync with the most code changes. This diff should
fix the issues.
I manually ran cmake on the oss side with scripts/build_ios.sh to make sure
things pass.
Reviewed By: ajtulloch
Differential Revision: D5582265
fbshipit-source-id: 2636d353d32fcd8fb7087385b9bbed8476e33e74
Summary:
Fix multilayer inference in Caffe2 example seq2seq code. (Rely on LSTMWithAttentionDecoder.apply rather than fixed state indices to determine stepwise decoder output.)
Also assorted updates to bring code in line with changes elsewhere in the codebase, and added unit tests which ensure that training and inference networks generate the same loss, which should make these problems much easier to identify in future.
Reviewed By: jamesr66a
Differential Revision: D5579803
fbshipit-source-id: 6e0f27340d981990ab8d0da58e63793222e7be87
Summary: Users are reporting CUDA illegal access errors happening on some configurations after D5528436 introduced lazy peer connections. Will debug later, but this diff is to revert that change.
Reviewed By: pietern
Differential Revision: D5581673
fbshipit-source-id: ef8e367160a38fc62434d6f5905892db274d9f06
Summary:
It was reverted previously because of a missing schema for the gradient op. Added it back and resent.
Differences between this diff and the previously reverted diff:
1. added schema for the gradient operator
2. changed line 95 in kmax_pooling_op.h from CAFFE_ENFORCE to CAFFE_ENFORCE_GE
Reviewed By: xianjiec
Differential Revision: D5568867
fbshipit-source-id: 39813b389a5da803967a561249793afdfce00c58
Summary:
When creating a common world, we would attempt to create
one using an existing common world to save on setup cost. This could cause
unexpected behavior when the backing common world had a shorter
timeout than the world being created. This patch improves this
logic by limiting the usage of a backing world to only ones that
have a long enough timeout.
Reviewed By: andrewwdye
Differential Revision: D5570904
fbshipit-source-id: d3b5073a64381ed068a30dcc461a6ec9ce15ad9c
Summary:
(1) BlobsQueue is causing a gcc error (a google search suggested it was a
bug, but we'll put the implementation in a separate cc file).
(2) Preparing for cuda 9: update cub.
(3) Prepare for cudnn 7: update cudnn rnn op.
(4) Fix an MSVC issue
Reviewed By: sf-wind, jerryzh168
Differential Revision: D5574352
fbshipit-source-id: 230820ce3ceaa32bee8323bdc509de352c93fcf2
Summary:
The mpscnn-fb folder was intended for our earlier sharing of the MPSCNN code.
Now that we have fully migrated the code, one should check contrib/ios instead.
accept2ship
Reviewed By: ajtulloch
Differential Revision: D5577227
fbshipit-source-id: df3706a272f022ea6e529f38d960bce374f79baa
Summary:
In Python 3.x, dictionary values aren't a list and can't be concatenated to a list;
this diff should fix that.
Reviewed By: andrewwdye
Differential Revision: D5576724
fbshipit-source-id: c60441857ceceb9c4a71122d2db5e9abad6d3fc2
Summary:
The L1Distance operator used to return a single value denoting the L1 of the entire input, instead of a vector for each input value.
This fixes that.
Reviewed By: Yangqing
Differential Revision: D5570385
fbshipit-source-id: fbab0e0c9262ccbdb3af27262b8baacdeb2d0fc9
Summary: New hybrid randomized sparse nn, which allows layers of sparse NN model to be randomized, semi-random, or learnable
Reviewed By: chocjy
Differential Revision: D5416489
fbshipit-source-id: eb8640ddf463865097ba054b9f8d63da7403024d
Summary:
To train an image model, we can also use a label embedding vector as supervision, as opposed to using SoftmaxLoss/SigmoidCrossEntropyLoss.
In such cases, the label is a dense vector. This diff enables such use cases.
Reviewed By: panshen1
Differential Revision: D5556203
fbshipit-source-id: 52c61495e02fab457dc2d43e3345d7dbd5580ab7
Summary:
data_workers.py provides a really nice, easy way to run background threads for data input. Unfortunately, it's restrictive: the output of the fetcher function has to be a numpy array.
I pulled out that core nice thread management into parallel_workers, and updated the classes data_workers to extend those classes. The main change was refactoring out most of the queue handling logic into QueueManager.
This way parallel_workers can be used to manage background threads without having to use the queue for output.
Reviewed By: akyrola
Differential Revision: D5538626
fbshipit-source-id: f382cc43f800ff90840582a378dc9b86ac05b613
Summary:
There does not exist appropriate build script for Tizen software platform.
This commit is to fix #847.
Signed-off-by: Geunsik Lim <geunsik.lim@samsung.com>
Closes https://github.com/caffe2/caffe2/pull/877
Differential Revision: D5571335
Pulled By: Yangqing
fbshipit-source-id: 12759a3c0cb274ef93d7127b8185341e087f2bfa
Summary:
Adds support for the CUDA 9 toolkit.
Includes new fp16 data type fixes, and changes to warp-synchronous programming. Also updates CUB third-party repo for CUDA 9 support.
Closes https://github.com/caffe2/caffe2/pull/853
Differential Revision: D5548507
Pulled By: Yangqing
fbshipit-source-id: c7fd2edb623f2aa8c67b9a1000efc8f71e6832ab
Summary:
Implement dot attention as described in https://arxiv.org/abs/1508.04025
This saves the computation of weighted encoder outputs in `rnn_cell.py`
When the encoder and decoder dimensions are different, we apply an FC, which corresponds to the general case below Figure 2.
Refactored unit tests.
Reviewed By: jhcross
Differential Revision: D5486976
fbshipit-source-id: f9e9aea675b3b072fbe631bc004199b90a9d95cb
Summary:
Caffe2: add a DB that's wrapped around a BlobsQueue as an adapter for data from non-DB interface.
This is useful for bridging the gap between DB interface data processing ops (TensorProtosDBInput, ImageInputOp etc.) and data that's coming from arbitrary Python or the pretty intricate Hive reader.
Reviewed By: akyrola
Differential Revision: D5554560
fbshipit-source-id: 01bb0056410f9ade205367d5fefc721f91f5b629
Summary:
Now Caffe2 is replicated in three code bases. Some directories
are only for mobile or only for server. Need to strip the
unnecessary files in checkout.
Run this command to strip the files checked out in mobile:
hg sparse --enable-profile fbandroid/xplat/caffe2/.hgsparse-caffe2-xplat
Run this command to strip the files checked out in server:
hg sparse --enable-profile fbcode/caffe2/.hgsparse-caffe2-dev
Reviewed By: mzlee
Differential Revision: D5557190
fbshipit-source-id: e41c8edab09d3fafcb0c8e40ebe1c6809388dc02
This is so that we can do per-commit sync between codebases, removing
the current tech debt of manual syncing.
The code is contributed by various folks: @tulloch for ios, @bwasti for
snpe, @fricc33 and @hlu for opengl, among many others.
@feisun (sf-wind) made the original sync.
Summary:
The current implementation for s=0 doesn't support backward pass.
Switching to using pow op instead as a temporary solution.
Reviewed By: jackielxu
Differential Revision: D5551742
fbshipit-source-id: 33db18325b3166d60933284ca1c4e2f88675c3d3
Summary:
1. switch the protoc building system from msbuild to cmake
2. set default CMAKE_GENERATE to VS2015
3. set default CMAKE_BUILD_TYPE to Release
4. improve error handling
5. add the generated protobuf include path
6. exclude many optional dependencies from build_windows.bat
Closes https://github.com/caffe2/caffe2/pull/1014
Differential Revision: D5559402
Pulled By: Yangqing
fbshipit-source-id: 019e3a6c3c909154027fa932ce1d6549476b23bb
Summary:
Caffe2: Where operator allows users to specify three inputs:
input1: TensorTypes<bool>
input2: TensorTypes<float, double, int, long, std::string>
input3: TensorTypes<float, double, int, long, std::string>,
which allows users to do the operation: output = input1 ? input2 : input3.
We found that there is a need to add the boolean type to input2 and input3 of the Caffe2 Where operator, for customers who want to use boolean tensors for logic.
Reviewed By: ender-wieczorek
Differential Revision: D5541815
fbshipit-source-id: 55171b242821f5f2c83235f5229a85f8cbe580de
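With the boolean type enabled for input2 and input3, a call like the following sketch becomes possible (element-wise output = input1 ? input2 : input3):
from caffe2.python import core, workspace
import numpy as np

workspace.FeedBlob("cond", np.array([True, False, True]))
workspace.FeedBlob("a", np.array([True, True, False]))
workspace.FeedBlob("b", np.array([False, False, True]))
op = core.CreateOperator("Where", ["cond", "a", "b"], ["out"])
workspace.RunOperatorOnce(op)
print(workspace.FetchBlob("out"))  # picks from "a" where cond is True, else from "b"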
Summary:
This brings it up to par with how the RedisStoreHandler
works. The store handler configuration does not have to change and
only the run ID parameter changes across runs.
This was inconsistent and came up in https://github.com/caffe2/caffe2/issues/984.
Reviewed By: Yangqing
Differential Revision: D5539299
fbshipit-source-id: 3b5f31c6549b46c24bbd70ebc0bec150eac8b76c
Summary:
Currently Caffe2 enables peer access between all 8 gpus, even if only 1 gpu would be used. This adds several seconds to the startup time, but also takes a lot of memory (110 MB per GPU).
This diff makes the peer access initialization "lazy". When GPU X is first used, pairwise peer access is set between GPUs 0 to X-1 with X. A lookup table is used to ensure no double peer access initialization.
Reviewed By: pietern
Differential Revision: D5528436
fbshipit-source-id: 8f3c2c8154291a7d3a99ee2882e4834ef5e38b66
Summary:
This diff makes SparseLengthsSum(Gradient) Async. It goes through these logics:
1. Adding INDICES to Gradient op input so that we can make it async without device host copies.
2. Registering new 3 input op as gradient for CPU/GPU version of SLS
3. In order not to break old nets (they are mostly on CPU), I still register the old 2-input op, so the op schema will not complain when it encounters old nets that have the SLSGradient op in them.
wickedfoo Sorry this diff might bring you extra work of migrating your optimization effort to this new async gradient op. But we think it is worth it. :(
Reviewed By: dzhulgakov
Differential Revision: D5423188
fbshipit-source-id: 62494a6c52a507c4a4688d5a9e1a2bc720d5370d
Summary: Added caffe2 operator to calculate the sinusoidal position encoding for word embeddings, as described on page 6 in https://arxiv.org/abs/1706.03762.
Reviewed By: jamesr66a
Differential Revision: D5533024
fbshipit-source-id: 1afb35cd7f9d8c71f2635b853e56b2c840f0bc1f
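As a reference for the formula from the paper, here is a plain numpy sketch of PE(pos, 2i) = sin(pos / 10000^(2i/dim)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/dim)); it is not the operator's exact implementation.
import numpy as np

def sinusoid_encoding(num_positions, dim):
    # Rows are positions, columns alternate sin/cos as in the Transformer paper.
    pos = np.arange(num_positions, dtype=np.float64)[:, None]
    i = np.arange(dim)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / float(dim))
    enc = np.zeros((num_positions, dim))
    enc[:, 0::2] = np.sin(angle[:, 0::2])
    enc[:, 1::2] = np.cos(angle[:, 1::2])
    return enc.astype(np.float32)

print(sinusoid_encoding(128, 512).shape)  # (sequence length, embedding size)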
Summary:
To achieve this, I modified the blob name scheme defined in a layer.
Before it was scope/fc_w and scope/fc_w_auto_0 (if there is another fc
within the same scope).
Now I change it to scope/fc/w and scope/fc_auto_0/w.
That is, we rely on the uniqueness of the scoped layer name to define
names for blobs.
I also overwrote the create_param method in LayerModelHelper to let it
use the resolved name for blobs given the parameter-sharing context.
There are some details such as making the initializer more structured
that I need to finalize.
Reviewed By: kennyhorror
Differential Revision: D5435132
fbshipit-source-id: a0525f5ea0977e255dd5ea765b38913f5951d455
Summary: Added functionality that allows users to store huge blobs of any type, not only Tensors. The blob has to be divided into chunks in the same way as a Tensor blob.
Reviewed By: kennyhorror
Differential Revision: D5432762
fbshipit-source-id: c171faacd99d209bfae6f9707ebde7c4e23ba3b9
Summary: Implement the LpNorm operator, which calculates the Lp norm of a tensor for regularization (p=1 or 2). Currently, there is only the L1Distance operator, which calculates the L1 distance of two same-shape tensors. We want to make it take only one input and output the L1 loss, and we would do the same for the L2 loss. We also plan to implement the l_{p,q} loss, but have not decided which p and q to take.
Reviewed By: xianjiec
Differential Revision: D5460051
fbshipit-source-id: d67a38fbc94afa52de26d4a53e4d2b7df3c50b6a
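A minimal sketch of calling the new operator; the argument name p is an assumption, and whether the result is the raw sum of |x|^p or its root is defined by the operator schema.
from caffe2.python import core, workspace
import numpy as np

workspace.FeedBlob("x", np.array([1.0, -2.0, 3.0], dtype=np.float32))
op = core.CreateOperator("LpNorm", ["x"], ["norm"], p=2)  # p=1 or p=2 for regularization
workspace.RunOperatorOnce(op)
print(workspace.FetchBlob("norm"))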
Summary:
KaimingHe debugged a slow model and found out that global average pooling was hideously slow, even with CUDNN. It turns out the CUDNN pooling op (especially the backward pass) is not optimized for global pooling.
This adds a fast path for global average pooling with NCHW. It is about 30x faster than CUDNN with 56 x 56 pooling, and compared to the equivalent ReduceBackSum, it is about 3x faster.
I will bootcamp the max pooling.
Reviewed By: asaadaldien
Differential Revision: D5533059
fbshipit-source-id: 2d590693d737fa92184603663031d96f6145f304
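For context, this is roughly what a global average pooling call looks like from Python; whether the new fast path is taken depends on the device, the NCHW order and the build, so the sketch is illustrative only.
from caffe2.python import core, workspace
import numpy as np

x = np.random.randn(8, 256, 56, 56).astype(np.float32)  # NCHW, as in the 56 x 56 case above
workspace.FeedBlob("x", x)
op = core.CreateOperator(
    "AveragePool", ["x"], ["pooled"],
    global_pooling=1, order="NCHW",  # pool over the full spatial extent
)
workspace.RunOperatorOnce(op)
print(workspace.FetchBlob("pooled").shape)  # (8, 256, 1, 1)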
Summary: This will throw away a few examples. It is desirable to keep the batch size constant for fully synchronous data parallelism.
Reviewed By: dzhulgakov
Differential Revision: D5531788
fbshipit-source-id: e19385401155e731cfc5b25e8e9ea7c16c19d478
Summary:
Currently, for `from_column_list` if the input col_names=[], it throws
errors. To solve this issue, we fix the get_field function so that it creates
an empty Struct when empty col_names is given.
Reviewed By: kittipatv
Differential Revision: D5543865
fbshipit-source-id: f6dfa25326e355f8ec24e5542761851a276beeb9
Summary:
StringJoin operator converts input array/matrix elements to string then join them to make vector of strings
Changes:
* Support string tensor input
* Support join on 1-axis
* Add unit tests
Differential Revision: D5513705
fbshipit-source-id: 25f96ed3586065c15f845a968c9f8864ca8f5bdf
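A small sketch of joining matrix rows into strings; the delimiter and axis argument names are assumptions based on the change list above.
from caffe2.python import core, workspace
import numpy as np

workspace.FeedBlob("rows", np.array([[1.5, 2.0], [3.0, 4.5]], dtype=np.float32))
op = core.CreateOperator("StringJoin", ["rows"], ["joined"], delimiter=",", axis=1)
workspace.RunOperatorOnce(op)
print(workspace.FetchBlob("joined"))  # one comma-joined string per row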
Summary: Allow the use of apply_transform() in the python API
Reviewed By: bwasti
Differential Revision: D5530483
fbshipit-source-id: 61a6d36fe125c89629fdeea040a717c453d84417
Summary: This allows users to add an arbitrary number of additional outputs to ImageInputOp. These are populated by reading additional TensorProto values from the TensorProtos from the DBReader and converting them into Tensors. Similar to labels, only ints and floats are supported, and multiple values are supported.
Reviewed By: panshen1
Differential Revision: D5502019
fbshipit-source-id: 5a8b61b3a8549272a112e8e02cd613d8f9a271ba
Summary: Caffe2: allow nets that don't use all input in net.ClonePartial
Differential Revision: D5535564
fbshipit-source-id: 0ec8fb3ade4d7d6cd4a702c9c265d9c77f27a627
Summary: Change DCHECK to CAFFE_ENFORCE (so that the problems also occur in mode/opt) and use the EQ enforce
Reviewed By: asaadaldien, Yangqing
Differential Revision: D5517647
fbshipit-source-id: 4da6eae54abf71114957133df088ae3623d8beaa
Summary:
In order to pybind, we need transform in core.
It's a basically finished product, with a big test suite. It's safe.
We can begin hooking up observers after this, and I have a diff coming up that pybinds some apply_transform function.
Reviewed By: bwasti
Differential Revision: D5522200
fbshipit-source-id: dea6aa606fc689af84e2533569d1ef348cb5f3f2
Summary:
Allows Operators to match their string properties using * and |, to allow an operator to match multiple types.
Also allows device option, engine, and argument matching.
Reviewed By: bwasti
Differential Revision: D5419697
fbshipit-source-id: fe09c7f83a5a2fefe61d79e09ee1d5b755045313
Summary:
running ##xplat/caffe2/fb_sync.sh##.
Also add two new core sources to the BUCK file, and add ##createSharedBuffer## to NNPACKConvOp.
Reviewed By: ajtulloch
Differential Revision: D5373061
fbshipit-source-id: c030b2629d2715e1d2776c98715f57e2650922c9
Summary:
Fix the error during compilation on Win10+CUDA, not sure if it affects Linux and MacOS.
caffe2/operators/top_k_radix_selection.cuh(359): error : a value of type "caffe2::TIndex *" cannot be used to initialize an entity of type "long *"
Closes https://github.com/caffe2/caffe2/pull/992
Differential Revision: D5532399
Pulled By: Yangqing
fbshipit-source-id: 6958ee4f21053f73a0628cf98936931099211749
Summary:
1. allow PrintOp to print every N
2. add a util function to accumulate hist and print.
Reviewed By: dzhulgakov
Differential Revision: D5437008
fbshipit-source-id: 7dd8e51b20f9daaec6c0a4e69ff6e082fca671e6
Summary: Add tensor inference function for squeeze, refactor a bit
Reviewed By: asaadaldien
Differential Revision: D5518880
fbshipit-source-id: 5b8cb9154f5f777d4be3612a96d7ed76a9068c0c
Summary:
Feed team uses distributed training and wants to also use transfer learning.
Currently, transfer learning is implemented by overwriting the layer parameter
initializer. Therefore, the PS builder can't correctly infer the parameter shape.
To fix this, add a field 'shape' in `layer_parameter` and set the shape if we
overwrite its initializer.
We also enforce the check of parameter shape between the original initializer
and the loaded blob. (this adds extra cost)
Differential Revision: D5520541
fbshipit-source-id: 80547dbd328b3f6cbfcea0b2daaf4004703dfe81
Summary: Several refinements to seq2seq example code, including support for multilayer LSTM.
Reviewed By: jamesr66a
Differential Revision: D5460372
fbshipit-source-id: d2eabf6aa9a5b5df7bbc341fd99c4e7d8322e717
Summary:
We shouldn't LOG(FATAL) in Caffe2 code under any conditions as it's a library.
The case where it failed was a bug in SparseAdaGrad that failed on empty input trying to launch 0-sized CUDA kernel.
Also, the trend for C2 core is in moving from bool to exceptions, so I just moved CAFFE_ENFORCE directly into FinishDeviceComputation. Most of the use cases were already doing that or ignoring the output (bad!).
Reviewed By: akyrola
Differential Revision: D5495913
fbshipit-source-id: 66f382369417a262da69d54470f720e7d04a5cdf
Summary: Memonger did not properly track the number of times a blob output has to be produced before an operator can be visited. Actually I remember fixing this before, but well. This bug was manifested in Priya's model, so thanks prigoyal, and benz's model verifier nicely caught the wrong output.
Reviewed By: asaadaldien
Differential Revision: D5524912
fbshipit-source-id: 10f4d7056b84aba0274a918af508ea043e6026f9
Summary:
Based on discussion with Misha we're going to go for code-generation for all possible variants:
AVX2/AVX512 (eventually)
embedding type: float16, float32
index type: int32, int64
reducer: sum, weighted sum, mean (with scaling by lengths)
block size: 32, 64, 128
From some simple testing full-loop fusion with prefetching (as opposed to TypedAxpy) gives at least 1.5x performance win, so it is justified.
This just adds scaffolding for perfkernels for the embedding lookup subfunction.
I haven't actually moved the current implementation, because it's more work to refactor the current macros/templates; it's easier and more extensible to do codegen.
Scaffolding is a bit ugly because we don't want to pass templates across translation units and thus it requires explicit names of types in function names. Better suggestions are welcomed.
msmelyan - you'd pretty much need to generate appropriate embedding_lookup_avx2.cc
Reviewed By: Yangqing
Differential Revision: D5505887
fbshipit-source-id: ece489d4fd36e7ddbe71efb890f48ab38acaeaec
* Add ATen overload to AutoGPU.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Use new AutoGPU overload.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary: This method runs a train net multiple times therefore enables testing layers with iteration-dependent behavior.
Differential Revision: D5493750
fbshipit-source-id: a7fb967a66f799aaf82acfadc4ecf66e0744da20
Summary: One of my workflows was stuck because the everstore/hive data input was experiencing networking issues (No route to host, etc.), but it was hard to tell this was happening because the errors were logged to stdout. Anyway, added simple logging to warn if the data workers' enqueue thread has not received new data for over 10 secs.
Reviewed By: panshen1
Differential Revision: D5522816
fbshipit-source-id: a036c4afdfbbafea130a4251c1ca02c138d19a83
Summary: The diff adds support for rank_loss operator to support computing loss for multiple sessions (batch).
Reviewed By: kittipatv
Differential Revision: D5515465
fbshipit-source-id: 55a01cd5ad21eaeae82875ad136c392fed0dbb26
Summary: There is no need to disable inexpensive assertions in mode/opt, and disabling them makes it incredibly difficult to debug model problems. So changed a bunch of them to CAFFE_ENFORCEs.
Reviewed By: Yangqing
Differential Revision: D5517902
fbshipit-source-id: 9154d0114db159e8136a482fb6508e92084af97a
Summary:
Optimised SparseLengthsSum (fp32) for now
1) Specialized reducer
2) created fast routine with prefetches, loop unrolling, block specialization and register tiling
3) added more variety of block sizes to segment_ops_test.py
Reviewed By: Yangqing
Differential Revision: D5392472
fbshipit-source-id: 8ed9baf1b12ec05bd391cabb390024e6bc60a6f6
Summary: to support an operation needed by D5507205
Reviewed By: xianjiec
Differential Revision: D5512522
fbshipit-source-id: a9b3a668c28eff71d1e106dbbb572184df4a7638
Summary:
The renames were only being applied to the main net; if a step_net has an
external input that is part of the renames, running the model would fail with a 'blob
not found in workspace' error.
Differential Revision: D5511953
fbshipit-source-id: ba262a094c3263978dfe173f2cab00301131b57f
Summary:
Updated the semi-random layer model for multi-layer models using semi-random layers.
Notable changes:
- Inputs and outputs for the semi-random layer are now a Struct with "full" and "random" components
- A flag was added to choose whether to initialize the output schema in Arc Cosine or not (in case output schema initialization happens in the Semi-Random layer)
Reviewed By: chocjy
Differential Revision: D5496034
fbshipit-source-id: 5245e287a5b1cbffd5e8d2e3da31477c65b41e04
Summary: ASAN caught invalid memory problems in 3 of the tests in PatternNetTransformTests. The cause was pushing elements into a vector: although the vector itself stays in scope, its storage can be relocated when it is resized, thus invalidating the iterator/pointer.
Reviewed By: bwasti
Differential Revision: D5510112
fbshipit-source-id: affb11dbd221c826e108136789ef11c96c5d9843
Summary: It is a common mistake to create a test/validation model with init_params=True. When its param_init_net is run, it will overwrite the training model's params, and with DPM those won't be synchronized to all GPUs. I don't want to make this an assertion yet, since it might break people's trainers (it is OK to have init_params=True if you never run the param_init_net...).
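A minimal sketch of the recommended pattern, assuming the standard model_helper API: only the training model initializes parameters, and test/validation models reuse them.
```python
from caffe2.python import model_helper

# Training model owns parameter initialization.
train_model = model_helper.ModelHelper(name="train", init_params=True)

# Test/validation models must NOT re-initialize params, or running their
# param_init_net would overwrite the trained values (and with DPM the
# overwritten values would not be re-synchronized to all GPUs).
test_model = model_helper.ModelHelper(name="test", init_params=False)
```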
Reviewed By: asaadaldien
Differential Revision: D5509963
fbshipit-source-id: 63b1a16ec0af96e3790e226850f6e0e64689143f
test_FloatTensor_qr_big test is still a bit flaky on K80. Increasing tolerance to improve reliability as tests are moved around and results change for this test.
Summary:
As per rushabhmshah99's request: he wants to append a pre-trained model (without training it) to the model.
So added data_parallel_model.ConvertNetForDevice() to enable that. The unit test shows an example of how to use this with
AppendNet, and I also added a blurb to the function.
Differential Revision: D5503335
fbshipit-source-id: b2a5db5c1739dc97f46dd0d7606ed555d99255b8
Summary: To prevent an assertion from protobuf when accessing the dims.
Reviewed By: asaadaldien
Differential Revision: D5504362
fbshipit-source-id: d9b55fab3126e2760a3e790615ed30a1af2ddc32
Summary: A weakness in gloo_test led to an embarrassing diff review (D5494956): my test "succeeded", although each of the workers failed hard in an assertion. This was not handled because there was no exception to be caught and put into the result queue. So change the logic to put a success token into the queue, signaling successful completion.
Reviewed By: pietern
Differential Revision: D5503760
fbshipit-source-id: f2415bcc55638595cefa5d64dea811d86e77f24d
Summary: Use romain-intel's ContextFactory to create common worlds from existing common worlds, thus bypassing the KV store completely. Changed data_parallel_model to automatically detect whether there is already a CW we can work with. CreateCommonWorldOp takes an optional second parameter, which is the existing CW.
Reviewed By: andrewwdye
Differential Revision: D5494956
fbshipit-source-id: 5f7a840bcd5fe4ea756fafeacc746bc2cf5078b0
Summary: Split this into its own file for ease of reviewing. This is a simple interface for someone to create a Transform - by simply providing their own Pattern and Replace NetDefs.
Reviewed By: akyrola
Differential Revision: D5440426
fbshipit-source-id: dc643226f40ffe4ec5c86d56cfea374bd6a4e0e5
Summary:
Nothing gets changed - this would allow us to more easily deal with build
systems. Also now everything that is MKL related lives under mkl/.
Reviewed By: dzhulgakov
Differential Revision: D5505157
fbshipit-source-id: ddb2e6ac290a146a7cb495da23bb0e5b5594bd2a
Summary:
A bug reported in MTML group: https://fburl.com/lumicchc
The reason is that in MTML, the `task_shared_embedding` was not correctly
initialized in Python
Reviewed By: xianjiec
Differential Revision: D5502875
fbshipit-source-id: 3538d917392568ecd37c39059dc86f866bce9543
Summary:
Use a smaller step size for GradientChecks and pass a seed to help reproduce the
test from logged inputs.
Reviewed By: Yangqing
Differential Revision: D5505698
fbshipit-source-id: fc308efe72d535695ba628944aee1913ba16b2f1
Summary: Some old compilers (e.g. gcc 4.8) do not like lambdas.
Reviewed By: ajtulloch
Differential Revision: D5500500
fbshipit-source-id: fe6bcc7277fd7e9607f54a83be1f0ec146411440
* Implement BatchNorm double backwards as a python function called directly from C++.
This will be converted to C++ code once ATen is integrated with autograd.
* Some performance improvements via inplace ops and reusing calculations.
Summary:
The original issue was that the initialized parameters for randomized layers (Arc Cosine and Semi-Random) were not fixed across distributed runs of the layers. Moreover, as the weights are initialized as (constant) parameters, when the layer is added to the preprocessing part, these weights won't be saved after training since they don't exist on the trainer.
I fixed the issue here by building an option to add the randomized parameters to the model global constants so that the same parameter values can be accessed. Also, the parameters can be saved when the training is finished.
In this diff, I've:
- Updated randomized parameters to be added as a global constant across distributed runs of Arc Cosine Feature Map and Semi Random Feature layers
- Updated unit tests
- Ran an end-to-end test, enabling multiple readers to test the fixed issue
Reviewed By: chocjy
Differential Revision: D5483372
fbshipit-source-id: b4617f9ffc1c414d5a381dbded723a31a8be3ccd
There were two implementations of THPUtils_checkLong/THPUtils_unpackLong; one
that was a macro and one that was not, which is hella bad if you accidentally
include the macro before the real definition. Now we always use the inline
function.
A reasonable follow-up task would be to un-macro-ify the rest of these functions.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary:
Moved the distance op tests from hypothesis_test to distance_op_test and
refactored them
Reviewed By: akyrola, asaadaldien
Differential Revision: D5495104
fbshipit-source-id: 4a90c75eabeb380ae9d150d6258e9b5b0fbfc5ca
Summary:
- Adds GetCTCGradient for CTC training, so we can use AddGradientOperators() on "costs". The function just calls CopyOp.
- Modified test to verify inputs_gradient is created in workspace.
Reviewed By: yqwangustc
Differential Revision: D5499271
fbshipit-source-id: 5a6985f90f309303aadaceb7c966d822ad3576b2
Summary:
Data type conversion between numpy arrays and Caffe2 tensors currently only supports 3 types: FLOAT, DOUBLE and INT32. Supporting 8-bit and 16-bit data types will help reduce the model size in some circumstances. I used this to reduce the size of a data set from 8GB to 1GB by using INT8.
Closes https://github.com/caffe2/caffe2/pull/930
Reviewed By: Yangqing
Differential Revision: D5440929
Pulled By: akyrola
fbshipit-source-id: 3762da1d845e62a13ba384d1c144328b19dd663b
Summary:
This is causing OSS android build failures such as
https://travis-ci.org/caffe2/caffe2/jobs/257609575
Reviewed By: akyrola
Differential Revision: D5497495
fbshipit-source-id: b3ba0cca135a4a632461851c9b9212f3d75abd5d
Summary:
__attribute__((unused)) is not supported on Windows, so we actually need to
substitute it with a macro.
Also changed UNUSED_VARIABLE to CAFFE2_UNUSED because we also use it to mark
functions now.
Reviewed By: ajtulloch
Differential Revision: D5497063
fbshipit-source-id: bcda026e626c41f71c21c36f029a3f871eaea7d4
Summary:
I successfully built caffe2 using MSVC 2015 and the Ninja Generator. I use vcpkg to build gflags, glog, lmdb and protobuf. Here is my build procedure:
1. Install vcpkg and set it up according to vcpkg docs
2. Install dependencies
```
$> vcpkg install gflags glog lmdb protobuf eigen3 --triplet x64-windows-static
```
3. Run CMake with this batch file
```Batch
setlocal
if NOT DEFINED VCPKG_DIR ( echo "Please defined VCPKG_DIR" && exit /b 1 )
if NOT DEFINED CMAKE_BUILD_TYPE set CMAKE_BUILD_TYPE=Release
if NOT DEFINED BUILD_DIR set BUILD_DIR=build_%CMAKE_BUILD_TYPE%
if NOT DEFINED USE_CUDA set USE_CUDA=OFF
call "%VS140COMNTOOLS%\..\..\VC\vcvarsall.bat" amd64
if NOT EXIST %BUILD_DIR% (mkdir %BUILD_DIR%)
pushd %BUILD_DIR%
set CMAKE_GENERATOR=Ninja
set ZLIB_LIBRARY=%VCPKG_DIR%\installed\x64-windows-static\lib\zlib.lib
cmake -G"%CMAKE_GENERATOR%" ^
-DBUILD_SHARED_LIBS=OFF ^
-DCMAKE_VERBOSE_MAKEFILE=1 ^
-DBUILD_TEST=OFF ^
-DBUILD_SHARED_LIBS=OFF ^
-DCMAKE_BUILD_TYPE=%CMAKE_BUILD_TYPE% ^
-DUSE_CUDA=%USE_CUDA% ^
-DZLIB_LIBRARY:FILEPATH="%ZLIB_LIBRARY%" ^
-DVCPKG_TARGET_TRIPLET=x64-windows-static ^
-DVCPKG_APPLOCAL_DEPS:BOOL=OFF ^
-DCMAKE_TOOLCHAIN_FILE:FILEPATH=%VCPKG_DIR%\scripts\buildsystems\vcpkg.cmake ^
-DPROTOBUF_PROTOC_EXECUTABLE:FILEPATH=%VCPKG_DIR%\installed\x64-windows-static\tools\protoc.exe ^
..\
ninja
popd
endlocal
```
Closes https://github.com/caffe2/caffe2/pull/880
Differential Revision: D5497384
Pulled By: Yangqing
fbshipit-source-id: e0d81d3dbd3286ab925eddef0e6fbf99eb6375a5
Summary:
libpthreadpool is needed during the linking stage and is missing when the user chooses to use an external nnpack installation (from system libraries).
Fixes GitHub issue #459.
Detailed discussion on [this comment](https://github.com/caffe2/caffe2/issues/459#issuecomment-308831547).
Closes https://github.com/caffe2/caffe2/pull/808
Differential Revision: D5430318
Pulled By: Yangqing
fbshipit-source-id: 5e10332fb01e54d8360bb929c1a82b0eef580bbb
Summary: Implemented the registry pattern: now all transforms are instantiated by a string. I then made a simple transform which, given a graph, will change the engine of all Conv operators to be NNPACK, to demonstrate.
Reviewed By: bwasti
Differential Revision: D5447007
fbshipit-source-id: 48065a88fa648ad0e11f7f8ee93b8e732cd515d7
Summary:
This adds an example for vectorized typed axpy implementation under
perfkernels.
Reviewed By: dzhulgakov
Differential Revision: D5479258
fbshipit-source-id: 469e6c8aaf2c12cdf0025bc867eb9d4cab84184f
Summary:
(1) Wrote up length reducer operators from the original dispatcher
implementation under segment_reduction_op.cc. Note that this does not
change the fp16 version now.
(2) created subfolder perfkernels for potential different backends, with
scaffolding done.
(3) provided the vanilla fp16 implementation, so that currently the default
implementation will support fp16 (very slow) right now. This sets up the
fp16 benchmarking capability after D5477844.
Next step is actually to implement the faster versions. The goal of this diff
is mainly so that Misha can plug in his custom implementations more easily.
Reviewed By: dzhulgakov
Differential Revision: D5479056
fbshipit-source-id: bba30dc0d892b8e2cdfc825034fdfb7bd22a1726
Summary: If the last group has length=0, then ##start == end == len_indices##. Implementation is correct, just the assert is not
Reviewed By: wickedfoo
Differential Revision: D5488858
fbshipit-source-id: fcc4ef8162f1390534a7c556de2ae7d2b82eddc9
* add SharedFunctionMaker to create Function shared in the graph
* Clean shared_ptr usage for only function that will be used in the graph
* make Function binding match the Variable one
* remove unnecessary changes
* fix comments
* proper weakref implementation
* add call to clear in dealloc
Summary:
Based on discussion on the post in the Caffe2 users group. Changing DCHECK, which works only in debug mode, to CAFFE_ENFORCE, which throws an exception and is a better option.
Update: Also corrected the check for label_data >= 0; it did not check all elements previously. Moved it to the inner loop.
Reviewed By: akyrola
Differential Revision: D5483788
fbshipit-source-id: ccbff09e19e05e7036db772498f71795063c1fed
Summary: When creating parameters for modelhelper, we should use create_param instead of using param_init_net and model.params directly. The diff rewrites some of these cases in rnn_cell.py in order to make model._parameter_info and model.params consistent.
Reviewed By: kittipatv
Differential Revision: D5477724
fbshipit-source-id: 28c4aaf8f98d9d89125af6a42ad328008f0079e1
* Add examples in CrossEntropyLoss
1. Added examples in CrossEntropyLoss
2. Make consistent style of example for PyTorch docs
3. Delete unnecessary character '
* Change comments in distance.py
1. Delete x1, x2 from arguments and add eps in PairwiseDistance
2. For the shape, added input1 and input2 for readability (PairwiseDistance and CosineSimilarity).
* Add examples
Added the word 'examples' for PyTorch docs
Summary:
Need it for some reference comparison for c2isl.
Also there's an argument that it might be faster on GPU with int32. Doesn't seem to be the case now, but haven't tested with Jeff's changes yet.
Reviewed By: kennyhorror
Differential Revision: D5405482
fbshipit-source-id: dc1a983dce5f06f1111c5634ec475647c94848cc
Summary: Add check that every time we register a caffe operator to CPU or GPU that documentation is added for the particular operator.
Reviewed By: dzhulgakov
Differential Revision: D5443110
fbshipit-source-id: 3793c3d29bea1228078cb30bdf8243ac0ab90664
Summary:
In order to get dimensions right, correctly identify gradients, etc., DropoutCell should call the _prepare_output and _prepare_output_sequence methods of its internal cell for its own such methods.
This bug was identified by NVIDIA intern Syed Tousif Ahmed.
Reviewed By: akyrola
Differential Revision: D5483082
fbshipit-source-id: f6df5b4a0502ed0771056638aab219fb5cc7d964
Summary: TSIA - this makes it a bit easier to benchmark sparse lengths sum.
Reviewed By: dzhulgakov
Differential Revision: D5477844
fbshipit-source-id: 89e25c5e0dbf3538877ba1a9abc75a10abfa2757
Summary: DBExists function was factored out of the DBExistsOp.
Reviewed By: azzolini
Differential Revision: D5472587
fbshipit-source-id: 2a53375ffcccfb88e8f0af2ab55dad4c6a9586e3
Summary: I have hated the "gradient of X is either not provided or sparse" message. It is better to say which one is the problem.
Reviewed By: dzhulgakov
Differential Revision: D5468923
fbshipit-source-id: b63cde293fe252e5136d225ce4c762b4981f6fc8
Summary:
This is needed for us to do more fine grained dispatch based on CPU arch, so
I figured we should just add it. Can help Dima and Misha doing optimization
I think?
Reviewed By: dzhulgakov
Differential Revision: D5477444
fbshipit-source-id: 48aaf8bd799e9755493cd51c793ceec080a8846c
Summary: SimpleNet and DAGNetBase are the only two direct subclasses of NetBase. This feature has already been applied to SimpleNet before, with this diff all nets should be covered.
Reviewed By: dzhulgakov
Differential Revision: D5475498
fbshipit-source-id: 339edac31d008ec1e4630d93d2e27d0f518f4ebb
Summary:
For RNN attention, we should not include the invalid parts of the encoder output (based on encoder_lengths) in the computation. This diff accomplishes that by forcing logits for those positions to be negative infinity.
Note that this step can be bypassed by passing encoder_lengths=None, which is what we do for beam search, thus incurring no extra overhead for inference.
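A small numpy sketch of the masking idea described above (illustration only, not the actual attention implementation): positions at or beyond each sequence's encoder length get a logit of -inf, so softmax gives them zero weight.
```python
import numpy as np

def mask_attention_logits(logits, encoder_lengths):
    # logits: (batch, max_encoder_len); encoder_lengths: (batch,)
    batch, max_len = logits.shape
    positions = np.arange(max_len)[None, :]                      # (1, max_len)
    invalid = positions >= np.asarray(encoder_lengths)[:, None]  # broadcast per row
    return np.where(invalid, -np.inf, logits)

logits = np.zeros((2, 4), dtype=np.float32)
print(mask_attention_logits(logits, [2, 4]))
```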
Reviewed By: jamesr66a
Differential Revision: D5402547
fbshipit-source-id: 1863d6050b5129e4df829c6357f0aa9ded0715dc
Summary: Fixing the case where the init net initializes the same blob twice. I made an exception to allow in-place blobs among ops if the blob stays on the same device. This should fix the problem in a generalized way, as most of our training is only on CPU now.
Reviewed By: dzhulgakov
Differential Revision: D5450564
fbshipit-source-id: 525c4c9a2e5216a70dbd1229da2d9f8a58b89e47
Summary: Save 2 nets during offline training and load the correct net that the user wants. keep_device=false will help us load GPU blobs into CPU memory.
Reviewed By: dzhulgakov
Differential Revision: D5396689
fbshipit-source-id: ff26bf3759856b07f3a1bbefac4a1e613a8a02e1
Summary:
===Update log 7/10===
We are currently blocked by a connection problem. Will post if this problem is not fixed in 2hrs.
===Update 7/6===
Luke is experimenting on the convergence of this diff. Hopefully he can present results next week.
Right now this is not affecting our original CPU training pipeline, because the loading op is still correct in the CPU case.
I will need a final test to make sure, but that is currently blocked by the log device issue t19952135.
I will handle saving CPU/GPU nets in a separate diff.
====Update before 7.4====
It's actually working! Local run screenshot included:
{F67959016}
dogscience
Reviewed By: dzhulgakov
Differential Revision: D5307058
fbshipit-source-id: cad5d9324c239419530f4b120392ec2ccbb72280
Summary: This reduces runtime from 1.54757 ms/iter to 0.273687 ms/iter for 100 parallel reductions, each of size 100000.
Reviewed By: akyrola
Differential Revision: D5471324
fbshipit-source-id: 626cabb8249fb4655275648fae2738cb739e1a72
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.
Reviewed By: igorsugak
Differential Revision: D5454343
fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
Summary: The Implementation of Graph Transformations, with the PatternMatch and ReplaceMatch rules.
Reviewed By: akyrola
Differential Revision: D5404144
fbshipit-source-id: 2bab68e6bff2e841ea9fb64df5d92ea945e704af
Summary: CopyGPUToGPU does not exist. Copy seems to do the trick. Didn't go into details of how copy works, not sure if it ends up triggering UVA.
Reviewed By: akyrola
Differential Revision: D5471014
fbshipit-source-id: d8bc1aed9b19070c92f3ffc76f5617bdd0054563
* added tests + removed explicit expand of weight in bce with logits
* add auto broadcasting of weight to BCELoss
* remove the need for _BCELoss
* formatting of warning
* remove TODO
* move across assert from _functions/thnn/loss.py
* flake8 fixes
Summary: Constructor should extract everything needed from NetDef instead of keeping it for usage after construction.
Reviewed By: akyrola
Differential Revision: D5469095
fbshipit-source-id: 288ea3243d85061ba9c018d2aef3b4d97485dd00
Summary: In situations where both sin & cos need to be computed, the joint SinCos function is faster than computing them individually. Both MKL and CUDA support this function, so exposing it here.
Reviewed By: kmatzen
Differential Revision: D5465588
fbshipit-source-id: 7686498e4f2d4b5862d83a1ecf14fcc88ea53640
Summary: A quite common confusion is how to use StopGradient, and a typical bug is to forget to specify input=output. This adds a sanity check to the gradient builder that checks whether some StopGradient outputs are orphaned.
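A minimal sketch of the usage the new check is meant to enforce (blob/op names here are just for illustration): StopGradient should be applied in place, with input == output.
```python
from caffe2.python import core

net = core.Net("stop_gradient_example")
x = net.ConstantFill([], "x", shape=[1], value=1.0)

net.StopGradient(x, x)            # correct: input == output
# net.StopGradient(x, "x_stop")   # typical bug: output blob is orphaned,
#                                 # which the new sanity check would flag
```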
Reviewed By: dzhulgakov
Differential Revision: D5458341
fbshipit-source-id: 056fef4f0ee53eb10e66e9be0ecb55b55f9cc3d7
Summary:
This will fix the test by querying how many instances of the optimizer have already been created,
because OSS tests don't run in isolation, causing the number of created optimizer instances to be >= 0.
Reviewed By: akyrola
Differential Revision:
D5462433
Tags: easy
fbshipit-source-id: 7a9ab4fe5345f5d5138abb461ba7a990d9ace840
Summary:
In this revision, I mainly implemented the DRelu activation. See https://arxiv.org/pdf/1706.06978v1.pdf for details.
To sum up, unlike standard ReLU and PReLU, which divide the input into two parts with the boundary at zero, DRelu calculates another value p to divide the activation into two parts. p is the softmax value of the output of Batch Normalization. For the f(x)=x part of ReLU, the analogous pattern is f(x)=px, and for the f(x)=0 part, the analogous pattern is f(x)=a(1-p)x, where a is a parameter to tune. The DRelu activation result is the sum of these two parts: f(x) = a(1-p)x + px.
To implement DRelu, I take BatchNormalization as the super class and then use the above formula for computation. In order to allow users to choose activation methods, which usually happens when calling the add_mlp function in processor_util.py, I pass the parameter transfer in model_option from the UI down to the implementation, just as dropout does. Currently I place it in extra_option, but can modify it if the AML team needs to redesign the UI.
I also added unit tests for DRelu. We check the shape of the output and also do numeric unit tests.
For the unit tests, I first check the numeric value of BatchNormalization, since there was no similar test before. I then compute the value of the DRelu outputs and compare the results with the current DRelu layer.
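A numpy sketch of the formula above, assuming a sigmoid gate as a stand-in for the softmax of the batch-normalized output; this is only to make the algebra concrete, not the layer implementation:
```python
import numpy as np

def drelu_reference(x_bn, a=0.1):
    # x_bn: batch-normalized input; p in (0, 1) gates the two branches.
    p = 1.0 / (1.0 + np.exp(-x_bn))         # stand-in for the softmax gate
    return p * x_bn + a * (1.0 - p) * x_bn  # f(x) = p*x + a*(1-p)*x
```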
Reviewed By: chocjy
Differential Revision: D5341464
fbshipit-source-id: 896b4dcc49cfd5493d97a8b448401b19e9c80630
Summary: The Graph Interface and Implementation, for the Graph Transformation Framework. The last diff was too long and unapproachable - let's try this instead :)
Reviewed By: akyrola
Differential Revision: D5403985
fbshipit-source-id: 89f9361841088db8ebf45a9a4f8d2357eae3fb76
* add dropout2d and dropout3d to functional
added some loss functions to functional
added tests
using dropout from backend
added docs
fixes
* edited loss modules to call functional
Summary: Net construct bench was using old version of data_parallel_model API.
Reviewed By: bddppq
Differential Revision:
D5453281
Tags: easy
fbshipit-source-id: 93e1ba58511c7b25235ee50d9862fd0614b344c9
Summary: When performing reductions on fp16 buffers, gloo assumed that both buffers were either aligned to 32 bytes or misaligned by the same offset. This may not hold in intermediate steps of halving-doubling allreduce, when the reduction is performed on some offset within the receive buffer. The fix is to use intrinsic instructions that work with unaligned pointers.
Reviewed By: akyrola
Differential Revision: D5450103
fbshipit-source-id: 9a1c8f8c34d2e62223f6d5c21573ea1cfad6537f
The function iterates over columns and sets a "sparsity" fraction of entries in each column to 0. The number of zeros in a column (num_zeros) is then ceil(rows*sparsity).
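A numpy sketch of the described behavior, with hypothetical names (the actual helper is not shown in this note):
```python
import numpy as np

def sparsify_columns(w, sparsity, rng=np.random):
    # Zero out a "sparsity" fraction of entries in each column;
    # num_zeros = ceil(rows * sparsity), as described above.
    rows, cols = w.shape
    num_zeros = int(np.ceil(rows * sparsity))
    out = w.copy()
    for c in range(cols):
        idx = rng.choice(rows, size=num_zeros, replace=False)
        out[idx, c] = 0.0
    return out
```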
Summary: Rather chunky sync of changes made exclusively to mobile codebases back to fbcode.
Reviewed By: ajtulloch
Differential Revision: D5314405
fbshipit-source-id: c4d0a7244468f953eb63288306bc9bc78eb9e1be
Summary: Adding pooling option as None, and SparseLookup will gather the embedding for each id.
Reviewed By: kittipatv
Differential Revision: D5421667
fbshipit-source-id: 1e8e2b550893ff3869dab12f8eb1fe24a063c3d5
Summary: Allowing CPU device scope instead of enforcing no device scope in data_parallel_model and data_parallel_rendevous.
Reviewed By: akyrola
Differential Revision: D5440492
fbshipit-source-id: bcd4344d64c710ea50ec8a65e3e9d102e35c66ea
Summary: - Minor fix for error message in layer model helper file
Reviewed By: chocjy
Differential Revision: D5440768
fbshipit-source-id: df47bfe68a0caa750f0d3c8def28a5585e465ee0
Summary: The diff added TensorInferenceFunction for ExpandDims operator, so that ExpandDims layer is no longer needed (it can be handled by functional layer)
Reviewed By: kittipatv
Differential Revision: D5430889
fbshipit-source-id: 4f895f2751663c45db4cc4f87e5114c63cda9fbb
Summary: When compiling with -Werror=shadow-compatible-local, a variable name cannot be reused. This passed our tests, but some people compile with stricter settings.
Differential Revision: D5440805
fbshipit-source-id: a246af748717fb7e0e7a321e1ac4ddfef68ae524
Summary:
If strip_prefix_ is not found in the blob name, strip_prefix_.size() characters of the blob name will be stripped.
Closes https://github.com/caffe2/caffe2/pull/924
Differential Revision: D5440941
Pulled By: akyrola
fbshipit-source-id: 1db772fac4c74f2ce05105eec4bc7742a9067ebc
Summary: Remove this compilation warning: P57645594. Been there a while.
Reviewed By: harouwu
Differential Revision: D5436753
fbshipit-source-id: 630be22f097fdcae7fe0372eed49f20c065146ba
Summary: To reduce round trips with store handlers, it is better to store all addresses in one key instead of one address per pair. This is what this implements.
Reviewed By: andrewwdye
Differential Revision: D5435893
fbshipit-source-id: 2d3ea3a2822c3b934ff2578d44a262e7bfbde6d0
Summary: added support of passing remap_funcs to clone_and_bind_net, so that it can pass it to clone method. Added other utils to ensure RecurrentNetwork operator is correctly cloned based on the remap_blob. The reason that RecurrentNetwork operator needs special treatment is that its arguments contain proto and blobs.
Reviewed By: kittipatv
Differential Revision: D5421532
fbshipit-source-id: 5de68365ce97df2de483f02ad260d78c8d35eead
Summary:
This removes/comments out/silences one or more unused parameters in the files.
We are going to enable `-Wunused-parameter` in fbcode and this fixes a case that automated tooling can't handle.
This diff is automatically generated.
Reviewers are added heuristically.
Reviewed By: dzhulgakov
Differential Revision: D5436791
fbshipit-source-id: 164b080c1bc0f6aad146087ddeded255fe9a3d22
Summary:
This removes/comments out/silences one or more unused parameters in the files.
We are going to enable `-Wunused-parameter` in fbcode and this fixes a case that automated tooling can't handle.
This diff is automatically generated.
Reviewers are added heuristically.
Reviewed By: dzhulgakov
Differential Revision: D5437217
fbshipit-source-id: c2fc5ed30e7ee47b8c40248f89a9f4304ce7c098
Summary:
This is in preparation for adding huge pages. There we want to remember for the pointer how we got it - via mmap() or alloc(). One option is to store gigantic map of void* -> destructor, but luckily usages of Context::New are all inside Tensor which already uses shared_ptr with custom deleter.
This diff could have used unique_ptr as the return type, but then it's easy to accidentally call release() and lose the deleter. Thus going with std::pair<void*, MemoryDeleter> to be explicit.
Also, now CPUAllocator can be effectively changed to std::function. Haven't done it yet, but can do if necessary.
Let me know whether it's a bad idea to proceed like this.
Reviewed By: Yangqing
Differential Revision: D5429830
fbshipit-source-id: 8382ab7b81592d51272056c05c122894bb203827
Summary: Add some comments to dag-memonger to help asaadaldien with his C++ port.
Reviewed By: asaadaldien
Differential Revision: D5435459
fbshipit-source-id: dd5d482efb017418d22f42ee79fbd4668bd31bdd
Summary:
recurrent_network_blob_fetcher_op_gpu.cc was failing when compiled with clang
(Note: this ignores all push blocking failures!)
Reviewed By: wesolwsk
Differential Revision: D5436161
fbshipit-source-id: f4ea31066fe5abc108c6d6c15ee92bf828a2ff96
Summary:
Added operator RecurrentNetworkBlobFetcherOp that takes as input a scratch workspace name and prefix, and copies over all blobs in the scratch workspace into the global workspace. This essentially extracts all intermediate recurrent network computation for each timestep.
Added a wrapper in recurrent.py - retrieve_step_blobs(net, prefix='rnn') - which, when called after an rnn is run, will return a list of all blobs extracted from the net.
Reviewed By: akyrola
Differential Revision: D5421926
fbshipit-source-id: 0f35b466d77d3c719fb0e32de7dbcafc6c0d5225
Summary: Add lint rule to check that every time we register a caffe operator to CPU or GPU that documentation is added for the particular operator.
Reviewed By: dzhulgakov
Differential Revision: D5348078
fbshipit-source-id: c3fa22fc7ca8066d5fc8fa780b23d7867fd3380e
Summary:
Implements TEST_benchmark style of tracking for all nets created in the workspace.
I had to do some tricks to invoke stuff in destructors in a non-intrusive way. Let me know if it's too hacky.
There are 2 levels of reporting:
- `--caffe2_logging_print_net_summary=1` - prints per-type aggregated stats
- `--caffe2_logging_print_net_summary=2` - prints also individual operator breakdown (might be spammy)
Reviewed By: salexspb
Differential Revision: D5414708
fbshipit-source-id: 40bac2cdf7e3809ab0086150433c376bb5fc7e64
Summary: Currently the dataset cursor blob is using a fixed name. When we read from multi input tables, the dataset cursor of each table is using the same blob. This messed up the split queue and crashed the reader pipelines (see the errors and failures in https://fb.quip.com/uzbIA7K0PgVe)
Reviewed By: dragonxlwang, rayleichen
Differential Revision: D5419863
fbshipit-source-id: 5983a3d8d2e286dc47c2ec38ed1dbbe30c7c9b49
Summary: Use the CreateCommonWorld timeout for the storehandler as well, not just the device connect.
Reviewed By: andrewwdye
Differential Revision: D5425923
fbshipit-source-id: 936d2129e2db3bfed8759ca097b75843d3931d5f
Summary: This would allow us to inspect the binary size of the builds more easily.
Reviewed By: jonmorton
Differential Revision: D4553515
fbshipit-source-id: 95371bf67e66490a8653b874e1ff79cc987805e6
Summary:
MKL on windows works with this change. Tested with MKL 2017 Update 3 (https://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-2017-release-notes).
Should fix #544.
With MKL 2017 Update 3, #514 should not happen either.
Note: I used Anaconda, which ships with its own MKL, so I had to make sure that the MKL 2017 Update 3 version was loaded by replacing the .dll in the `%AnacondaPrefix%\Library\bin` folder. Otherwise, numpy would load its own version and I would get all sorts of missing-procedure errors. Now that the same version is available through `conda`, this is easily fixed with `conda install mkl==2017.0.3`.
Closes https://github.com/caffe2/caffe2/pull/929
Differential Revision: D5429664
Pulled By: Yangqing
fbshipit-source-id: eaa150bab563ee4ce8348faee1624ac4af477513
Summary: Add the API model.add_loss(), which allows adding losses, e.g. for optimization and regularization. See the change in sparse_nn.py, in which 'model.loss = loss' is changed to 'model.add_loss(loss)'.
Reviewed By: xianjiec
Differential Revision: D5399056
fbshipit-source-id: 13b2ced4b75d129a5ee4a9b0e989606c04d2ca8b
Summary:
1. it was easy to pass grad_reference which was just ignored due to missing output_to_grad
2. threshold was not passed to the gradient checking logic
Reviewed By: dzhulgakov
Differential Revision: D5425226
fbshipit-source-id: 2eb41f2601d5e356f7872e57724d08ab2e742329
Summary:
- (Split diff from Arc Cosine)
- Implemented [[ https://arxiv.org/pdf/1702.08882.pdf | Semi-Random Features ]] Layer
- Created a buck unit test for SRF Layer
Reviewed By: chocjy
Differential Revision: D5374803
fbshipit-source-id: 0293fd91ed5bc19614d418c2fce9c1cfdd1128ae
Summary: As title. This helps with the (quite common) cases where data input is stuck for one reason or another, and the net execution never proceeds and is stuck forever.
Reviewed By: andrewwdye
Differential Revision: D5409885
fbshipit-source-id: 840261fd5964408f788fc0f50ece0d74193694ac
Summary: The number-of-inputs dimension for NHWC should be the last dimension, C. Since the batch size is omitted, it should be 2 instead of 3.
Reviewed By: chocjy
Differential Revision: D5418538
fbshipit-source-id: a6939a863817b7566198ea2a665a1d236a2cf63d
Summary:
Fix case when optimizer isn't called within a device scope context.
Fix OptimizerContext lr blob names
Reviewed By: volkhin
Differential Revision: D5421046
fbshipit-source-id: 186a0d05f40d4442c5ba5736084626da73a0c0f1
Summary:
This fixes a super annoying problem with QPS reporting in sparse_nn_benchmarks where QPS "warms up" gradually. The problem is that we create the metrics in init_net and start counting from there, whereas there can be a big delay before real processing begins.
Thus I propose to just start counting from the first example seen. It's slightly imprecise too, as we miss the first batch, but who cares :)
Reviewed By: harouwu
Differential Revision: D5414672
fbshipit-source-id: 94fcf2e486416f186fed563002864f73c5f1c908
Summary: This manually fixes a few violations of `-Wunused-parameter` where automated tooling couldn't help.
Reviewed By: meyering
Differential Revision: D5416336
fbshipit-source-id: c089f02dfdf33351406ebad2f52ad9f8c676360b
Summary: Added function _RunComparison to data_parallel_model that checks if all shards in a given rendevous have the same value for a given blob_name
Reviewed By: wesolwsk
Differential Revision: D5394164
fbshipit-source-id: c2b07d0f8d5846fa9887d53b0be091a8c057f106
Summary: Fix a bug reported by dzhulgakov that occurs when an input blob is used twice in the same op --> it was released to the recycled blobs pool twice.
Reviewed By: dzhulgakov, volkhin
Differential Revision: D5414023
fbshipit-source-id: 861bb46fe901023cb9a496401736e6ecb77d5fae
* add support for groups in double backward
* add tests for group in double backward
* fix lint
* separate some tests to reduce number of test cases
* remove redundant testing for different number of output channels
Summary:
We want it to be able to register children of layers that
are not direct children of ModelLayer.
This requires us to find subclasses of ModelLayer recursively.
Reviewed By: kittipatv, kennyhorror
Differential Revision: D5397120
fbshipit-source-id: cb1e03d72e3bedb960b1b865877a76e413218a71
Summary: Instead of decoding all frames for X-ray video training, decode only sampled frames
Differential Revision: D5365079
fbshipit-source-id: e00dceadaacd9cdd42d83cf0d0e38338dc1f76ef
Summary: As Part 1 in reducing the size of operator objects, this removes the outside access to def() and moves debug-uses under a new debug_def() function. Next phase will be by jbai to remove all access from subclasses to def().
Reviewed By: Yangqing
Differential Revision: D5393893
fbshipit-source-id: 7301cff4138dce620b49f6c4db315df85fee7266
Summary: This diff makes functional layer return scalar if only one output. This diff also corrects all other corresponding implementations.
Reviewed By: kittipatv
Differential Revision: D5386853
fbshipit-source-id: 1f00582f6ec23384b2a6db94e19952836755ef42
Summary: These are useful constructs for operators dealing with sparse representation.
Reviewed By: sunnieshang
Differential Revision: D5332077
fbshipit-source-id: 16aa8c4516e6d80f3c44ff348848f0a4a8061f22
Summary:
Added device scope checks to data_parallel_model and data_parallel_rendevous
Added test to check that checks are working correctly to data_parallel_model_test
Fixed device_scope error in test_synchronization_barrier
Reviewed By: akyrola
Differential Revision: D5403936
fbshipit-source-id: 849c1cd7452692efbc5ef74d2d60ede090c9c017
Summary: The init method should also make _parameters_info shared between self and param_model, since params is shared. Otherwise it can cause an inconsistency between _parameters_info and params. Examples of using param_model can be found in rnn_cell.py.
Reviewed By: kennyhorror
Differential Revision: D5405327
fbshipit-source-id: ca8079058e898f529906452163cda234cb30a7df
Summary: this diff adds optimizer into param_info, and the associated implementations for modelhelper and brew to set optimizer for each individual parameter.
Reviewed By: kennyhorror
Differential Revision: D5385432
fbshipit-source-id: 5d682f9d1ab077e04a5d76a24d71470f4e64fc92
Summary:
akirillov again presented me with a memonger bug: his model, which has kind of a 'back-and-forth' structure where blobs are passed left and right in a ladder-like fashion, revealed a bug in memonger: I should pass the set of free blobs as a reference, not a copy, so that the recyclings are properly accounted for. Hard to explain.
Since we have the graph verifier, we can be more confident with these changes.
I also added some helpful debug to the graph verifier.
Differential Revision: D5396925
fbshipit-source-id: 0bffb3a0bf8532afcd6b5bc9331c779768a8c5c5
Summary: Currently the DBReader always creates the DB instance itself when Open is called. Add an Open method that takes in a DB pointer and takes ownership of it, so the DB can be initialized outside the DBReader.
Reviewed By: panshen1
Differential Revision: D5392458
fbshipit-source-id: d8660ab41d349f32030e4934b47bd17256a440df
Summary: When Sum was called with a type other than float or int, it just returned false without any helpful error.
Reviewed By: asaadaldien
Differential Revision: D5394070
fbshipit-source-id: 0f3c543a39f89163bccb9f55ea394e1d53561b62
Summary: Implemented python logic and tests to create an RNNCell for GRU. Uses the preexisting GRU Unit Op code.
Reviewed By: salexspb
Differential Revision: D5364893
fbshipit-source-id: 2451d7ec8c2eacb8d8c9b7c893bfd21b65fb9d18
Summary:
Just an implementation of the forward pass of the GRU Unit Op, not the full RNNCell.
Functions were created to mimic LSTM implementation as closely as possible.
Backwards pass implementations are defined in GRU_unit_op.{h, cc}
assertGradientChecks call added to gru_cell_test.py
Reviewed By: salexspb
Differential Revision: D5364856
fbshipit-source-id: 09cff4478091827763b40cc331e4e0abf0ec258f
Summary:
Just an implementation of the forward pass of the GRU Unit Op, not the full RNNCell.
Functions were created to mimic LSTM implementation as closely as possible.
Implementation defined in GRU_unit_op.{h, cc}
tests put in gru_cell_test.py, which import rnn_cell_test_util.py for sigmoid, tanh, and _prepare_rnn functions.
Reviewed By: jamesr66a
Differential Revision: D5363697
fbshipit-source-id: f9ba9fe0be01ffc868dd22027be8be4975b84998
Summary:
Moved sigmoid, tanh, and _prepare_lstm (renamed) to a util file.
Also renamed _prepare_lstm to _prepare_rnn since it is being used for setting up both LSTM and GRU models.
The reason for this commit is to allow the creation of the GRU Op and testing code without copying and pasting code for sigmoid, tanh, and setting up an rnn unit op model.
Reviewed By: jamesr66a
Differential Revision: D5363675
fbshipit-source-id: 352bd70378031f1d81606c9267e625c6728b18fd
Summary: Our existing serialization routines take a significant amount of time for large numpy arrays in order to verify the type of each element in the array as well as converting each element to a canonical type. For large floating-point tensors, such as model parameters, this checking and converting takes a significant amount of time. Adding a fast track path for just float32 arrays as this is the most common use case to worry about.
Reviewed By: akyrola
Differential Revision: D5389953
fbshipit-source-id: 26f44cb2426ea3efb849e7707b27d5485f69956c
Summary:
numpy.random.rand generates samples from [0, 1) and therefore, the leaky relu test cases weren't testing negative inputs. Tests still pass after change.
Leaky relu can be used in-place, but gradient took X rather than Y. Technically, the result is no different as it's just used for a sign test in the gradient, but updated it to take Y to reduce confusion.
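A tiny numpy illustration of the fix (hypothetical shapes): samples from [0, 1) never hit the negative branch of leaky relu, so the test inputs need to be shifted to cover both branches.
```python
import numpy as np

np.random.seed(0)
X_old = np.random.rand(3, 4).astype(np.float32)           # all >= 0, negative branch untested
X_new = (np.random.rand(3, 4) - 0.5).astype(np.float32)   # mixed signs

alpha = 0.01
Y = np.where(X_new >= 0, X_new, alpha * X_new)             # leaky relu reference
```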
Differential Revision: D5390126
fbshipit-source-id: d0c428abbb2797eb33902a7d2a2f59d5e85daaa6
Summary: GetComputedParams tests namescopes with equality while GetParams tests with a prefix. Switching GetComputedParams to also use a prefix so that both functions have similar usages.
Reviewed By: akyrola
Differential Revision: D5389816
fbshipit-source-id: 0e43e4b491fccbad3b855b6b735dc2b91d7626c9
Summary: When we use the int32_data field for float16 tensor serialization, it's possible to end up with a representation up to 50% larger than what can be achieved using byte_data. The reason is varints (https://developers.google.com/protocol-buffers/docs/encoding#varints). In the worst case (when the highest bit is set) a varint uses three 8-bit blocks, i.e. 24 bits for each number. Saving in the byte_data field removes this overhead.
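A short sketch of the size argument, assuming standard protobuf varint encoding (7 payload bits per byte):
```python
def varint_size(value):
    # Number of bytes protobuf needs to encode an unsigned value as a varint.
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size

# A float16 bit pattern with the high bit set needs 3 varint bytes (24 bits)
# in int32_data, versus a flat 2 bytes per element in byte_data: up to 50% larger.
print(varint_size(0xFFFF))  # 3
print(varint_size(0x00FF))  # 2
```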
Reviewed By: Yangqing
Differential Revision: D5375267
fbshipit-source-id: 0068daed25cd0157ea80a768b6e3899ea2bd8caf
Summary:
dilated convolution semantics were added after the nnpack op, so the feature
check macro was not there originally.
accept2ship
Reviewed By: ajtulloch
Differential Revision: D5387287
fbshipit-source-id: 139ca8c6ad4211ceec8f24982f1f060144524401
Summary: Moving the Sum operator into its own file (elementwise_sum_op.cc)
Reviewed By: oyvindkinsey
Differential Revision: D5379274
fbshipit-source-id: c504d91c9fb5e95b369f2aa7e7b5be31fd8e0d4b
Summary: Added a CUDA implementation of the PiecewiseLinearTransformOp.
Differential Revision: D5378537
fbshipit-source-id: 38857f59f5cc52e16e1ecc97983a0b0b82a46c74
Summary:
# Added the gradients of the operation for both CPU and CUDA kernels.
# Unified variable names across all ops.
# Added reference implementation in numpy.
# The gradient check needs a larger stepsize to succeed, is that normal?
Reviewed By: akyrola
Differential Revision: D5313682
fbshipit-source-id: aceb92649e01c5caeba8774e678f9095502d396c
Summary: replace params with sp, otherwise it will report an empty list
Reviewed By: akyrola
Differential Revision: D5382716
fbshipit-source-id: 34d8e6ee00cbe1718702e3d1f23ea12f8d65063e
Summary:
- Integrated RFF into the preprocessing workflow for dense features
- Developed Flow interface to input RFF parameters
- Created unit test for using RFF with sparseNN
Reviewed By: chocjy
Differential Revision: D5367534
fbshipit-source-id: 07307259c501a614d9ee68a731f0cc8ecd17db68
Summary:
To be used with predictor "online": C++ version of memonger for simple nets. Very simple greedy algorithm. Works well at least on Resnet-50 inference graph: only 3 shared blobs are used.
Next I will integrate this with predictor and run canary (separate diff).
Reviewed By: asaadaldien
Differential Revision: D5375392
fbshipit-source-id: d36e419e39a32e568e105657c27fb00c85a2535d
Summary:
As the title says.
Closes https://github.com/caffe2/caffe2/pull/879
Differential Revision: D5372787
Pulled By: akyrola
fbshipit-source-id: 0ff469c0d227f1b2252c1a0c4f6f8bebaac5580f
Summary: Add synchronization barrier API with configurable timeout. Users can call Synchronize() to join variable length execution before resuming multi-machine communication steps, i.e., resuming distributed training iterations after validation on a single machine.
Reviewed By: akyrola
Differential Revision: D5348387
fbshipit-source-id: 5826da10e6a60c50394c36c7cf47624f10191d11
Summary:
I noticed this when experimenting with the compute-bound convolutions
for the ULP HWGQ binary conv/gemm.
It's an ugly heuristic that Maratyszcza and co. are improving this half, but I think
this will be a net win for C2 especially if segmentation/mask r-cnn are
critical.
Differential Revision: D5375976
fbshipit-source-id: 863f76d434f133bf5a00e7ced1cfadfcf92e3c84
Summary: Memonger had a bug that it crashes if an input blob was input to multiple ops. This fixes that and adds a test.
Reviewed By: asaadaldien
Differential Revision: D5374860
fbshipit-source-id: 1d5044001eacdbe6db43f69727da9297558f5c5c
Summary: Huge improvement in my tests, and it does not really hurt either.
Reviewed By: wesolwsk
Differential Revision: D5374925
fbshipit-source-id: c96a4ed2ca653120a82233c0037cbfded8a2d2a1
Summary:
b33894e95d removed this line:
```py
unittest.skipIf(workspace.NumCudaDevices() < 2, "Need at least 2 GPUs.")
```
but forgot to add it back later.
```
_________________________________ DataParallelModelTest.test_equiv __________________________________
...
if p2p_access_pattern is not None and not p2p_access_pattern[
> devices[0], peer
]:
E IndexError: index 1 is out of bounds for axis 1 with size 1
...
WARNING:data_parallel_model:** Only 1 GPUs available, GPUs [0, 1] requested
```
/cc akyrola
Closes https://github.com/caffe2/caffe2/pull/888
Reviewed By: akyrola
Differential Revision: D5341310
Pulled By: harouwu
fbshipit-source-id: 8d7f06913c7b5a42009a4033dbb6a48a8e812822
Summary:
- allow initializer lists directly with `vector<string>{}`, partly thanks to default initialization
- reduce the number of instances
Reviewed By: nicolasvasilache
Differential Revision: D5370056
fbshipit-source-id: b8fae3b12144257644e098b284df7369d5bdb377
Summary: Based on benchmark script located at `caffe2/experiments/python/device_reduce_sum_bench.py`, device reduce sum is slower for N <= 10000, so we only switch to use device reduce for large N in SumElements. This diff applies the same schema for SumSqrElements.
Reviewed By: jamesr66a
Differential Revision: D5369868
fbshipit-source-id: ae13a611aff9d3464d1c4950ee155c740a2da339
Summary:
- Created the random fourier features layer
- Generated a unit test to test the random fourier features layer is built correctly
- Inspired by the paper [[ https://people.eecs.berkeley.edu/~brecht/papers/07.rah.rec.nips.pdf | Random Features for Large-Scale Kernel Machines]]
Reviewed By: chocjy
Differential Revision: D5318105
fbshipit-source-id: c3885cb5ad1358853d4fc13c780fec3141609176
This is needed because of possible races in SpatialConvolutionMM (and others that use gemm)
if the BLAS library is not thread-safe.
In terms of performance, there's not much benefit to run two gemms in parallel, because the
BLAS libraries have their own all-occupying gemms anyways.
Summary:
Otherwise it was always added to the main net instead of param_init_net when
desired (i.e. initial param sync)
Closes https://github.com/caffe2/caffe2/pull/894
Differential Revision: D5367451
Pulled By: akyrola
fbshipit-source-id: 3d82be6da687c736bd15f4852dbd272266eb4811
* Improve non-contiguous testing in TestAutograd:
1) Test gradcheck and gradgradcheck with non-contiguous inputs
2) Test gradgradcheck with non-contiguous gradoutputs (gradcheck would take more work)
3) Fix discovered issue in Prod backwards.
* Simplify non-contiguous setting wrt View.
Previously, there were 2 issues with test_autograd randomness:
1) Many random operations (e.g. random selection in prod_zeros) happened
before the torch random seed was set (because it was set in run_tests
at the end of the file).
2) The random seed was not set consistently: run_tests would set it to the
proper value, but each call to setUp would set it to 0 (because SEED wasn't
global in run_tests), which made setting the seed mostly worthless.
Previously, these tests added 5e-2 to the denominator tensor (the same as the div
tests), which only avoids divide by 0, but not issues with computing the numerical
jacobian due to non-linearity of fmod/remainder, when input / divisor is close to an
integer. These tests now add 1.5 to the denominator, which is the same as the non-tensor
version of the tests; Note that we can still hit the above condition but it will be much
less likely.
Summary: Allows to override the input/output record as long as the field blobs are the same.
Reviewed By: yangyangyyy
Differential Revision: D5362132
fbshipit-source-id: 3ac2ac22802902b7eed5c226b00a7e1971ad264c
Summary:
A quite common, hard-to-debug performance bug in multi-GPU training has been operators being passed tensors that reside on a different GPU than the one the op runs on. Since we have peer access enabled, this works, but it is just much slower. With data parallel model this problem rarely arises, as it does static analysis of the operators, but if someone bypasses DPM or uses FeedBlob with incorrect device options, this problem can happen.
To make debugging easier, I added a device field to the tensor that stores which device allocated the memory. In addition, I added a function that goes through operator inputs and outputs and compares each tensor's device to the operator's device. This check is run after the first iteration, with prof_dag only.
Also renamed ShapeCall to TensorInfoFun, as it now returns much more info than just the shape.
I think this is a pretty safe diff, but do you find it problematic to add a new field to tensor?
Reviewed By: dzhulgakov
Differential Revision: D5335505
fbshipit-source-id: 511b6c122dff9a205f43951984868ffd40f7ac30
Summary:
It is a quite common situation that users get some variant of "blob has version 2 but gradient expects version 1" in their backward pass. The error message is completely unhelpful.
To remedy this, I added proper debug information which tells the user how the version number of a blob was incremented over time, i.e. which ops caused the version to go up. This should help
them understand the issue.
Reviewed By: dzhulgakov
Differential Revision: D5358227
fbshipit-source-id: bc09d048ac33200c35d56460e44e86c2f2888f3f
Summary: Port SumElements and softmax_ops.cu to use device reduce sum
Reviewed By: akyrola
Differential Revision: D5351881
fbshipit-source-id: ca9604186c261ffcb1480da2a17baab8a4809372
This takes advantage of the broadcasting behavior of torch.matmul to
support inputs with more than two dimensions. The extra dimensions are
treated like part of the batch dimension, much like nn.Bottle in Lua
Torch.
There are a few related small performance changes:
* Addmm computes the gradient in column-major for inputs in
column-major format
* Variable.mm calls Addmm in-place with the desired output buffer
* Add weight normalization implementation
This adds forward "pre-hooks" which get called before the module's
forward() method. Weight norm is implemented as a hook which calculates
the weight variable from the weight_g and weight_v every iteration.
Based on @rtqichen implementation.
* Specify return type
* Fix unused linker argument warnings.
This patch began when I noticed the following clang warning:
clang: warning: -Wl,-rpath,$ORIGIN: 'linker' input unused
clang: warning: argument unused during compilation:
'-L/home/ezyang/local/pytorch/torch/lib/tmp_install/lib'
The warning is minor, but I was a bit worried our rpath wasn't
setup correctly. Actually, it was, and there wasn't a problem,
but I had to spend some time figuring out exactly what as going
on, and by the end of it, I might as well fix the warning. In the end, I ended
up filing two upstream tickets for ccache and cmake:
- https://github.com/ccache/ccache/issues/189
- https://gitlab.kitware.com/cmake/cmake/issues/17025
We can remove the warning by using CMAKE_EXE_LINKER_FLAGS and
CMAKE_SHARED_LINKER_FLAGS, which have sane macro expansion rules
(although still slightly insane: the first level of escaping gets removed.)
To ensure that the rpath was being set correctly, I ran
objdump -x torch/lib/build/TH/libTH.so | grep RPATH and verified that ORIGIN
was setup correctly.
I also considered using CMAKE_INSTALL_RPATH, but the rpath here doesn't
seem to get set until you actually install, which is a change in behavior,
and I wasn't sure if anyone was relying on rpaths being setup in the build
directory.
There is a SLIGHT behavior change, in that if we happened to need these
LDFLAGS passed to the static linker, they won't get passed. I don't
think we ever build static libraries today so this shouldn't be a problem.
P.S. Because of the ccache bug, you may continue to see these warnings
after this patch. If you apply https://github.com/ccache/ccache/pull/190
and clear your cache, it will solve the problem.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Remove unnecessary -Qunused-arguments
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary: Added two operators that can be used to transfer data into the input format of an RNN and back.
Reviewed By: kittipatv
Differential Revision: D5329886
fbshipit-source-id: 07eac29416427b08c49989d4eeed50a6f18493a1
Summary: This was broken in a previous diff, fixing it to use model device type.
Reviewed By: asaadaldien
Differential Revision: D5356005
fbshipit-source-id: a4fcc932bae772076b57625a5fcc0d38eb702cc9
Summary: Add an optional timeout parameter to CreateCommonWorldOp, to be honored on dependent collective operations.
Reviewed By: akyrola, romain-intel
Differential Revision: D5348099
fbshipit-source-id: cf5131450c389c7e40b1dabf8334c486e02e0011
Summary:
This works as a standalone python script because args are
global. When used from Flow for monitoring purposes it doesn't
work. This diff fixes that.
Reviewed By: zem7
Differential Revision: D5349996
fbshipit-source-id: f73842901d975b783e09e9db0565eb81880bbea1
Summary:
A couple of fixes for broken reporting of lstm_benchmark:
- last_time must be recorded after warm up
- entry count was incorrectly removed
Reviewed By: salexspb
Differential Revision: D5349890
fbshipit-source-id: 5dd5bdf46594c520b61bc3b57b153f90a6a17903
Summary:
Eliminates failures on overloaded machines caused by only
running a few examples before being timed out.
Reviewed By: tomdz
Differential Revision: D5349555
fbshipit-source-id: 89d1db063f58c72656b37157225a586c9e3f24bc
Summary:
Shared im2col buffer needs a mutex only to protect it from ops within a
workspace (since the shared buffer is created per workspace). The current
implementation has a global mutex which affects perf when running multiple nets
in parallel.
I don't feel great about adding a mutex for this in workspace, let me know if
anyone has better suggestions.
Reviewed By: akyrola
Differential Revision: D5341476
fbshipit-source-id: 1c9a92ef488ffb0c0013a7656bcb3d530bc7208b
Summary: This is splitting out one change from D5273337. This makes it so that we only notify the DAGNet condition variable if the condition it's signalling is actually true, namely remaining_ops_==0 || !success_.
Reviewed By: akyrola
Differential Revision: D5341962
fbshipit-source-id: a4d76cc95aebac27dc18da2bf8dc1837db69e6ae
Summary: Let's try this again. Verify graphs every time memonger is run. Will definitely check the runtime cost, though.
Reviewed By: akyrola
Differential Revision: D5308188
fbshipit-source-id: 512a76c759b670d31c49d1d492dd8ee1eaf3bafd
If the left tensor is 3D+ and the right tensor is at most 2D, we can
fold the batch into the matrix dimension and use torch.mm instead of
torch.bmm. In practice, this is faster especially if the right tensor is
column major.
Summary:
As title. Not sure how the unit test bug went through -
we should have a push-blocking test guarding it. Looks like sandcastle
thought that it was already broken
Reviewed By: jamesr66a
Differential Revision: D5340741
fbshipit-source-id: 76b2287fc2f746d85dd732b669ff89808bcbd497
Summary:
This adds a CollectivesConcurrencyControl class to manage creating common contexts and cyclic controls to execute Gloo collectives,
and refactors AllReduce and _AddDistributedParameterSync to use it
Reviewed By: akyrola
Differential Revision: D5335795
fbshipit-source-id: 5084e0a65cdb989cd949be3868b77a680561022d
Summary:
This is for the ease of removing the common fields of a struct from another.
For example,
s1 = Struct(
('a', Scalar()),
('b', Scalar()),
)
s2 = Struct(('a', Scalar()))
s1 - s2 == Struct(('b', Scalar()))
More examples are provided in the code comments.
Differential Revision: D5299277
fbshipit-source-id: 7008586ffdc8e24e1eccc8757da70330c4d90370
Summary:
In some cases we don't want to compute the full FC during eval.
These layers allow us to compute dot product between
X and W[idx,:] where idx is an input, e.g., label.
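A numpy sketch of the computation these layers perform, as I read the description (shapes and names are illustrative): instead of the full FC product, only the dot product between each row of X and the weight row selected by idx is computed.
```
import numpy as np

X = np.random.randn(5, 8).astype(np.float32)     # batch of 5 examples
W = np.random.randn(100, 8).astype(np.float32)   # weights for 100 classes
idx = np.array([3, 17, 42, 0, 99])               # one row per example, e.g. the label

out = np.einsum('ij,ij->i', X, W[idx])           # one score per example
full = (X @ W.T)[np.arange(5), idx]              # same result via the full FC
assert np.allclose(out, full, atol=1e-5)
```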
Reviewed By: kittipatv
Differential Revision: D5305364
fbshipit-source-id: 0b6a1b61cc8fcb26c8def8bcd037a4a35d223078
Summary:
Similar to sparse_nn all-GPU, this is our first step towards an offline full-GPU experiment.
**Compare Run**
cat(128, 32)512-512 :
GPU 21138598 https://fburl.com/jpeod1pi
CPU 21138787 https://fburl.com/vma7225l
Reviewed By: dzhulgakov
Differential Revision: D5308789
fbshipit-source-id: 413819bf9c5fff125d6967ed48faa5c7b3d6fa85
Summary: Combine _AddDistributedParameterSync() and _SyncParams() into a single function to broadcast across distributed machines and all local GPUs simultaneously. This is similar to how calls to Allreduce have already been optimized using the functionality of Gloo. All the refactoring work is contained in data_parallel_model.py.
Reviewed By: akyrola, andrewwdye
Differential Revision: D5329277
fbshipit-source-id: 4407b88980cf396f2e0f994d796294fa79fd39ed
Summary:
This bug in the test was exposed by https://github.com/caffe2/caffe2/pull/861 (previously, the test was always using the cuDNN engine, regardless of the value of `engine`). This bug is now blocking https://github.com/caffe2/caffe2/pull/817.
```
____________________ TestConvolution.test_convolution_sync _____________________
...
if use_cudnn and requested_engine != 'CUDNN':
raise ValueError(
> 'When use_cudnn=True, the only engine you can specify is '
E ValueError: When use_cudnn=True, the only engine you can specify is "CUDNN"
```
https://travis-ci.org/caffe2/caffe2/jobs/247605579
Closes https://github.com/caffe2/caffe2/pull/881
Differential Revision: D5332619
Pulled By: akyrola
fbshipit-source-id: 63737768a155359ddbbef1da424fcbb94f86bd4e
Summary: This should make it so we no longer have super hacky DAG chains just to generate vectors of indices that could be specified at model creation time
Reviewed By: akyrola
Differential Revision: D5316707
fbshipit-source-id: 97bb3868b69e0c5a7f465c95f2e16ae0485dcc56
Summary:
It was always allocating a TensorCPU, causing the mutex in PinnedCPUAllocator to be acquired.
Not much impact as everyone should use the CUDNN transpose, but good to fix anyway.
Reviewed By: jamesr66a
Differential Revision: D5332858
fbshipit-source-id: 287643df623b7cd59ab1028ed8b2ed1d3c1da44e
Summary: Implement the gradient for the Slice op on GPU
Reviewed By: akyrola
Differential Revision: D5313442
fbshipit-source-id: 722ad0bdf65e014d3236e17d15c83d40d7c975d2
Summary:
Fixes a memonger bug where it could recycle a blob that was released by the same op being processed.
Added a verification step to ensure in-place assignments are not changed.
Reviewed By: asaadaldien
Differential Revision: D5331495
fbshipit-source-id: 20b08f6de5b973e8c9868aa048c142cac1eb6c58
Summary:
The previous attempt involved terminating the program, which is
not good. Here I am using the [[noreturn]] trick instead.
Reviewed By: jamesr66a
Differential Revision: D5313159
fbshipit-source-id: 8889efcf793d44d472502309992e6f5b0a31f0e6
Summary: Implement slice gradient for CPU. Will soon port this over to GPU so NMT can use it
Reviewed By: akyrola
Differential Revision: D5309305
fbshipit-source-id: 8fb5f4e665f236ecce9227c5c0c302f5076b01ad
Summary:
Made them faster.
This should be equivalent to the algorithm akyrola suggested, just with a list (of parents) as an intermediate representation instead of a string.
Reviewed By: akyrola
Differential Revision: D5308133
fbshipit-source-id: c976a513d10e79c157ea803afb99b147e9ea3357
Summary: The data workers test times out randomly (very seldom), and it looks like the reason is that we call FeedBlob in a thread (the enqueue thread); the first time that is called, it calls workspace.CreateBlob() -- which is not thread safe. Fix this by initializing the scratch blobs explicitly.
Reviewed By: panshen1
Differential Revision: D5292426
fbshipit-source-id: d7dad68f3ccc636c60bd82b2527f00f20da298b5
Summary:
Last time I used a uuid filled into OperatorDef, and operator_tracebacks was populated using traceback.extract_stack. There were several issues with this approach:
1. A random field in OperatorDef breaks workflows relying on memoization, i.e. when computation is skipped based on an already computed result.
2. Adding one more field revealed that RNNs are not forward compatible with respect to new fields there. The prototxt format seems to not allow forward compatibility (thanks jamesr66a for the investigation!). For RNNs we need to switch to a more resilient approach. azzolini's proposed change to OperatorDef / NetDef would allow that by nesting NetDef directly inside OperatorDef without the need for extra serialization.
3. traceback.extract_stack is very slow when the executable is on a remote filesystem. It does one or more os.stat calls for each frame on the stack. In some cases this added up to 15 extra minutes of model construction.
In this diff I use a different approach which should fix all of the problems above.
1. and 2. are solved by not adding a new field at all. Instead I report the operator idx with respect to the net it runs in. Thanks akyrola and dzhulgakov for the idea. The downside is that operator list manipulation breaks the logic, and separately created ops are not covered at all.
3. I solved this by operating on raw frames without using the traceback and inspect modules, which end up doing a lot of filesystem calls. See the function extract_stacktrace in core.py for additional comments.
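A sketch of the idea in point 3, assuming only that raw frame objects are walked directly (the actual helper in core.py may differ): frame attributes live in memory, so collecting them triggers no os.stat calls, unlike traceback.extract_stack.
```
import sys

def _extract_stacktrace_sketch():
    # Walk raw frame objects and read only in-memory attributes, so no
    # filesystem access is triggered (traceback/inspect would stat files).
    result = []
    frame = sys._getframe(1)
    while frame is not None:
        code = frame.f_code
        result.append((code.co_filename, frame.f_lineno, code.co_name))
        frame = frame.f_back
    return result
```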
Reviewed By: dzhulgakov
Differential Revision: D5286285
fbshipit-source-id: 626dd0f5f6b8b1d86bd6bf519078b122f43ddcaa
Summary: Adding a test to check computational integrity of networks constructed with AttentionCell using UnrolledCell.
Reviewed By: salexspb
Differential Revision: D5306915
fbshipit-source-id: 02acfd1011f7d3ee5fac21cc2778c4a486190c43
Summary: - One line fix for loading saved checkpoint when using Parallelize_GPU_BMUF
Reviewed By: asaadaldien
Differential Revision: D5315254
fbshipit-source-id: a20ba6438c8e6b2ef44b65270c1d3f9ab645ded0
Summary:
Apparently, `brew install` fails if the package is already installed?
```
Error: automake 1.15 is already installed
```
https://travis-ci.org/caffe2/caffe2/jobs/245226634
Maybe TravisCI made some unannounced updates to their OSX images at around the same time [they updated their trusty images](https://blog.travis-ci.com/2017-06-21-trusty-updates-2017-Q2-launch). Something changed on their side two days ago, and the OSX builds have been failing ever since.
Closes https://github.com/caffe2/caffe2/pull/858
Differential Revision: D5313447
Pulled By: aaronmarkham
fbshipit-source-id: 7085640704c60c0119a1a75ea69dacd64b5a4da8
Summary:
Adds basic CUDA 9 support, including adding Volta arch, and making appropriate modifications for half precision datatype changes
Closes https://github.com/facebookincubator/gloo/pull/49
Differential Revision: D5315336
Pulled By: pietern
fbshipit-source-id: 6468b0f357206d604bdcfec69ba82509a2c91407
Summary: Remove cases of constructing a NetDef from String, instead of just creating a NetDef.
Reviewed By: salexspb
Differential Revision: D5309645
fbshipit-source-id: 06ec8617733d9dc5385668485f3b091bb37b3f73
Summary:
This diff fixes gradient computation of residual connections for a training network constructed with MultiRNNCell.
It addresses a logic bug in _prepare_output() and _prepare_output_sequence() by keeping track internally of which layers have consecutive residual connections before the output, and then reconstructing the final residual output by (re-)preparing the output of each of those layers and then combining them with a Sum operation. This also involves keeping track of which states contribute toward the reconstruction of the final sequence output so that outputs_with_grads can be correctly passed to apply_over_sequence().
Differential Revision: D5300520
fbshipit-source-id: f37d800c909e631175de7045abe192351cc11c41
Summary: We had a latent cudnn operator instantiation failure that we didn't know about until I looked at the nvvp profile. This makes it so that those failures (i.e. OPERATOR_NEEDS_FEATURE failures) print to LOG(WARNING) instead of VLOG(1)
Reviewed By: salexspb
Differential Revision: D5303012
fbshipit-source-id: bda54682d9932f907e44aa1c81a04521d864ae99
Summary: This is needed so that we can create blobs that are not numpy arrays, e.g., creating mutex with `CreateMutex` op.
Reviewed By: chocjy
Differential Revision: D5303742
fbshipit-source-id: f83cbf67c658a234c1e4a9a114ad943a4e360598
Summary: softmax_ops_test occasionally fails with gradient checks. Stabilize by setting the numpy random seed. Also reduce some dimensions for the large input test to make it run faster.
Reviewed By: harouwu
Differential Revision: D5292106
fbshipit-source-id: a21eec89e18d30ac7c5609dacf5d413e841841a6
Summary:
Refactor data_parallel_model all_reduce and broadcast methods to work for
a given parameter set, not only gradients, and reuse them for the BMUF distributed
implementation.
Add a distributed test (multiprocessing) to BMUF.
Reviewed By: akyrola
Differential Revision: D5267083
fbshipit-source-id: 8dcc7527d0a755b903d693d8071585f0b54d3403
Summary:
As described in T19378176 by kittipatv, in this diff, we fix the issue of __getitem__() of schema.List.
For example, given Map(int32, float) (Map is a special List), field_names() will return "lengths", "values:keys", & "values:values". "values:keys" and "values:values" are not accessible via __getitem__(). __getitem__() bypasses the values prefix and directly accesses the fields in the map. Other APIs (e.g., _SchemaNode & dataset_ops) expect "values:keys" and "values:values" as it simplifies traversal logic. Therefore, we should keep field_names() as is and fix __getitem__().
Reviewed By: kittipatv
Differential Revision: D5251657
fbshipit-source-id: 1acfb8d6e53e286eb866cf5ddab01d2dce97e1d2
Summary:
compute_interference_graph() was not able to handle the case when a blob is reused twice for operators supporting in-place parameters. For example, for the following network with operators Mul and Sub
(blob) -> [Mul] -> (blob) -> [Sub] -> (blob)
an incorrect edge will be added from [Sub] to [Mul], which causes nx.is_directed_acyclic_graph() to fail.
Reviewed By: ajtulloch
Differential Revision: D5271604
fbshipit-source-id: f6095b6f8e1dba556ba223a82c8170be7f744529
Summary: Make verify_graph_equality get called by share_grad_blobs and optimize_inference_for_dag
Reviewed By: akyrola
Differential Revision: D5288993
fbshipit-source-id: b9f105ce00148b2673eed2dd390ab74f82f990ad
Summary:
kmatzen why did you set the stepsize in ff84e7dea6?
The test is flaky before this change. Solid afterwards.
Closes https://github.com/caffe2/caffe2/pull/841
Differential Revision: D5292112
Pulled By: akyrola
fbshipit-source-id: c84715261194ff047606d4ec659b7f89dac3cbb1
Summary:
/cc akyrola is it possible this test has been broken ever since 5614816fce?
More generally, why do we still have `hypothesis_test.py` at all? In the case of this test, surely one of these files does more than this one old test:
* `operator_test/cudnn_recurrent_test.py`
* `operator_test/recurrent_network_test.py`
* `operator_test/rnn_cell_test.py`
Closes https://github.com/caffe2/caffe2/pull/843
Differential Revision: D5292109
Pulled By: akyrola
fbshipit-source-id: 6df5df6353a9741d1ae1b796adaab98382857527
Summary:
Funnily, the biggest issue when trying to increase the number of trainers from 5 to 20 is not model convergence (it is worse but still converges without tuning); it is the initialization time: it took around 30 min to generate the job.
After this diff, job creation time for the standard 5-7 setup goes from 125s to 8s. (15x speedup).
Another improvement is that ##net_printer.to_string(job)## becomes less complex.
This makes the startup for 20 trainers go to 32s, which is still not ideal.
Next step will be to allow passing num_instances to Node as well. This way we'll be able to create only one reader and one trainer prototype and let the framework take care of the scheduling. For this one we will need to move some DataStream and PS initialization code to C++ first. (c.c. aartibasant)
Reviewed By: dzhulgakov
Differential Revision: D5100788
fbshipit-source-id: 7b76bce108f527a96b2bfe7ed43a22ea8679b682
Summary:
CPU version of data parallel model. The great thing is that now we can run data_parallel_model_test in Sandcastle (as it does not have GPUs).
Pretty simple change, really. I did not change all variable names with "gpu" in them, to reduce risk (and being a bit lazy). Can improve later.
Reviewed By: wesolwsk
Differential Revision: D5277350
fbshipit-source-id: 682e0c5f9f4ce94a8f5bd089905b0f8268bd2210
Summary:
Advantages of cloning the tasks/execution_steps at runtime:
- Less complexity on the python side: no need to clone nets and add prefixes to blob names
- Faster start-up: we had cases of complex plans that took up to 30min to be created.
- Better isolation: each task cloned at runtime has its own child workspace, preventing false sharing of blobs.
- Opens up possibility for dynamic scheduling: Number of threads per task can be increased on the fly, at runtime.
Reviewed By: dzhulgakov
Differential Revision: D5100730
fbshipit-source-id: 71b83193b135da4e6eaf2536d8fc266528e1fdcc
Summary: Fixed a lot of issues that salexspb brought up, and templated on NetBase, which basically adds compatibility for DAGNetBase. This will be useful for Fei's future work.
Reviewed By: salexspb
Differential Revision: D5272352
fbshipit-source-id: b5ffe1d6fb0566dc1bfad9041c129a3ab7f6d93a
Summary:
- Incorporated dropout layer to the sparseNN training and testing pipeline
- Integrated an advanced model options feature on Flow UI for users to specify dropout rate
- Created an end-to-end unit test to build and run a model with dropout
Reviewed By: chocjy
Differential Revision: D5273478
fbshipit-source-id: f7ae7bf4de1172b6e320f5933eaaebca3fd8749e
Summary:
Given the parameter init_params=False, the weight blob (*_w) and bias blob (*_b) should be suppressed in model.param_init_net. Without this fix, init_params=False doesn't take effect in brew.conv as it does in brew.fc or other ops. This issue is the root cause of #790 [https://github.com/caffe2/caffe2/pull/790].
Closes https://github.com/caffe2/caffe2/pull/824
Reviewed By: harouwu
Differential Revision: D5276676
Pulled By: akyrola
fbshipit-source-id: 8f7088a8e1976658f67e027223e555375b3a2392
Summary:
Since D5193393 introduced a "token" system for memonger that prevents sharing of blobs across parallel branches, we can be more aggressive in blob sharing. Thus, this removes the tracking of 'unused free blobs' and just relies on the token system.
For forward-only resnet50, this reduces the number of shared blobs to 5 (optimal according to akirillov's calculation).
This requires careful testing, so I will not land it soon.
Reviewed By: asaadaldien
Differential Revision: D5208985
fbshipit-source-id: 2e520c4ea2351a2ec327b6c5f2e3af24234d1c9a
Summary:
Adds a separate set of CUDA collectives that run on device as an
alternative to NCCL. Use these collectives as default on-device
collectives instead of NCCL.
Whenever multiple processes on the same machine use Gloo with NCCL and
end up doing concurrent CUDA memory allocations and algorithm
execution, we risk deadlock. A follow up change will enable opt-in
usage of NCCL (e.g. through environment variable).
Benchmark output below with varying number of elements. It shows a
minor improvement over using NCCL for local reduction and broadcast.
Number of elements equal to on-device threshold (256K):
```
Device: tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm: cuda_allreduce_ring
Options: processes=2, inputs=8, gpudirect=no
elements min (us) p50 (us) p99 (us) max (us) samples
(before) 262144 2685 2907 3035 3215 562
(after) 262144 2682 2874 3013 3395 577
Device: tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm: cuda_allreduce_ring_chunked
Options: processes=2, inputs=8, gpudirect=no
elements min (us) p50 (us) p99 (us) max (us) samples
(before) 262144 2045 2133 2325 2643 725
(after) 262144 1533 1673 1834 2048 800
Device: tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm: cuda_allreduce_halving_doubling
Options: processes=2, inputs=8, gpudirect=no
elements min (us) p50 (us) p99 (us) max (us) samples
(before) 262144 1580 1640 1718 2069 893
(after) 262144 1371 1446 1539 1748 1125
```
Larger number of elements (4M):
```
Device: tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm: cuda_allreduce_ring
Options: processes=2, inputs=8, gpudirect=no
elements min (us) p50 (us) p99 (us) max (us) samples
(before) 4194304 55543 58058 60103 62659 32
(after) 4194304 54490 57923 60893 66058 33
Device: tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm: cuda_allreduce_ring_chunked
Options: processes=2, inputs=8, gpudirect=no
elements min (us) p50 (us) p99 (us) max (us) samples
(before) 4194304 18049 22820 24997 26634 105
(after) 4194304 18356 20463 21695 22589 99
Device: tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm: cuda_allreduce_halving_doubling
Options: processes=2, inputs=8, gpudirect=no
elements min (us) p50 (us) p99 (us) max (us) samples
(before) 4194304 18584 24345 27809 29722 95
(after) 4194304 19541 22718 25408 26688 88
```
Reviewed By: akyrola
Differential Revision: D5278192
fbshipit-source-id: 53f09e404663ddc8bb46d06ac87afd8ee3ffc3a2
Summary: As title. Pretty straightforward. Could actually run each kernel in parallel, but we can optimize later if needed.
Reviewed By: Yangqing
Differential Revision: D5278415
fbshipit-source-id: 29f59afe28f37fc4152ec7eb7cd6c1ab65f2cb8c
Summary: end_frm must be less than or equal to sampledFrames.size()
Reviewed By: dutran
Differential Revision: D5279265
fbshipit-source-id: 6bae714db6e07ff10ac01c95e6bead786d4941d2
Summary:
Code in tcp/transport tries to find the network interface a socket was
bound to when creating a TCP device context. Per getifaddrs(3), it is
possible for the ifa_addr field to be NULL (supposedly when an
interface doesn't have an address). Ignore such entries.
Thanks to slayton58 for reporting this.
Reviewed By: wesolwsk
Differential Revision: D5279376
fbshipit-source-id: 039380b95ba4d6d94942c30581e0b230a060870c
Summary:
a few issues:
1. Randomization hurts memoization.
2. Even if we make it non-random, we can get key collisions when loading it back.
3. RNNs use prototxt for the step net, and apparently it's not forward compatible like normal protobuf is.
I am thinking of a better less invasive solution now.
Reviewed By: jamesr66a
Differential Revision: D5272118
fbshipit-source-id: ab577fad04fbfc632e1fceffa923377a0d3da1be
Summary:
Previously, `gloo/math.h` inlined methods which use AVX builtins,
which required propagating the `-mavx` flag.
This diff moves these definitions out of the header and into a source
file to avoid this.
Reviewed By: pixelb
Differential Revision: D5271043
fbshipit-source-id: dde4dc560dfb557b46d1a582a8b38e7cb8eb0c37
Summary: Ran into it while working on a dper benchmark. Apparently it works harmlessly even with empty tensors.
Reviewed By: akyrola
Differential Revision: D5273672
fbshipit-source-id: a968ae03a659d6c1a215f12cc35f7ba68448e833
Summary:
For our CNN training runs I noticed an excessive number of futex() syscalls. Using strace I narrowed this down to excessive calls to std::condition_variable member functions.
1) I added a PushBulk member function to SimpleQueue that pushes all items in a vector onto the queue and issues a single std::condition_variable::notify_all() call, rather than a separate notify_one() call per item.
2) In DAGNet::WorkerFunction, we were calling std::condition_variable::notify_one() after every single op chain was completed, even though it should have only been called when the number of remaining operators dropped to 0 or the execution failed. I added a conditional check around this call to further cut down on unnecessary syscalls.
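The change itself is in the C++ SimpleQueue / DAGNet code, but the batched-push idea can be sketched in Python with threading.Condition (names here are illustrative): push the whole batch under one lock and wake waiters once.
```
import collections
import threading

class BulkQueue:
    """Illustrative sketch: one notify_all() per batch, not one notify per item."""
    def __init__(self):
        self._cv = threading.Condition()
        self._items = collections.deque()

    def push_bulk(self, items):
        with self._cv:
            self._items.extend(items)
            self._cv.notify_all()       # single wake-up for the whole batch

    def pop(self):
        with self._cv:
            while not self._items:
                self._cv.wait()
            return self._items.popleft()
```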
Reviewed By: pietern
Differential Revision: D5273337
fbshipit-source-id: 67d50f9d838e9a9ef3682d9a3b5ba59c7d33350d
Summary:
Working towards https://github.com/caffe2/caffe2/pull/817.
`E InvalidArgument: Insufficient bytes of entropy to draw requested array. shape=(4, 2, 5, 1, 3, 5, 5, 1), dtype=float32. Can you reduce the size or dimensions of the array? What about using a smaller dtype? If slow test runs and minimisation are acceptable, you could increase settings().buffer_size from 8192 to at least 24576000.`
https://travis-ci.org/caffe2/caffe2/jobs/243867951
Closes https://github.com/caffe2/caffe2/pull/828
Differential Revision: D5276723
Pulled By: akyrola
fbshipit-source-id: f7d0e2dd8ef8b6a2354bd4ff7c7446c377c954b4
Summary:
This changes prepares for having a separate set of collectives that
use native CUDA calls instead of NCCL. This is needed to workaround
the issue where NCCL deadlocks when it is interleaved with CUDA memory
management operations in other processes on the same machine.
Includes a modification to the host reduction functions to bring them
up to parity with the NCCL reduction functions (they now incorporate
offset/counter arguments).
Reviewed By: wesolwsk
Differential Revision: D5276291
fbshipit-source-id: 8844731760d2c48577d207c026ce0cd641f2fc6d
Summary:
Working towards https://github.com/caffe2/caffe2/pull/817.
`E InvalidArgument: Insufficient bytes of entropy to draw requested array. shape=(20, 12, 22), dtype=float32. Can you reduce the size or dimensions of the array? What about using a smaller dtype? If slow test runs and minimisation are acceptable, you could increase settings().buffer_size from 8192 to at least 43253760.`
https://travis-ci.org/caffe2/caffe2/jobs/243867951
/cc kittipatv
Closes https://github.com/caffe2/caffe2/pull/830
Differential Revision: D5276639
Pulled By: akyrola
fbshipit-source-id: 0c21be25ecd931837dc8b0c2cc17048f531350d1
Fixing error on line 661:
warnings.warn("masked_copy_ is deprecated and renamed to masked_scatter_, and will be removed in v0.3")
NameError: name 'warnings' is not defined
Summary:
We want to make sure that a graph optimized by memonger doesn't have any possibility of two threads writing into the same output blob at the same time, when blobs are renamed.
Creates a graph where edges are built such that a parent node's output blob is a child node's input blob, and there is no node in between the parent and child node that writes to the same blob. If two nets generate the same such graph, then the "path" of data is the same.
Reviewed By: akyrola
Differential Revision: D5210385
fbshipit-source-id: 6317fc4e16289339b50c2dcd86ec8b32d2d544a5
Summary:
This is a real implementation (not GPUFallbackOp) of the TopKOp for GPU.
There are two algorithm implementations:
- for k <= 512, it maps to a warp-wide min-heap implementation, which requires only a single scan of the input data.
- for k > 512, it maps to a multi-pass radix selection algorithm that I originally wrote in cutorch. I took the recent cutorch code and removed some cutorch-specific things as it made sense.
Also added several utility files that one or the other implementations use, some from the Faiss library and some from the cutorch library.
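For reference, a numpy sketch of what the op computes (the k largest values per row and their indices); the min-heap and radix-selection kernels above only change how this is computed on the GPU.
```
import numpy as np

def topk_ref(X, k):
    # Reference semantics: the k largest values per row, in descending
    # order, together with their indices.
    idx = np.argsort(-X, axis=-1)[..., :k]
    return np.take_along_axis(X, idx, axis=-1), idx

X = np.random.randn(4, 1000).astype(np.float32)
values, indices = topk_ref(X, k=5)   # shapes (4, 5) and (4, 5)
```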
Reviewed By: jamesr66a
Differential Revision: D5248206
fbshipit-source-id: ae5fa3451473264293516c2838f1f40688781cf3
Summary: Since shape tensor was allocated every time, the global allocation mutex was acquired, possibly leading to slowdown.
Reviewed By: salexspb
Differential Revision: D5263899
fbshipit-source-id: b44ff0b01342f116154ec2a9c65f91b5c0e51452
Summary: The old version used one block with 128 threads. Throughput was too low for the NMT use case (calculating squared gradient norms for every parameter), so this increases the throughput. Shaves 7% off CNN model training time per step
Reviewed By: wickedfoo
Differential Revision: D5263748
fbshipit-source-id: adc3bacd11e49ea00c60381d613d993050e899be
Summary:
\cc pietern
Minimal changes to allow gloo to compile and run with NCCL 2.0
Closes https://github.com/facebookincubator/gloo/pull/46
Differential Revision: D5268074
Pulled By: pietern
fbshipit-source-id: 58d625d57b31cfc932f3dbbdd7a4b83d9a2e60a8
Summary:
While this is not intended to be the most performant and
general solution, we can see from the test plan that in some cases static DAG RNN can
perform better than our own implementation. Hopefully we will get
dynamic RNN DAG execution at least as fast as this one. Then we will
not need this one in production, only for testing.
Still putting it into our benchmark for comparison purposes
Reviewed By: akyrola
Differential Revision: D5210038
fbshipit-source-id: fa44baf51c455872abd6ec5f5d151cf06e15b1fa
Summary: I accidentally noticed that we were calling the non-CUDNN version of Transpose with attention, and it is super slow. This broke when rnn_cell was changed to use ModelHelper instead of CNNModelHelper in D5062963, but calls to transpose were not "brewed".
Reviewed By: jamesr66a
Differential Revision: D5264248
fbshipit-source-id: b61494ae210f34597245f1195d20547f5b5cd8b5
Summary: Don't want to assert since it can be useful to sometimes create models that are not run (for example, unit tests).
Reviewed By: pietern
Differential Revision: D5258905
fbshipit-source-id: f1beee0605bfef235ed0f23f7e78259109720254
Summary: In https://github.com/caffe2/caffe2/pull/802, slayton58 fixed an issue in ImageInputOp where the std and mean blobs were allocated on the wrong GPU (0). This fails when there is no P2P memory access. The fundamental reason was that ImageInputOp's constructor did not call SwitchToDevice. Operator's constructor does, but ImageInputOp inherits PrefetchOp -> OperatorBase, neither of which does the switch. So I made PrefetchOperator do the switch (OperatorBase does not have a context, so it cannot).
Reviewed By: asaadaldien
Differential Revision: D5258729
fbshipit-source-id: c615c60eb2047ad26249c5bcba57ab0ef21d00e4
Summary:
This can be used to serialize allocations and NCCL kernel calls
for example. Multiple such mutexes can be created per process.
Reviewed By: Yangqing, pietern
Differential Revision: D5073609
fbshipit-source-id: 28cc4293632f20e9623ee6531365b881d0f3d9ef
Summary: This makes it easier to gather top-K by group of rows. This is useful in the situation where we want to pick the top-K from a batch of fixed-length sessions. Let `N` be the number of sessions, and `M` be the number of examples in a session. We would have a batch of `N * M` rows. We can reshape the score blob to `N x M`, and use it as input to `TopK` to select the top scores for each session. However, without the new output, it would be inconvenient to gather the rows corresponding to the top scores. The indices are in `[0, K-1)` range. The new output can be used directly as input to `Gather`.
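A small numpy sketch of the workflow described above (session counts and names are made up): reshape the N*M scores to N x M, take top-K per session, and turn the per-row indices into flat batch indices that a Gather-style lookup can consume directly.
```
import numpy as np

N, M, K = 3, 5, 2                                   # sessions, examples per session, top-K
scores = np.random.randn(N * M).astype(np.float32)  # one score per batch row

per_session = scores.reshape(N, M)
topk_idx = np.argsort(-per_session, axis=1)[:, :K]  # indices within each session
flat_idx = topk_idx + np.arange(N)[:, None] * M     # indices into the full batch
top_rows = scores[flat_idx.ravel()]                 # what a Gather on flat_idx returns
```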
Reviewed By: chocjy
Differential Revision: D5171459
fbshipit-source-id: 69f7b41456c3f9670650ae07afc8fef8328485e9
Summary:
The global StatRegistry doesn't get reset when the workspace is reset.
```
> self.assertTrue(len(workspace.FetchBlob('k3')) == 2)
E AssertionError: False is not true
```
https://travis-ci.org/lukeyeager/caffe2/jobs/240162665
/cc azzolini
NOTE: this error doesn't show up if you just run `stats_ops_test.py` directly. It shows up when you run other tests in the same session before this test:
```
pytest -v caffe2/python/
```
Closes https://github.com/caffe2/caffe2/pull/788
Differential Revision: D5259232
Pulled By: salexspb
fbshipit-source-id: 3c72633af6bb61c4fda62195298b1e9574b4cbef
Summary:
The existing per-branch TravisCI badges don't work, and will be out-dated when https://github.com/caffe2/caffe2/pull/735 is merged.
I also added an Appveyor badge.
Closes https://github.com/caffe2/caffe2/pull/786
Differential Revision: D5253408
Pulled By: aaronmarkham
fbshipit-source-id: b274b30fcef9df3d2ff7cda1046f8462ad56c83b
Summary: Upgrades this file to use brew instead of CNNModelHelper
Reviewed By: harouwu
Differential Revision: D5252089
fbshipit-source-id: 6df4350717c1d42bc4bcc63d255cd422f085ee05
Summary: Implementation of the SliceOp for CUDA
Reviewed By: akyrola
Differential Revision: D5254287
fbshipit-source-id: 0a1660e1aa161fd088a2d8f886e019c05a1919a2
Summary:
This brings back DAGNet up to parity with SimpleNet, where
execution stops as expected after an operator fails. For the DAGNet
it's more involved, since we have to deal with all worker threads
stopping execution. Because the job queue may still hold an arbitrary
number of chains to execute, this diff explicitly closes it down,
waits for all workers to terminate, and resets the job queue, upon
seeing a failure.
Reviewed By: akyrola
Differential Revision: D5232955
fbshipit-source-id: 4dac3c3ed6e5c2ebd07473b0f8be2b02c28978e9
Summary:
```
File "/data/caffe2/install/caffe2/python/hypothesis_test.py", line 1911, in test_batch_to_space
(w + 2 * pad) / block_size).astype(np.float32)
File "mtrand.pyx", line 1404, in mtrand.RandomState.randn (numpy/random/mtrand/mtrand.c:19843)
File "mtrand.pyx", line 1534, in mtrand.RandomState.standard_normal (numpy/random/mtrand/mtrand.c:20368)
File "mtrand.pyx", line 167, in mtrand.cont0_array (numpy/random/mtrand/mtrand.c:6127)
TypeError: 'float' object cannot be interpreted as an index
```
```
File "/data/caffe2/install/caffe2/python/operator_test/tile_op_test.py", line 101, in tile_ref
tiled_data = np.tile(X, tuple(dims))
File "/data/caffe2/venv/local/lib/python2.7/site-packages/numpy/lib/shape_base.py", line 881, in tile
return c.reshape(shape_out)
TypeError: only integer scalar arrays can be converted to a scalar index
```
I also tested to make sure this still works with 0.11.
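A sketch of the kind of change involved (values are made up): newer numpy rejects float shape arguments, so the dimensions passed to np.random.randn and the repetition counts passed to np.tile have to be explicit ints.
```
import numpy as np

w, pad, block_size = 4, 1, 2

dim = int((w + 2 * pad) // block_size)         # was a float under true division
x = np.random.randn(dim).astype(np.float32)    # ok: integer dimension

tiled = np.tile(x, (2, 1))                     # repetition counts must be ints too
```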
Closes https://github.com/caffe2/caffe2/pull/787
Differential Revision: D5248087
Pulled By: salexspb
fbshipit-source-id: eff69482a8eabb8ace330003fa326c832b53865f
Summary: Deprecate CNNModelHelper in python/workspace_test.py, using ModelHelper instead of CNNModelHelper
Reviewed By: harouwu
Differential Revision: D5251778
fbshipit-source-id: d634f1c76e41a95b0247ebf5d5a48aef6f8e232e
Summary:
This diff deprecates `CNNModelHelper` in the `AlexNet()` function. More diffs will be coming to deprecate the helper in other functions.
Depends on D5241738
Reviewed By: harouwu
Differential Revision: D5247004
fbshipit-source-id: eec5c5ef916a85de8289cb92d2174a6a4b8075bf
Summary:
Occurred when running with multiple GPUs, not all of which
are connected via P2P.
Essentially when mean_gpu_ and std_gpu_ are allocated and
populated in the constructor of ImageInputOp, it does not seem to
be guaranteed that the active context is the same as the final context
on which the Op will be run. This causes the image data and the
mean/std to be on different devices. With P2P we don't mind this, but
without P2P this causes OOB memory accesses in the GPU transform
kernel.
Closes https://github.com/caffe2/caffe2/pull/802
Differential Revision: D5258528
Pulled By: akyrola
fbshipit-source-id: 778e55b5f8bb39fc52644b68573c747210ebf3bb
Summary: Hard-to-debug problems arise when a gradient creator fails because the forward op is itself incorrect. Add checking of the schema before calling the creator. Also clarify the error messages.
Reviewed By: Yangqing
Differential Revision: D5256016
fbshipit-source-id: 78550f7e2ce5b88e26b69fdae4be0eece52edfea
Summary:
The current version of schema.py has a Metadata class with three fields. The default for it is set to
four Nones. This is just changing that to three Nones so that the number of default values matches the number
of actual fields.
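Illustrative of the bug class being fixed (field names here are placeholders, not the actual schema fields): when defaults are attached to a namedtuple this way, the defaults tuple has to line up with the field count.
```
from collections import namedtuple

Metadata = namedtuple('Metadata', ['field_a', 'field_b', 'field_c'])
Metadata.__new__.__defaults__ = (None, None, None)   # three fields, three Nones

m = Metadata()   # every field now correctly defaults to None
```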
Reviewed By: kennyhorror
Differential Revision: D5250463
fbshipit-source-id: 42e5650d270f5f63662614d8445b4819ed370dec
Summary: Also fixed a small bug in ModelHelper constructor
Reviewed By: harouwu
Differential Revision: D5246799
fbshipit-source-id: 3719ca078f0e2b5e463fc93da9c8215f5583bd9a
Summary:
We need to support RNNs explicitly in ExtractPredictorNet, because they store sub-nets as strings in special arguments. When netdef arguments arrive, we can generalize this a bit.
Added a test under rnn_cell_test to test that extracting an LSTM predictor net works correctly and sets the device option properly for the step net ops.
Reviewed By: yqwangustc
Differential Revision: D5236334
fbshipit-source-id: cd653427f8c440a14d94195a532d18276f94749a
Summary: A quite common problem is that it is hard to load blobs with pe.load_from_db to a specific device. One must set the device options of the returned init_net and predict_init_net, which is quite magical. So I made load_from_db() able to set these device options automatically, based on the device scope or a device_option parameter. Added a unit test.
Reviewed By: asaadaldien
Differential Revision: D5249202
fbshipit-source-id: 7b9d91476cb8d1b0ec0d9772e50b9148b8b184fa
Summary:
salexspb This fixes a major perf issue (40% boost on alexnet end-to-end perf) in the multi-precision SGD optimizer - it was causing repeated cudaMalloc / cudaFree calls during training iterations due to the changing size of the `grad` blob as it moved from fp16 <-> fp32.
Closes https://github.com/caffe2/caffe2/pull/797
Differential Revision: D5246978
Pulled By: salexspb
fbshipit-source-id: ec3d7ef18445e19eaf5aac908d0a7bcd5957eb60
* Add torch.matmul function.
Includes test_torch, test_autograd and docs changes.
* Add __all__ to functional so imports aren't accidentally imported.
* Include unbind in __all__.
* Add matmul case for when one argument is 1-dimensional and the other
at least 3-dimensional.
* Add squeeze_ to Variable.
* Use squeeze_ instead of squeeze for matmul.
Summary: This was only needed in order to initialize stateful PythonOps. Now PythonOp has support for initialization at Op creation time, so this is not used anymore.
Reviewed By: dzhulgakov
Differential Revision: D5242908
fbshipit-source-id: dbaa249466dd0f37f25d204d387b1f99c6dd4fed
Summary: This is going to show a Python Caffe2 user where a failed operator was created. The motivation for not putting this information right in the protobuf is to avoid making it too verbose and to keep the ability to read protobufs of a net after a simple print() call.
Reviewed By: jamesr66a
Differential Revision: D5226047
fbshipit-source-id: 7edfe850e05a2ec209577142aa3368664a57a108
Primary things I had to fix:
- Suppress _XOPEN_SOURCE warnings by ensuring that Python.h is included
first, because it always unconditionally defines this macro.
- Turn off strict aliasing, because Python 2 doesn't work with strict
aliasing.
- Workaround setuptools bug, where it's incorrectly passing
-Wstrict-prototypes to C++ compilers (where this doesn't make
any sense)
To compile csrc with -Werror, run `CFLAGS="-Werror" python setup.py build_ext`
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary:
This allows constructing a python op by passing a pickled "builder function call" as an argument to the op.
The builder function is called at PythonOp construction time and returns a function that will be called when the op is run.
This way we can drop the dependency on 'tokens', which didn't work properly for protobufs that get distributed to other processes. Now, the PythonOp definition is self-contained: as long as the build dependencies are right, sharding the protobuf is enough to execute the net remotely.
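A plain-Python sketch of the builder pattern described above (how the builder is registered with PythonOp is elided; names are illustrative): the builder runs once at op-construction time and returns the callable the op invokes on every run.
```
def scale_builder(factor):
    # Runs once, at op-construction time (the "builder function call").
    scale = float(factor)
    def run(x):
        # Runs every time the op executes.
        return x * scale
    return run

op_fn = scale_builder(2.0)
assert op_fn(3.0) == 6.0
```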
Reviewed By: dzhulgakov
Differential Revision: D5080833
fbshipit-source-id: a5deaca5d3143024cdb121519689224e9dbec5ce
Fixes #1783.
There is an undocumented invariant in PyTorch that we should
try to avoid having storage == NULL as much as possible (even
though Torch supports it.) This commit properly documents the
invariant, and fixes a bug in sparse where the invariant was
not respected. This now means that sparse tensors now correctly
remember what GPU they are associated with.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Fixes #1782.
The default operation should be cheap: user can always choose to
explicitly make a copy on the way in. Note that this is a
BACKWARDS COMPATIBILITY BREAKING change. However, we DO create
a new tensor wrapper (so we are not affected by subsequent
size changes, etc.)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary:
Truncate the id list using the max length computed in compute meta, so that it has a determined length,
which is useful for the position-weighted pooling method.
Reviewed By: sunwael
Differential Revision: D5233739
fbshipit-source-id: f73deec1bb50144ba14c4f8cfa545e1ced5071ce
Summary: Replace call to function that is only supported in CUDA 8.0 with one that has been supported in previous releases.
Reviewed By: pietern
Differential Revision: D5231755
fbshipit-source-id: d72aec2a4a1c511064a65142887f8a05b51dad55
Summary: Recently people found that this test is too strict because of proto string matching. Thus, I changed it to compare fields so that this test will not complain even if the protobuf changes in the future.
Reviewed By: dzhulgakov
Differential Revision: D5229855
fbshipit-source-id: 54efcd7a0f9e5dbba1ddeb480801abcb859e07bd
Summary: added an operator that converts key/value blobs into a blob containing a map pointer, unittest passed.
Differential Revision: D5224449
fbshipit-source-id: 2f60754ed3ba6ed16039c09019117ae3c3646ab2
Summary:
Diff D5224410 initializes the should_stop_blob explicitly. With that, we will
have one more blob when executing the job. Adjusts the check accordingly.
Reviewed By: azzolini
Differential Revision: D5228398
fbshipit-source-id: 439b186c30b0b1d0e41e513babbcccd85e7a1b4a
Summary:
We waste extra memory by creating two autosplit gradient blobs and then accumulating them into the main one. Sometimes, when Sum / Sub ops are involved, we can avoid wasting extra memory at all.
Ideally we would not waste any memory and make ops add to the same blob rather than calculating separate results and then merging them. But that would require a substantial change to the framework and rewriting a lot of operators.
Reviewed By: dzhulgakov
Differential Revision: D5157667
fbshipit-source-id: 8293824d6cdd971d8853ae90aee68e4a6d1e132b
Summary:
It's very useful for simple cases like benchmarking nets where we want to encode input/output record in the net and don't want to go through the hurdles of storing input/output record in MetaNetDef.
For those cases I propose remapping the input/output record before saving to 'input_record/{field_name}'. Then we can recover input/output record back just based on the names of the blobs.
Differential Revision: D5170473
fbshipit-source-id: ac5daa60051605ed93022aec1377a49f08f15663
1) Line up trailing dimensions in broadcast docs.
2) remove unnecessary expand_as in common_nn test.
3) use view in tensor_str instead of resize_.
4) newExpand remove raiseErrors change.
5) clarify expandedSizes/expandedStrides parameters in inferExpandGeometry.
6) simplify inferSize2/inferSizeN implementations.
7) use new-style classes for warning.
Setting torch.utils.backcompat.broadcast.warning.enabled=True
will cause Python warnings in the case where broadcast occurs
but previously 1-d view-style pointwise ops occurred.
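A small usage sketch of the flag mentioned above (the flag only exists in releases from this era): with it enabled, a case that used to be treated as equal-nElem 1-d views but now broadcasts will emit a warning.
```
import torch
import torch.utils.backcompat

torch.utils.backcompat.broadcast.warning.enabled = True

a = torch.ones(4, 1)
b = torch.ones(4)
c = a + b   # now broadcasts to (4, 4); previously handled as same-size 1-d views
```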
Summary: This diff fixes an issue with running the same reader in the same workspace multiple times. In order to achieve correct behavior of execution step we have to explicitly initialize should_stop_blob with False.
Reviewed By: kennyhorror
Differential Revision: D5224410
fbshipit-source-id: 4ad2740e187b62b0a1f5612ea3eef223dcc8a799
1) Rename calculateExpandGeometry to inferExpandGeometry for consistency
2) Simplify inferExpandGeometry implementation by using a single pass
through dimensions
3) Implement a two operand expansion, expand2.
4) Implement versions that return error code to use for fallback to
equal nElem support.
* Add SELU activation function (a small functional sketch follows after this list)
* Remove unnecessary case
* Add Function for SELU + tests and fix RReLU inplace
* Fix extra line in doc
* Fix tests
Remove in-place tests for RReLU. For some reason they fail on legacy nn, but pass on nn
* SELU in new-style Function
It also supports double backprop, verified with gradgradcheck
* Fix flake8
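A minimal functional sketch of the SELU activation added above (constants from the "Self-Normalizing Neural Networks" paper); the actual Function in this change also implements backward and double-backward.
```
import torch

SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x):
    # scale * (x for x > 0, alpha * (exp(x) - 1) otherwise)
    return SELU_SCALE * torch.where(x > 0, x, SELU_ALPHA * (torch.exp(x) - 1))

y = selu(torch.randn(5))
```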
Summary: added an operator that converts key/value blobs into a blob containing a map pointer, unittest passed.
Differential Revision: D5166513
fbshipit-source-id: 748527c423a163fe55f914c08fff3adfc74a540c
Summary:
The SparseToDense layer is essentially calling the SparseToDenseMask op.
This makes it impossible to call the functional layer with the true SparseToDense op.
This diff is to rename the layer.
Please let me know if I missed anything or you have a better name suggestion.
Differential Revision: D5169353
fbshipit-source-id: 724d3c6dba81448a6db054f044176ffc7f708bdb
Summary:
Static RNN allows unrolling an RNN into a Caffe2 graph using all existing cell abstractions. In this diff I introduce several new tests that already caught a few bugs in our RecurrentNetworkOp gradient accumulation logic by comparing it to an unrolled version.
Another use case is perf - potentially we can run an unrolled net faster because DAGNet will have access to the whole graph. Same about memonger. But this work is not part of this diff
Reviewed By: akyrola
Differential Revision: D5200943
fbshipit-source-id: 20f16fc1b2ca500d06ccc60c4cec6e81839149dc
Summary:
In some cases you have an optimized network and a normal one, and you would like to make sure they produce the same results. If the math under the hood is the same, you can do this with a very high precision compared to a traditional numerical gradient check. One application is RNNs: there we can unroll the RNN into a Caffe2 graph and make sure the result is the same as in the optimized version using RecurrentNetworkOp.
Another possible application is graph transformations: we can verify that afterwards the nets produce the same gradients (cc akyrola on memonger, bwasti on other transformation ideas).
Reviewed By: bwasti
Differential Revision: D5200855
fbshipit-source-id: 0196af187f0c2feb33de4778ea08d0d288fe1017
Summary:
When building a multi-layer static RNN, the last timestep of the first layer (and other layers except the last one) doesn't get a gradient for the cell state, as normally the user uses results only from the last layer and the cell state doesn't go up either.
ZeroGradient provides a general solution for injecting 0-gradient blobs. It is in some way similar to the StopGradient operator, which is also special-cased.
Reviewed By: bwasti
Differential Revision: D5198375
fbshipit-source-id: a21d0cfb3676a77fac72e5897a200d0bd25fc6de
Summary: Support grouped convolutions using the `group` arg in the nnpack convolution implementation.
Reviewed By: Maratyszcza
Differential Revision: D5204743
fbshipit-source-id: 81116213f7a4f6afa793e4bdf1c5bdd9a55e124f
Summary:
`brew_test.py` is just plain broken. `core_test.py` doesn't work with pytest. `apmeter_test.py` and `top_k_test.py` don't work for CUDA builds.
Closes https://github.com/caffe2/caffe2/pull/765
Differential Revision: D5211817
Pulled By: Yangqing
fbshipit-source-id: 78ec5af35a3fa870978e4c9590210ade9e3bc5ac
Summary:
Neither dependency is required by the core Python modules.
OpenCV, in particular, is a pain to install (no pip package). Conditionally skipping this test will make TravisCI integration easier.
Closes https://github.com/caffe2/caffe2/pull/739
Differential Revision: D5211799
Pulled By: Yangqing
fbshipit-source-id: c6bdc8a17977f64f34e968fd9ab8c65161d2624d
Summary:
I closed https://github.com/caffe2/caffe2/pull/736 because one of these variables should be used after all.
Here's how C1 uses this variable: https://github.com/BVLC/caffe/blob/rc5/cmake/Targets.cmake#L116
Without this fix, there is a race condition in the parallel build leading to this error:
```
make[2]: *** No rule to make target `../third_party/NNPACK/lib/libnnpack.a', needed by `caffe2/libCaffe2_CPU.so'.
```
Closes https://github.com/caffe2/caffe2/pull/737
Differential Revision: D5211794
Pulled By: Yangqing
fbshipit-source-id: 9e368f09b01edaf86252727adc6f6cc40d244e29
Summary:
The random number generators could be used in a thread-unsafe manner.
This patch fixes this by adding a way for tasks to get the thread ID they are
running on.
Reviewed By: panshen1
Differential Revision: D5051334
fbshipit-source-id: 9a9f9e2e7b7a86ff456f37b40422af4fa100b5d9
Summary:
This diff fixes various issues with memonger, and works at least with rbgirshick's failure case, Resnet-50, and a new harder unit test. I will still create a proper resnet50 test.
1) Introduce the concept of "tokens". These are passed down the dependency chains, and a blob can be used for recycling only if it owns all the tokens that are currently in possession. Tokens are added when branching, and tokens are redeemed after all inputs are satisfied. A bit hard to explain; a small sketch of the idea follows after this list.
2) There were various bugs due to bad code: the free_blobs data structure is of a different type when we have blob sizes and when we don't. I plan to rewrite this soon, but there were some bugs.
3) Added a harder unit test that failed before.
4) Added test for resnet50 + memonger
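A highly simplified sketch of the token idea from point 1 (see the note there); the data structures and names are illustrative, not the actual memonger code.
```
def can_recycle(tokens_owned_by_free_blob, tokens_held_by_op):
    # A freed blob may only be recycled by an op that holds every token
    # the blob picked up along its dependency chain (tokens are added at
    # branch points and redeemed once all inputs are satisfied).
    return tokens_owned_by_free_blob.issubset(tokens_held_by_op)

assert can_recycle({"branch_a"}, {"branch_a", "branch_b"})
assert not can_recycle({"branch_a", "branch_b"}, {"branch_a"})
```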
Reviewed By: asaadaldien
Differential Revision: D5193393
fbshipit-source-id: bc2a714877aa1201c32a5ba8ade862865e455711
Summary: I broke resnet50 when switching to use the optimizer, which uses an LR per parameter. This only happens after each epoch, and I did not test patiently enough. As a stop-gap, while asaadaldien works on a better solution, just fetch the lr of the conv1_w param.
Reviewed By: asaadaldien
Differential Revision: D5207552
fbshipit-source-id: f3474cd5eb0e291a59880e2834375491883fddfc
Summary:
This diff plans to attack the problem where we want to just annotate the device option for operators and let Caffe2 inject cross-device copy functions for us. This feature would be useful for mixed-device training and multi-device training with several nets, where previously we did the heavy lifting of adding copy functions ourselves.
Ideally, this feature will happen like this:
//construct your nets first
core.InjectDeviceCopyAmongNets([train_init, train_net, ...])
My ideas are written in comments. I will update them here as well later.
Reviewed By: dzhulgakov
Differential Revision: D5134103
fbshipit-source-id: 173f7da9d1773d1c50ccdc27f1b5cd3067b04af5
Summary: Catch exceptions when fetching uninitialized blobs while collecting blob sizes in the workspace. Some of the output blobs (like the mask output of DropOut when is_test=1) may be nullptr, and FetchBlob will fail.
Differential Revision: D5198641
fbshipit-source-id: 45ee26c4cb1c25cc48904e9f7d7c007224c97418
Summary: Implements an APMeter operator (APMeterOp) to calculate AP for multiclass classification given prediction scores and labels. The Op takes a score tensor [nsamples x nclasses] and a label tensor [nsamples x nclasses], and outputs a float tensor of size nclasses as the AP for each class.
Reviewed By: akyrola
Differential Revision: D5082565
fbshipit-source-id: ae7304bc8fc999c361245b9aec38eb9a5f5eef4b
Summary:
Add a helper function for parametric op ElementwiseLinear
The typical syntax is model.ElementwiseLinear(input, output, dimension)
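For reference, a numpy sketch of what the op computes along the chosen feature axis, as I read it (shapes are illustrative): Y[n, d] = X[n, d] * w[d] + b[d].
```
import numpy as np

X = np.random.randn(4, 6).astype(np.float32)   # N x D input
w = np.random.randn(6).astype(np.float32)      # per-feature scale
b = np.random.randn(6).astype(np.float32)      # per-feature bias

Y = X * w + b   # broadcasting applies w and b along the feature dimension
```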
Reviewed By: harouwu, akyrola
Differential Revision: D5114152
fbshipit-source-id: 8e8c691f824f518ae510a72ab0c12de1b018f3b5
Summary:
There is an edge case where internal gradient blobs of the backward step net should not be considered internally calculated if the only "internal" calculation is in-place.
In the case of the failing attention unit tests, the offending blob was attention_weighted_encoder_context_grad, which was incorrectly considered internal because it was the output (as well as input) of a Reshape on the step net's edge. The caveat here is that the results may be unpredictable if a non-pass-through in-place operation is applied to a blob within a step net which is also consumed both internally and is a recurrent state/output. (This is an extreme edge case, and difficult to explicitly enforce, but it's worth noting.)
Reviewed By: salexspb
Differential Revision: D5198328
fbshipit-source-id: 0cfa8f903fd767fc50e727f238ac3d8cdca03fe0
Otherwise, on many machines, the size of the OpenMP thread pool will
change between MKL and our OpenMP enabled functions. The constant thread
creation and destruction results in worse performance and leaks memory
on GCC 5.4
Summary:
While debugging #43 I found common/common.h missing some headers as well.
Fixes#43.
Closes https://github.com/facebookincubator/gloo/pull/44
Differential Revision: D5194970
Pulled By: pietern
fbshipit-source-id: 4861cd04c56931d4759f5bc050816788252003ee
Summary:
The goal of this diff is:
1) Enable checkpointing to honor batches_per_epoch
2) Resume hive_readers mid-split
Reviewed By: azzolini
Differential Revision: D5004212
fbshipit-source-id: 2ff5df30ba946eefadd109d80056cde67398a080
Summary:
Input of topK op: X (dense)
Output of topK op: Value and Indices (sparse representation)
Value will have a gradient in some cases;
we backprop (copy) the gradient from sparse (d Value) to dense (d X).
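A numpy sketch of the gradient copy described above (shapes made up): d X is zero everywhere except at the forward top-K positions, which receive the corresponding entries of d Value.
```
import numpy as np

X = np.random.randn(3, 7).astype(np.float32)
K = 2
indices = np.argsort(-X, axis=1)[:, :K]              # forward TopK indices
d_values = np.random.randn(3, K).astype(np.float32)  # gradient w.r.t. Values

d_X = np.zeros_like(X)
np.put_along_axis(d_X, indices, d_values, axis=1)    # scatter back to dense
```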
Differential Revision: D5133461
fbshipit-source-id: 7bad55b60e8a22dfe0e51357ce2099d7f752c133
Summary: replace hand made sgd with build_sgd
Reviewed By: salexspb
Differential Revision: D5186331
fbshipit-source-id: 3c7b4b370e29a1344b95819766463bae3812c9a6
Summary: The booleanmask supports another output with sorted indices
Differential Revision: D4984255
fbshipit-source-id: becb10d7fe989bb2f6488c901766a45369613eb7
Summary: Contains the ObserverBase class and some unittests.
Reviewed By: bwasti, pietern
Differential Revision: D5099367
fbshipit-source-id: fabde126d3281729dfc772d63dbf363e5d649319
Summary: Previous implementation relied on the order of fields for some reason.
Reviewed By: azzolini
Differential Revision: D5164478
fbshipit-source-id: 12717310860584e18ce4ca67d0bd5048354cdc0a
Summary: Infer input and output devices from OperatorDef through OperatorSchema. This is inspired by shape inference. With this feature, we can easily analyze device information for all blobs in the net in a generic way. It is really helpful for automatic cross-device execution.
Reviewed By: akyrola, dzhulgakov
Differential Revision: D5161065
fbshipit-source-id: ee656123112171a4ca00f2fb3f6940f32ddf3135
Summary: update the new sigmoid calling process
Reviewed By: dzhulgakov
Differential Revision: D5187589
fbshipit-source-id: cf29e7e80776ac1c4cf5718c5d6043d44f62d4de
Summary:
This diff fixes fetching of the parameters in the global namescope. The earlier
diff that switched to '' introduced this bug.
Reviewed By: dzhulgakov
Differential Revision: D5189667
fbshipit-source-id: 4818e99e2c2c90788e70e0b8b6204ec6f471d37d
When I use named_parameters to modify the lr and weight decay, I hit a bug, because named_parameters returns torch.nn.parameter.Parameter values, not a generator over the Parameters.
Summary: ExpandDims is a trivial utility op which should not be triggering a warning when used by ModelHelper.
Reviewed By: akyrola
Differential Revision: D5117985
fbshipit-source-id: 5589f46f58458f5019924b48602db088563f2fee
Summary:
Make it easier for users by returning from ExtractPredictorNet the list of blobs that must be saved/exported to run a predictor net. Added a test for ExtractPredictorNet
Codemod.
Reviewed By: asaadaldien
Differential Revision: D5176097
fbshipit-source-id: b1af42132459487b8d94fcdde0e4c514da608243
Summary:
The swap for accumulated gradients causes problems with distributed training, as Gloo ops expect the buffers (pointers) to remain the same. Also, it is quite a hack. So after talking with salexspb, this diff changes the parameter gradient handling by "transposing" it:
- gradient ops are rewritten to write into a blob with name grad + "_tmpstep"
- then that blob is accumulated directly to the actual gradient blob, not a temporary "_acc" blob.
Reviewed By: salexspb
Differential Revision: D5184839
fbshipit-source-id: c7ca445d4077ff90413c358bb0f7199d123a5553
Summary:
*Fix #417 again (#551 was insufficient)*
Even after a reallocation, the data address can still be the same if malloc returns the same newly freed address.
* Be very explicit and careful about how we set these flags so they don't interfere with other tests
* Disable the failing check
This somewhat takes the teeth out of this test, since it no longer verifies that the reallocation actually occurs.
Test with:
```
blob_test --gtest_filter=TensorCPUTest*Shrink* \
--gtest_shuffle --gtest_repeat=100 --gtest_throw_on_failure
```
/cc sunwael
Closes https://github.com/caffe2/caffe2/pull/723
Differential Revision: D5174953
Pulled By: akyrola
fbshipit-source-id: 3d875a52c8139e73db85550817dea3c837eb7eae
Summary: Machines may not create their Gloo pairs at the same time, due to earlier variable time work. Increase the timeout used to establish the initial tcp connection to accommodate without sacrificing the shorter default timeout for outstanding reads/writes. No related change required for ibverbs as there is no communication on init.
Reviewed By: akyrola
Differential Revision: D5184518
fbshipit-source-id: 0e6c9704a2d2f1406b3927f75887f0a42199450b
Summary:
I'm using Python ops in a project and need corresponding Python gradient ops. For my use case, only a subset of the forward op outputs have gradients and only a subset of forward op inputs have gradients. However the current implementation of `GetPythonGradient` forces all grad inputs and outputs to exist. This diff allows one to specify that only a subset of grad inputs / outputs are used when constructing the Python op.
I'm not sure if this is up to caffe2 standards, so please push back on style and content as needed.
Reviewed By: dzhulgakov
Differential Revision: D4897004
fbshipit-source-id: 96fffe8634c51a49b6bce7339a46c6235f7d4bbd
Summary:
fixing missing future package issue.
Recently we found that some of our users do not have the future module available, so we need a try/except wrapper around all `past` imports.
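A minimal sketch of the kind of guarded import meant here (the exact symbols wrapped in the diff are not listed in this summary):
```
# Tolerate environments without the "future"/"past" compatibility packages.
try:
    from past.builtins import basestring  # provided by the "future" package
except ImportError:
    basestring = str  # fall back when future/past is unavailable
```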
Reviewed By: Yangqing
Differential Revision: D5183547
fbshipit-source-id: 262fdf2940ee1be4454bf0b0abb9e6a0f1a0ee82
Summary:
This diff introduces abstractions for parameter sharing for all the
parameters that are created through the new create_param syntax.
Possible use cases of this parameter sharing:
1. Share params within RNN interface.
2. Some complicated models that might share some of the branches.
3. TODO (next diff): Cross-model parameter sharing.
Reviewed By: salexspb
Differential Revision: D5160935
fbshipit-source-id: c6d40a5ed7ead240cd7db0eb69de6dc5f505b05a
Summary:
This is a little excessive:
```
CMake Warning at cmake/Dependencies.cmake:201 (find_package):
By not providing "FindEigen3.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "Eigen3", but
CMake did not find one.
Could not find a package configuration file provided by "Eigen3" with any
of the following names:
Eigen3Config.cmake
eigen3-config.cmake
Add the installation prefix of "Eigen3" to CMAKE_PREFIX_PATH or set
"Eigen3_DIR" to a directory containing one of the above files. If "Eigen3"
provides a separate development package or SDK, be sure it has been
installed.
Call Stack (most recent call first):
CMakeLists.txt:72 (include)
```
Closes https://github.com/caffe2/caffe2/pull/729
Differential Revision: D5183059
Pulled By: Yangqing
fbshipit-source-id: d17d5d06a50abb50f9978d022ddc4918e991079d
The correct device must be set when getting the base allocation and when
calling cudaIpcCloseMemHandle. Store the device in the allocator's
context, which was previously always NULL.
Fixes #1707
* Modify torchvision documentation following https://github.com/pytorch/vision/pull/179
* Add new datasets to docs
* Fix wording in torch.datasets
* Small clarification
Summary:
KaimingHe noticed a curious performance problem with ConvTranspose (actually ConvTransposeGradient): it got slower when more GPUs were used! This did not make sense.
After some strenuous debugging, I noticed that tensor Y = Output(0) was being reallocated every time: this causes the slowdown because we grab a mutex for each allocation.
Turns out this Y variable is copy-paste code and actually not intended to be part of the gradient op. This caused reallocation because the computed size of Y was larger than dfilter's (also Output(0)), but we never set the capacity of Y/dfilter to match the capacity of the larger size. Thus, Tensor.Resize() always ended up resetting the tensor --> allocation. This did not affect the correctness of the code, but made it super slow.
Before on KaimingHe's code ConvTransposeGradient took total of 3800 ms, now about 200ms.
Reviewed By: ajtulloch
Differential Revision: D5180280
fbshipit-source-id: d72f23038f0c51d82bcde7aed55089d657bda03e
Summary: simply allows accessing the third protos only when the temporal jittering option is off
Differential Revision: D5178943
fbshipit-source-id: 027234abee5c5c9fcf624dcbd55eb10ae8c9314f
Summary:
This diff is creating new type of Initializer - ExternalInitializer. This
initializer is supposed to be used in cases when the parameter blob is already
expected to exist in the workspace.
Reviewed By: dzhulgakov
Differential Revision: D5171322
fbshipit-source-id: d27861f0f80afdea93c235d49f63da19adccc92c
* Fix gc_refs assertion failure
Ensure that each THPVariable -> THPFunction reference contributes one
ref count to the THPFunction by creating a new shared_ptr for each ref.
Because multiple shared_ptrs can again manage a single THPFunction, it's
not safe to use std::weak_ptr where it may point to a PyFunction. It's
still safe to use weak_ptr for grad_accumulator since these are never
PyFunctions.
Fixes #1626
* Remove stale comment
Summary:
This diff is the first step in the effort of refactoring all parameters. As a first step, I'm merging the concepts of params and computed_params, which are going
to be based on tags instead (in the first version it still uses the old data structs to store all the BlobReferences).
Renaming computed_params to non-trainable/non-backprop params should be done in some other diff.
Reviewed By: salexspb
Differential Revision: D5171159
fbshipit-source-id: 68031ca779f053fb266a7c4a2e5b482a3bd9c832
Before the change, processes were not waiting for the master even when they got
'connection refused' (the master is not listening yet, so we should wait).
This happened because we were closing the socket twice: first by
the resource guard, and second manually in the exception handler.
That caused errno to be set to a different value (9 - bad file descriptor),
so the `if` that checked whether the connection was refused failed.
* Add sanity checks
* Refactor InitMethodFile and TCPInitMethod to more logical functions
* Update few error messages
* Add passing parameters by **kwargs, so now order of parameters is not relevant
* Review comments
Summary:
Add add_weight_decay to optimizer + test.
In D5142973 I accidentally removed weight decay from resnet50 trainer, so this restores it.
Reviewed By: asaadaldien
Differential Revision: D5173594
fbshipit-source-id: c736d8955eddff151632ae6be11afde0883f7531
Summary: noticed a few lint errors in image_input_op so cleaned them up
Reviewed By: akyrola
Differential Revision: D5152171
fbshipit-source-id: f84f476ddace6b4164607a01a9780a2e57e2133f
Summary: old diff had some changes to formatter.py and generator.py, but now everything is in github.py
Reviewed By: bwasti
Differential Revision: D5165061
fbshipit-source-id: 5fe5ff70ff2c5525c7aacf20854916c86d272749
* A pile of misc doc fixes.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Handle @apaszke review comments.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Initial csrc documentation.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary: Use new blob as residual sum output, and add scoping to prevent any name conflicts.
Reviewed By: urikz
Differential Revision: D5167145
fbshipit-source-id: a01c87ed2278205e95e8395314b166afb1dca1b3
Summary:
Split the Caffe2 memory based model into to parts
- Dimension reduction MLP
- DNN with concatenation of memory and obj feature
Currently only implement simple mean
Differential Revision: D4866825
fbshipit-source-id: d2f6813402513ec9af30dbe29a50593e2d3cdb3b
Summary:
also contains previous edits on statuses which should be in here....
Closes https://github.com/caffe2/caffe2/pull/657
Differential Revision: D5158733
Pulled By: aaronmarkham
fbshipit-source-id: faba2ab8e2dab206e09f57021b973b3e7d01af95
Summary:
A recent diff introduced a duplicate parameter to the model, which would hurt performance and also affect correctness (duplicate momentum updates, for example). We unfortunately had no checks for duplicate params outside of data_parallel_model, which fortunately brought this to our attention.
But it is better to have a Validate() function in model_helper, and call that before adding gradient ops and querying for parameters. Added to brew_test calls as well.
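A rough illustration of the duplicate-parameter situation such a Validate() call is meant to catch (hypothetical layer names; not the model from the diff):
```
from caffe2.python import brew, model_helper

# Build a tiny model, then accidentally register the same param twice.
model = model_helper.ModelHelper(name="dup_param_example")
brew.fc(model, "data", "fc1", dim_in=16, dim_out=16)
model.params.append(model.params[-1])   # duplicate registration

names = [str(p) for p in model.params]
assert len(names) != len(set(names))    # duplicate is now detectable
```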
Reviewed By: kennyhorror
Differential Revision: D5163458
fbshipit-source-id: 35692e8bfcc359d4e8bc73e6f2358659f6e45ceb
Summary:
This diff is the first step in the effort of refactoring all parameters. As a
first step, I'm merging the concepts of params and computed_params, which are going
to be based on tags instead (in the first version it still uses the old data
structs to store all the BlobReferences).
Renaming computed_params to non-trainable/non-backprop params should be done in
some other diff.
Reviewed By: salexspb
Differential Revision: D5119830
fbshipit-source-id: 2001090a37346eb12abbb234e13e727c288eb8a7
Summary:
use a user-defined Android NDK path instead of a hard-coded one.
Closes https://github.com/caffe2/caffe2/pull/506
Differential Revision: D5162646
Pulled By: Yangqing
fbshipit-source-id: 5093888e15607b3bf6682e05eb91aa94c6206b01
Summary:
It's causing problems inside docker containers:
`InvalidArgument: Insufficient bytes of entropy to draw requested array. shape=(5, 9, 10, 5), dtype=float32. Can you reduce the size or dimensions of the array? What about using a smaller dtype? If slow test runs and minimisation are acceptable, you could increase settings().buffer_size from 8192 to at least 18432000.`
Closes https://github.com/caffe2/caffe2/pull/707
Differential Revision: D5162621
Pulled By: Yangqing
fbshipit-source-id: 55544210961cbc80828dca2cbeba6a5ace8cf8d1
Summary:
This warning becomes an error with https://github.com/numpy/numpy/pull/6271 (`>=0.12.0`).
```
caffe2/python/operator_test/tile_op_test.py::TestTile::test_tilewinput
/opt/caffe2/caffe2/python/operator_test/tile_op_test.py:100: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
dims[axis] = tiles
/usr/lib/python2.7/dist-packages/numpy/lib/shape_base.py:873: VisibleDeprecationWarning: converting an array with ndim > 0 to an index will result in an error in the future
return c.reshape(shape_out)
```
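For reference, a tiny illustration of the deprecated pattern and the usual fix (made-up values; the real fix in the diff may differ):
```
import numpy as np

# A 1-element array (ndim > 0) used as an index triggers the warning.
tiles = np.array([3])
dims = [1, 1]
# dims[0] = tiles          # VisibleDeprecationWarning (an error in newer numpy)
dims[0] = int(tiles[0])    # extract a scalar explicitly instead
```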
Closes https://github.com/caffe2/caffe2/pull/710
Differential Revision: D5160776
Pulled By: Yangqing
fbshipit-source-id: b264e0e389de5817a289db878c15e655f9fa2f09
Summary:
Adds support for generating and training pfp16 models. Added SGD optimizer for multi-precision trainers and a new callback to data_parallel_model in order to help multi-precision models keep their different copies of parameters in sync during training.
Closes https://github.com/caffe2/caffe2/pull/697
Differential Revision: D5159712
Pulled By: salexspb
fbshipit-source-id: 60a889494d2e2f4df1d720331e19f638c5eb95cc
Summary:
(this is due to an earlier blind vim find-replace error)
Closes https://github.com/caffe2/caffe2/pull/709
Differential Revision: D5159055
Pulled By: Yangqing
fbshipit-source-id: f188b7bebf79a45825568ba96a71b535fe4e3aad
Summary:
Currently we can get into broken situations when some nodes finish detectChanges() faster than others, so only some of the nodes start the next iteration of training. This is an inconsistent state. To prevent this from happening, each node now sets a "re-rendezvous flag" that is allreduced after each iteration. Once all nodes agree, re-rendezvous will be done.
Also noticed that min_shards=1 does not work because data parallel model assumed num_shards>1 when rendezvous is not None. Fixed that.
Reviewed By: andrewwdye
Differential Revision: D5156282
fbshipit-source-id: f2ccbd8ad13ed37f7813ff8ad1080d963d0d17e3
Summary:
Once the build is cached, QUICKTEST takes less than 3 minutes to install+build+test (first build is ~13 minutes).
Future TravisCI improvements:
* Refactor other build targets so they're fast enough to build in under 45 mins
* Run tests for other build targets
* Run Python tests
Closes https://github.com/caffe2/caffe2/pull/550
Differential Revision: D5157407
Pulled By: Yangqing
fbshipit-source-id: b2b2d9c2c85423cc78f314951da54b64c247c0af
Summary:
This PR adds a CLI flag '--caffe2_print_stacktraces' that takes a bool and, when set, will print stack traces when a fatal signal occurs. As a side effect a few new APIs are introduced, `caffe2::setPrintStackTracesOnFatalSignal` and `caffe2::printStackTracesOnFatalSignal` - however these are mostly exposed for testing infrastructure purposes.
Also it appears at some point fatal signal handlers were strictly disabled for android - this PR re-enables them.
Closes https://github.com/caffe2/caffe2/pull/698
Reviewed By: Yangqing
Differential Revision: D5150001
Pulled By: danzimm
fbshipit-source-id: abb4aada4ddae8bcfbf1a85f3d101ed63692f221
Summary: Extended the time-out option from just working on TCP to also working with ibverbs
Reviewed By: pietern
Differential Revision: D5090258
fbshipit-source-id: fee685850d761d0c2130852f513c64ceb19f4e9e
Summary: Add information about the offending param when assertion fires.
Reviewed By: kennyhorror
Differential Revision: D5153625
fbshipit-source-id: 9f5a02bf64ccbdef9d93d346f79e589dfe3ec5be
Summary:
Add timing of the phase between the last gradient op and the final sync. This gives an approximate measure of the latency of the distributed computation and helps detect stragglers. Not intended as a real measure, but just for relative comparison.
This could be improved by making nodes share their timings and make decisions based on it. But for first step, we can just look at the numbers ourselves.
Reviewed By: andrewwdye
Differential Revision: D5149273
fbshipit-source-id: c4c346291c0feb6e9c6ceced64e7be667d17dcad
Summary: Fix an issue where the parameter is not created in param_init_net, or net, and then we secondarily look at which device op outputs the gradient. This did not work if the gradient was a GradientSlice.
Reviewed By: harouwu
Differential Revision: D5153102
fbshipit-source-id: 20eae660ea32e5a9ea484bf93c04c8f8c71a51ed
Summary: If ConstantFill (or another fill op) is used in CUDAContext with input_as_shape, the code crashes, as it expects the shape to be in CUDAContext but accesses the array in host code... We could fix this by copying the values from the CUDA tensor, but it is probably best to enforce that the shape param is in CPU context. This is what this diff does.
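A minimal CPU-side sketch of the constraint described above (illustrative blob names):
```
import numpy as np
from caffe2.python import core, workspace

# With input_as_shape the shape tensor is read on the host, so it has to live
# in CPU context even if the fill itself runs on GPU.
workspace.FeedBlob("shape", np.array([2, 3], dtype=np.int64))  # CPU blob
workspace.RunOperatorOnce(
    core.CreateOperator("ConstantFill", ["shape"], ["out"],
                        value=1.0, input_as_shape=1))
print(workspace.FetchBlob("out").shape)  # (2, 3)
```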
Differential Revision: D5152766
fbshipit-source-id: 0629a189bd1d800c0b7c9dbc324b78d279efac0b
Summary:
Bug repro is in a test. Generally speaking accumulation was
not happening if len(ys) >= 2 (list of blobs we compute gradients
from) and for some blob in the net it was both in ys list and also got
a gradient propagated from another element in ys.
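A hand-written sketch of the failure scenario (toy ops, not the repro test): gradients are requested from both y1 and y2, and y1 must accumulate its own seed gradient with the gradient flowing back through y2.
```
from caffe2.python import core

# y1 appears both in ys and as an input to y2, so its gradient needs
# accumulation of the seed with the gradient coming back from y2.
net = core.Net("grad_accum_example")
y1 = net.Exp("x", "y1")
y2 = net.Exp(y1, "y2")
grad_map = net.AddGradientOperators([y1, y2])
print(net.Proto())
```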
Reviewed By: akyrola
Differential Revision: D5121695
fbshipit-source-id: 282d88f2f4f6e27dadae311964f40246a2739130
Summary:
For some long running benchmarks, the iteration count could be 0
which would lead to a segfault when printing results
Reviewed By: pietern
Differential Revision: D5149034
fbshipit-source-id: 7b56e8961c302d1ff11ffcd74ca8e909ea046231
Summary: It looks like it's a bit too restrictive requirement. Let's remove it.
Reviewed By: volkhin
Differential Revision: D5150968
fbshipit-source-id: 9e38574edc6542c5ce3c7f25a01afe8f5ff9b507
Summary:
Fixes some performance issues when `broadcast_computed_params=True` is passed to Parallelize_GPU. Enabled via the same `use_nccl` flag as AllReduce
Closes https://github.com/caffe2/caffe2/pull/630
Differential Revision: D5149828
Pulled By: akyrola
fbshipit-source-id: 12c9714c7fa078811f1cde61c8523dca8f7f968f
Summary: These return views in Python 3, which would not do anything in a lot of the usages currently present in Caffe2. This diff simply removes (almost) all usages of these two in Caffe2 and subprojects in favor of comprehensions, which are also easier to read/understand.
Reviewed By: akyrola
Differential Revision: D5142049
fbshipit-source-id: e800631d2df7d0823fed698cae46c486038007dc
Summary:
Looking at one segfault at exit (https://our.intern.facebook.com/intern/chronos/jobinstance/?jobinstanceid=911625597&smc=chronos_gp_admin_client&log_type=stderr&offset=0&pretty_logs=false) and its coredump, the only thing I can see is that a FreeBlob() operator is called concurrently while a cudaMemcpyAsync (on thread 1) is crashing. FreeBlobOp is only called at data_workers _stop() (via utils.ResetBlobs()), and the only code that could run a cudaMemcpyAsync at that time is the fetcher thread of data_workers that is enqueuing blobs.
Here are the stacks: P57455299
This is clearly a bug since we should only clear the scratch blobs after all threads are terminated, which happens at wait_for_finish().
I am not 100% sure this fixes all the segfaults, but at least this one was most likely caused by this.
Reviewed By: andrewwdye
Differential Revision: D5146278
fbshipit-source-id: ae00796706bfc4fee6823caf6529b62ab20c1cd3
Summary: Ring-chunked performance on 8 nodes was substantially worse than halving-doubling in some cases. We can just use halving-doubling in all cases.
Reviewed By: prigoyal
Differential Revision: D5148755
fbshipit-source-id: 1332065615be6b9faf873effac87056011e0e804
Summary:
This diff does two things:
- add support for an optimizer to data_parallel_model. The user can supply optimizer_builder_fun instead of param_update_builder_fun. The latter is called for each GPU separately with proper namescope and devicescope, while the optimizer builder is called only once and adds optimizers to the whole model.
- use MomentumSGDUpdate instead of MomentumSGD + WeightedSum. This bring major perf benefits.
Changes resnet50 trainer to use optimizer.
This relies on D5133652
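A hedged sketch of the new hook (names other than optimizer_builder_fun itself are illustrative, and the Parallelize_GPU call is shown only in outline):
```
from caffe2.python import optimizer

def add_optimizer(model):
    # Called once for the whole model (not per GPU); per the summary,
    # MomentumSGDUpdate is used for the momentum update.
    optimizer.build_sgd(model, base_learning_rate=0.1,
                        momentum=0.9, policy="fixed")

# data_parallel_model.Parallelize_GPU(
#     model,
#     input_builder_fun=add_inputs,
#     forward_pass_builder_fun=create_model,
#     optimizer_builder_fun=add_optimizer,
#     devices=range(4),
# )
```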
Reviewed By: dzhulgakov
Differential Revision: D5142973
fbshipit-source-id: 98e1114f5fae6c657314b3296841ae2dad0dc0e2
Summary:
I'll let y'all decide how you want to fix this (probably need a persistent curand buffer). Here's a test to verify the fix.
Closes https://github.com/caffe2/caffe2/pull/495
Differential Revision: D5148815
Pulled By: akyrola
fbshipit-source-id: e80dabe65230ddd32340f2d872cd8786ac960bf8
Summary:
hankun is using the optimizer but has a mixed set of GPU and CPU operators. Currently this won't work with the optimizer, since it adds optimizers for all parameters in the current device scope. But we can actually infer the device that a param belongs to by looking at the device option in the param_init_net.
Added a test as well.
Reviewed By: salexspb
Differential Revision: D5133652
fbshipit-source-id: ad8689d75ac1f5c78981bae1b6978fe91e40ef0f
Summary:
See discussion at https://github.com/caffe2/caffe2/pull/633#issuecomment-303536902
Tested with a TitanX (Pascal) and a TitanZ (Kepler) with this access pattern.
```
Checking GPU(s) for support of peer to peer memory access...
> Peer access from TITAN X (Pascal) (GPU0) -> GeForce GTX TITAN Z (GPU1) : No
> Peer access from TITAN X (Pascal) (GPU0) -> GeForce GTX TITAN Z (GPU2) : No
> Peer access from GeForce GTX TITAN Z (GPU1) -> TITAN X (Pascal) (GPU0) : No
> Peer access from GeForce GTX TITAN Z (GPU1) -> GeForce GTX TITAN Z (GPU2) : Yes
> Peer access from GeForce GTX TITAN Z (GPU2) -> TITAN X (Pascal) (GPU0) : No
> Peer access from GeForce GTX TITAN Z (GPU2) -> GeForce GTX TITAN Z (GPU1) : Yes
```
All combinations pass:
* `0,1`
* `0,2`
* `1,2`
* `0,1,2`
Closes https://github.com/caffe2/caffe2/pull/659
Differential Revision: D5148779
Pulled By: akyrola
fbshipit-source-id: 6263edfe8b36623983f1946b5c3f4a3fef415a45
Summary:
Allow user to force cuDNN convolution algorithms from python - useful if you're using a standard network and don't want to pay the cost of exhaustive search.
Defined as an array in the order of [fwd, wgrad, dgrad].
Also refactors cudnn_conv_op slightly to split the code to do wgrad and dgrad a little more.
Closes https://github.com/caffe2/caffe2/pull/570
Reviewed By: akyrola
Differential Revision: D5125731
Pulled By: asaadaldien
fbshipit-source-id: cc5c64d3ccd2546f8e744d818f587bbbd24f055b
Summary:
Failure mode:
```
- 7 passing examples, 0 failing examples, 0 invalid examples
- Typical runtimes: 12-14987 ms
- Stopped because settings.timeout=60
```
After this change:
```
- 5 passing examples, 0 failing examples, 0 invalid examples
- Typical runtimes: 12-15475 ms
- Stopped because settings.max_examples=5
```
Obviously, the `DYNAMIC_PROGRAMMING` tests are the troublemakers. An alternate solution would be to make separate tests for the two assignment algorithms (one fast, one slow).
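A hedged sketch of what capping the example count looks like with hypothesis (the test body and names are illustrative, not the actual assignment test):
```
from hypothesis import given, settings, strategies as st

# With max_examples=5, slow cases stop on the example cap instead of
# tripping settings.timeout.
@settings(max_examples=5)
@given(st.integers(min_value=1, max_value=64))
def test_assignment_num_examples(n):
    assert n >= 1
```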
Closes https://github.com/caffe2/caffe2/pull/676
Differential Revision: D5147363
Pulled By: akyrola
fbshipit-source-id: 85d9f8198e53c10de2a8d6645e2b0eb7953c96e0
Summary: This diff is one step towards enabling the Python 3 build by making it more diligent in its handling of strings.
Reviewed By: salexspb
Differential Revision: D4893083
fbshipit-source-id: 28b8adf3280e8d1f0a7dc9b0fee5ad53f2fada57
Summary: Refactored SoftmaxWithLoss by removing the code for spatial=1 mode and created a new op SpatialSoftmaxWithLoss that has the spatial mode implemented.
Reviewed By: viswanathgs
Differential Revision: D5104120
fbshipit-source-id: 8ab999e32c916b2a39a670a7b2a3365401535f24
Summary:
This should build on all linux systems now (unwind.h appears to be a gcc extension that clang supports as well) on every platform - even android. I'm not sure how to look at what platforms support which libc extensions, so I'm unsure how to proactively ensure this PR will work on all platforms.
Closes https://github.com/caffe2/caffe2/pull/656
Reviewed By: pietern
Differential Revision: D5134097
Pulled By: danzimm
fbshipit-source-id: 093a49239c6d9d43ca64c52e8aaab569970b2cf9
Summary: andrewwdye caught a sigsegv that happened at Gloo failure signaling function. Turns out workspace->CreateBlob() is not thread safe, and since we are running multiple threads it is likely that many gloo ops fail at once and thus we get a race. Caffe2 ops should actually be created in constructor, so that's what this diff does.
Reviewed By: andrewwdye
Differential Revision: D5139269
fbshipit-source-id: 7eaab3084e4e39543632c628c5e0710225e73b65
Summary:
Makes benchmark a bit hacky, but it's a benchmark after all :)
Specifically ports functionality of proper BenchmarkNet run from the ads_benchmarks so that we can see training net perf.
Also adds --report_interval parameter to print stats more often when running in hogwild mode
kdub0 - hopefully if you have time you can integrate it properly with the Flow's workflow
harouwu -shouldn't conflict too much with your current diff
Reviewed By: rayleichen
Differential Revision: D5125183
fbshipit-source-id: 9c6f1663bc85e26d6609f0f2f23aa280731939db
Summary:
To make optimizer for sparse gradients work with CUDA, we need UnsortedSegmentSum and Mean implemented for CUDA. Unique was already implemented by harouwu.
Pretty straightforward implementations, should be fast enough -- and I don't know a faster way anyway.
Added some tests as well.
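For reference, a tiny CPU illustration of UnsortedSegmentSum semantics (the diff adds the CUDA kernels; data values are made up):
```
import numpy as np
from caffe2.python import core, workspace

# Each row of DATA is summed into the bucket given by its segment id.
data = np.array([[1., 2.], [3., 4.], [5., 6.]], dtype=np.float32)
segment_ids = np.array([0, 1, 0], dtype=np.int32)
workspace.FeedBlob("data", data)
workspace.FeedBlob("ids", segment_ids)
workspace.RunOperatorOnce(
    core.CreateOperator("UnsortedSegmentSum", ["data", "ids"], ["out"]))
print(workspace.FetchBlob("out"))  # [[6. 8.] [3. 4.]]
```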
Reviewed By: asaadaldien
Differential Revision: D5124548
fbshipit-source-id: 63ae72f45fc2f07470603f7b2de12f34635dbb3d
Summary:
This is going to unblock Nvidia in their work on adding fp16
support to Caffe2. I discussed this with kennyhorror before to make
sure this fits into his work on parameter sharing.
Reviewed By: kennyhorror
Differential Revision: D5127797
fbshipit-source-id: 4db155d320b1862570c23b77c4252bdacbf2296f
Summary:
If there are two SparseToDense layers densifying the same IdList feature, we
might end up exporting invalid input for the prediction in input specs. This
diff changes the behavior to use Alias to a new blob instead of passing things
directly.
Reviewed By: dzhulgakov
Differential Revision: D5093754
fbshipit-source-id: ef4fa4ac3722331d6e72716bd0c6363b3a629cf7
Summary: Currently using two tower models with cosine distance results in bad calibration. Adding bias to the output of cosine term solves the problem.
Reviewed By: xianjiec
Differential Revision: D5132606
fbshipit-source-id: eb4fa75acf908db89954eeee67627b4a00572f61
Summary: Memory leak happens when new BlobReference is constantly added to the set _scratch_blobs
Reviewed By: panshen1
Differential Revision: D5134945
fbshipit-source-id: 3ce4d482153bb89de065f20cd91411178085caad
Summary: Changed test file name to signify that if testing with ASAN you should disable ASAN signal handling.
Reviewed By: pietern
Differential Revision: D5122977
fbshipit-source-id: f73de44df943516f3353cf408697869c43c45032
Summary:
This was hardcoded at 4 before but should be made
configurable. Can be kept low for big MLPs and higher for convnets.
Reviewed By: akyrola
Differential Revision: D5126138
fbshipit-source-id: 713ee8bbeb243b7de1479808fd6398d397e0b49a
Summary:
Fix number of indices and block_size in SparseAdam to support gradients of any dimension.
Closes https://github.com/caffe2/caffe2/pull/249
Reviewed By: asaadaldien
Differential Revision: D5125714
Pulled By: akyrola
fbshipit-source-id: 84134049cb9a77e58562272ea351222befe27fca
Summary:
Only adding `include_directories` doesn't propagate to the including
targets. Also use `target_include_directories` to do so.
Closes https://github.com/facebookincubator/gloo/pull/39
Differential Revision: D5131001
Pulled By: pietern
fbshipit-source-id: 6c58c4b76ae7fa008e4fb26d1bca7900165884d0
Summary:
Implement SizeOp that returns the number of elements in the input
tensor.
Output is 1D tensor that contains the number of elements
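A minimal illustration of the Size op described above (illustrative blob names):
```
import numpy as np
from caffe2.python import core, workspace

workspace.FeedBlob("x", np.zeros((2, 3, 4), dtype=np.float32))
workspace.RunOperatorOnce(core.CreateOperator("Size", ["x"], ["n"]))
print(workspace.FetchBlob("n"))  # total element count: 24
```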
Reviewed By: akyrola
Differential Revision: D5101061
fbshipit-source-id: d1c56053b6f3b41c65ac574dd748482775d1ea0d
Summary: CuDNN conv op's type error was not very descriptive.
Reviewed By: Yangqing
Differential Revision: D5124638
fbshipit-source-id: 7d3f0afad36573cdb97d1f8ec3c60a9c6d87f926
Summary:
The CMake variable CMAKE_BINARY_DIR points to the top level build
directory. For standalone Gloo builds this path lets files include the
generated file "gloo/config.h". When Gloo is included as project, this
variable points to a different path and "gloo/config.h" cannot be
resolved. Fix is to build a path from CMAKE_CURRENT_BINARY_DIR.
Closes https://github.com/facebookincubator/gloo/pull/38
Differential Revision: D5129385
Pulled By: pietern
fbshipit-source-id: 722cebf4892b34f869fe43320153efbb181555b6
Summary: In some cases (for example, when include_tags option is used) output_schema contains blobs that aren't produced by the generated net. In this case we want to filter them from output_schema as well.
Differential Revision: D5120115
fbshipit-source-id: f98ea3f747589390b039d1e1987becec3980634c
Summary:
D5116828 changed how in-place ops were handled in memonger and fixed a crash in NeuralMT. However, it still produced an incorrect memongerization, because an op with one in-place input-output but another non-in-place output would still be handled incorrectly, as the other output's branch would not be followed properly.
This is fixed by actually removing the whole in-place op special handling. This actually is not needed anymore, it was leftover from an older version of memonger that used topological sort of the ops.
Reviewed By: asaadaldien
Differential Revision: D5128142
fbshipit-source-id: b551b0faebdde410e6bd7516958c63cf610cc065
Summary: When two or more blobs are gathered by the same indices blob in a data parallel model, we used to concatenate multiple times and re-write to the same indices blob. This leads to illegal memory access at times because the gradientslice indices blob is longer than its corresponding gradientslice values blob. This diff adds a check in order to avoid this.
Reviewed By: akyrola
Differential Revision: D5116817
fbshipit-source-id: 1c086d092eb6d48926d600f9408f578f5ddc41c7
Summary: Using Misha's vectorized AVX code to greatly improve performance of reductions on float16 values. Float16 reductions are now 2x faster than float.
Reviewed By: pietern
Differential Revision: D5123331
fbshipit-source-id: 03d4e76886d538b7e24eedaf32a92231a80b1e43
Summary: Gradient test for tile op was flaky because I had made the dimensions too large. This caused push-blocking errors. Also I noticed my test_grad_tile was incorrect.
Reviewed By: asaadaldien
Differential Revision: D5126476
fbshipit-source-id: ae9ce5d9041648d7a4535fc88d4013e669bd6f02
Summary: Modify BroadcastOp and AllreduceOp to allow initializing algorithms on buffers of float16 values. Previously the Allreduce algorithm definitions were hardcoded to take float.
Reviewed By: pietern
Differential Revision: D5042015
fbshipit-source-id: c5c3ea5566f9f23969847dcc0735f5f4b075f56f
Summary:
The broadcast algorithms use the buffers they were given directly.
There is no inbox/outbox pattern. This means that we can race if the
algorithm is run repeatedly within a short time frame. This hasn't
been an issue so far since we've only used it in combination with
other process wide barriers.
Since this adds a round trip the latency of these ops from the root
rank perspective increases. The variance between the before and after
runs is pretty high since there is no back and forth interaction on
the root. It simply waits for recipients to be ready and then sends
its data.
Before:
```
Device: tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm: broadcast_one_to_all
Options: processes=4, inputs=1
elements min (us) p50 (us) p99 (us) max (us) samples
100 1 16 29 50 426075
200 2 17 32 50 179953
500 2 11 31 59 140291
1000 2 12 29 59 177619
2000 3 12 29 62 117882
5000 5 16 31 64 127113
10000 9 21 38 88 60328
20000 19 36 65 130 30427
50000 48 68 221 556 11180
100000 92 136 426 871 7314
200000 193 251 829 2965 4092
500000 492 638 2098 4133 1677
1000000 1195 2024 3513 11646 628
2000000 3446 4216 5007 17100 282
5000000 12956 13919 14941 37751 71
```
After:
```
Device: tcp, pci=0000:25:00.0, iface=eth0, speed=50000
Algorithm: broadcast_one_to_all
Options: processes=4, inputs=1
elements min (us) p50 (us) p99 (us) max (us) samples
100 15 37 52 107 27332
200 14 40 63 199 28620
500 17 37 52 118 18299
1000 9 39 57 120 33375
2000 20 57 78 180 24779
5000 31 61 84 190 18039
10000 39 70 90 225 8908
20000 57 108 130 940 8313
50000 94 163 217 1933 5326
100000 132 231 331 3501 3681
200000 256 426 560 6509 2272
500000 774 1092 1698 10039 985
1000000 1132 2106 3878 18218 484
2000000 3509 4252 6832 20228 226
5000000 11326 15447 27129 52694 77
```
Reviewed By: wesolwsk
Differential Revision: D5123341
fbshipit-source-id: f3bab4f75ef7c38817f74f00b382f18fe43d85d5
Summary:
use a typedef for the type of the pseudo-random number engine; this makes it more flexible to change
Closes https://github.com/caffe2/caffe2/pull/615
Differential Revision: D5121539
Pulled By: Yangqing
fbshipit-source-id: 988e57f8d119cb6f3bfe692fdb303aba2ecacbeb
Summary: Vector out-of-range error was being triggered in some tests due to trying to get the address of an element past the end of vector.
Reviewed By: pietern
Differential Revision: D5123044
fbshipit-source-id: 004f72ebaa27c609290959c12a3d99b16289bfa8
Summary:
This PR changes the cmake of Caffe2 to look for system dependencies before resorting to the submodules in `third-party`. Only googletest should logically be in third-party, the other libraries should ideally be installed as system dependencies by the user. This PR adds system dependency checks for Gloo, CUB, pybind11, Eigen and benchmark, as these were missing from the cmake files.
In addition it removes the execution of `git submodule update --init` in cmake. This seems like bad behavior to me, it should be up to the user to download submodules and manage the git repository.
Closes https://github.com/caffe2/caffe2/pull/382
Differential Revision: D5124123
Pulled By: Yangqing
fbshipit-source-id: cc34dda58ffec447874a89d01058721c02a52476
* Fix segfault in autograd:
1) Every "output" variable must have a grad_fn or grad_accumulator
2) compute_partial_exec_callbacks uses Python errors
* assertRaisesRegexp was renamed assertRaisesRegex in 3.2
* Use HANDLE_TH_ERRORS macro
Summary:
After the change we will be able to simply define targets and find dependencies.
Closes https://github.com/caffe2/caffe2/pull/640
Differential Revision: D5121700
Pulled By: Yangqing
fbshipit-source-id: 2d21e1afbccb09614054feccdd1bef55cbe3b035
Summary: Adds timers to collect the average runtime for each pipe stage.
Reviewed By: azzolini
Differential Revision: D5083958
fbshipit-source-id: 42536bd70c80c2453d98d872286525388f6164c3
Summary:
predictor_exporter copies the original predict_net's op, external_input and
external_output fields, but ignores the type field. This is reasonable as the
train net would generally have 'dag' type and copying that for inference may
not be applicable. It's good to have a way to specify the net type nevertheless
to run DAGNet for inference. This diff adds a field in predictor_exporter to do
that.
Reviewed By: akyrola
Differential Revision: D5122354
fbshipit-source-id: 0e3cc417128db903c71515135c9e3b87620ae21e
Summary: Added a new RNNCell, DropoutCell, which wraps an existing RNNCell and applies dropout to its primary output (as defined by get_output_state_index()).
Reviewed By: salexspb
Differential Revision: D5084871
fbshipit-source-id: 60474af84e5757a12e7fdc3814840dc9ba8e32a1
Summary: Basically takes in a live net and creates an init_net and predict_net which can be written to file and run in Predictor
Reviewed By: salexspb
Differential Revision: D4989425
fbshipit-source-id: 8052065da9ed763d48bd9e1e19f7697ef60a2829
Summary: This RunPlan is getting complex and confusing. The first step to clean it up is to move it out of workspace.cc to better mark separation of concerns.
Reviewed By: kennyhorror
Differential Revision: D5100721
fbshipit-source-id: 4be0559eba1abb8bb1ddc3818698763c2e014ef2
Summary: As noted by salexspb, MultiRNNCell had unreliable gradient computation. The problem was that recurrent gradient and gradient computed wihtin the backward step net were not being accumulated during the backward pass, but rather writing to the same blob, thus overwriting each other. This diff fixes that by artificially introducing an extra blob for the internal output, and then accumulating it into the gradient coming from the recurrent connection.
Reviewed By: salexspb
Differential Revision: D5110059
fbshipit-source-id: 16add50989fe8866361bbc21afce5f214c5292fd
Summary:
- caffe2 compiles now with gflags 2.2.0 (compiled from source), see issue https://github.com/caffe2/caffe2/issues/491
- fixed an error in image_input_op.h (did not compile in vs2015)
Closes https://github.com/caffe2/caffe2/pull/559
Differential Revision: D5121555
Pulled By: Yangqing
fbshipit-source-id: 9d2bedadd13d1872bb930a95d67ed20263988d13
Summary:
Fixed a bug in CMakeLists.txt: the option command should not be used for setting the initial value (empty string) of CAFFE2_CPU_FLAGS and CAFFE2_WHITELIST, because option can only be used for boolean (ON/OFF) variables; use the set command instead. The bug can cause compilation errors if CAFFE2_CPU_FLAGS is set to ON, since an invalid 'ON' flag will be added to CXX_FLAGS. (2) Add build_* to .gitignore to allow multiple build directories in the repo
Closes https://github.com/caffe2/caffe2/pull/611
Differential Revision: D5121545
Pulled By: Yangqing
fbshipit-source-id: 1f57042075356b6bf7138f65565b327be2a6d272
Summary:
Added python-pip and python-numpy into build_raspbian.sh script
because they are not installed in ubuntu/debian minimal image.
Closes https://github.com/caffe2/caffe2/pull/609
Differential Revision: D5121550
Pulled By: Yangqing
fbshipit-source-id: 14dd1450275fcc2aa9d2a06f0982f460528a1930
Summary: Memonger ignores ops with input and output in-place, but did not work correctly if there were also non-in-place inputs, as with Mul. Simple fix to also look at in-placeness during the traversal.
Reviewed By: jhcross
Differential Revision: D5116828
fbshipit-source-id: 52817f1221597986cc09cc65d094417c1923d965
Summary:
In a previous commit where the slot numbering was expanded, I changed
the memory region send/recv path to use a map for the outgoing memory
regions (since they may complete out of order). Before, this was a
fixed size array, which was mutated by both the user thread and device
thread without holding a lock. The map, however, can't be mutated
without a lock. This change adds that lock and a few assertions to
check for this type of problem.
Reviewed By: andrewwdye
Differential Revision: D5108194
fbshipit-source-id: 1908c988112469ecdec6cb6eb9849068d896c409
Summary:
It looks like AddOperator was never really used (searched across the whole
code-base). In addition to this all model_helper functionality is getting
replaced with Brew, so there I'd prefer to remove this method to reduce the
amount of code touching model.params.
Reviewed By: rayleichen
Differential Revision: D5110425
fbshipit-source-id: f2a88e4c1ce5149d27e809e03da9a86c0867bc4d
Summary:
I had "optimized" the number of threads / block, but cub::BlockReduce has a static template parameter for the number of threads, and this must match. Probably tests still passed because typically the initial numbers are zeros.
Also added a stronger test.
Thanks ves for the report.
Differential Revision: D5110901
fbshipit-source-id: c1169b1286e204c202b0727448ddb51b4965eacb
Summary:
This file can then be used by downstream code to figure out what Gloo
features it can support (e.g. ibverbs transport or not).
Closes https://github.com/facebookincubator/gloo/pull/36
Differential Revision: D5110769
Pulled By: pietern
fbshipit-source-id: 2c0c07537258048737ae764a4978f2f7fdbd992d
Summary:
This is another example where our unsolicited writes may interfere
across calls to the collective function. In this case, it was possible
for a second call to overwrite a pair's address before it had been
used to connect the pair in the previous iteration.
Thinking out loud, we could avoid this from happening by supporting
this pattern natively in the Buffer classes. For example, we can add a
notification mechanism (opt in) to the Buffer class such that the
receiver may call `ackRecv()` to acknowledge receipt and handling of
the data in the buffer. Then the sender will block on new sends until
acknowledgement from the previous send has been received. Until then,
we have to keep an extra eye out.
Reviewed By: wesolwsk, romain-intel
Differential Revision: D5095430
fbshipit-source-id: 4c100433108fccea7457bba4dc00f651f722e6c9
Summary:
I'm assuming the repo should be caffe2/caffe2.git and not bwasti/caffe2.git. Changed it accordingly.
Closes https://github.com/caffe2/caffe2/pull/572
Differential Revision: D5105328
Pulled By: aaronmarkham
fbshipit-source-id: 4bd3babbd93c79831be79c6d40b81d873fcc3f4c
Summary: ves and jamesr66a had noticed that TileOp for CUDA was very slow, as it started kernels inside double loops. It was my fault not to notice this in the code review. This diff uses 1 kernel for forward and backward passes and is probably much faster. I did not test though, maybe ves or jamesr66a can help?
Reviewed By: jamesr66a
Differential Revision: D5101968
fbshipit-source-id: 64b6ac933785e3710b3c1d8c692a4c48650bca96
Summary:
When a fatal signal is fired to a task that links against caffe2 this PR adds stacktraces from every thread that's currently running. Only linux is supported currently. The signals that are currently supported are SIGABRT, SIGINT, SIGILL, SIGFPE, SIGBUS and SIGSEGV (more signals can easily be added, but for now this seemed like the major signals that might be fired - see signal_handler.cc:138 for the table of signals).
I've added tests that verify that each of those signals indeed output the expected number of stacktraces.
We need to add linking against libdl since on linux apparently it's not implicitly always linked in (I'm coming from macOS where I believe it is).
Example output can be found [here](https://gist.github.com/danzimm/814faa1229d9c54f359d23ba038344a6) - note that the signal name changes depending on the signal that was sent (as well as the number in parenthesis that corresponds to the specified signal).
Closes https://github.com/caffe2/caffe2/pull/596
Reviewed By: akyrola
Differential Revision: D5087526
Pulled By: pietern
fbshipit-source-id: ba8d058c9ca1cf06b41667205193f8699f8d6964
Summary:
Correct schema generation was previously broken leading to invalid gradient op creation.
Also exhibited in model_device_helper, where invalid schema were being created on the CPU when kwargs['engine'] == 'CUDNN'
Closes https://github.com/caffe2/caffe2/pull/617
Reviewed By: asaadaldien
Differential Revision: D5097062
Pulled By: akyrola
fbshipit-source-id: e22181f857deccb7b4395e87271e2cbf1226eb64
Summary:
This allows producing nice comparisons against
CuDNN. Currently on 1 layer I see about 28% slow down on
average across setups specified.
Reviewed By: akyrola
Differential Revision: D4986218
fbshipit-source-id: efb12081f13dbfb92428fd4a85f12fd566eb9522
Summary:
Address KaimingHe's comments in D5093689 about the same blob being initialized twice causing the internal consistency check to fail. Also I noticed that my new test for test_checkpoint_params was completely botched due to an indentation issue (it did not actually execute any test). So this fixes that as well.
Modified the test to add a duplicate param initializer, so that this bug is tested for.
Reviewed By: KaimingHe
Differential Revision: D5101304
fbshipit-source-id: 72f343035c1b4953e7bb9a1a1c171cf05d3ead26
Summary: Based on jay-mahadeokar's code, add a test for input order consistency to data workers.
Reviewed By: jay-mahadeokar
Differential Revision: D5096887
fbshipit-source-id: efd226343f81e9a0157ec89d4588f1eee8a78549
Summary:
If Predictor Exporter save_to_db is called in CUDAContext, a failure occurs since the following FeedBlob() tries to store a string (meta data), but for CUDA blobs we assume they are tensors.
+ fix a typo in data_parallel_model that I bumped on.
Reviewed By: asaadaldien
Differential Revision: D5099837
fbshipit-source-id: 69d01b35a9a1816bf083f13d8a6ce88e1f5aecb7
Summary: Rename some type of AVPixelFormat
Reviewed By: aaronmarkham
Differential Revision: D5097337
fbshipit-source-id: 8ee9b0fc7284752e56f74c7ada241b3bd421efd1
Summary: Define StoreHandlerTimeoutException() for timeouts in StoreHandler::wait(). Update all StoreHandler implementations. Catch new exception in CreateCommonWorldOp and store failure blob.
Reviewed By: akyrola
Differential Revision: D5095625
fbshipit-source-id: dc6f8351cc129cd1fac72bd4b2c8e6b684b21f31
Summary:
Major improvements. Before we only synced "params" and "computed params" of model after initialization and after loading a checkpoint. But actually we want to sync all blobs that are generated in the param_init_net. For example the _momentum blobs were missed by the previous implementation and had to be manually included in checkpoint finalization.
I also added GetCheckpointParams() to data_parallel_model because it is now fully general. Also added a unit test.
Reviewed By: andrewwdye
Differential Revision: D5093689
fbshipit-source-id: 8154ded0c73cd6a0f54ee024dc5f2c6826ed7e42
Summary: Mutex is only supported on CPU. We need to make sure the mutex and the following AtomicIter are both on CPU. This is critical for GPU SparseNN training.
Differential Revision: D5093184
fbshipit-source-id: 021e6ba699a3208449fa4761cad6b0ec4544957e
Summary:
deprecate CNNModelHelper in the python/operator_test dir
BTW I found that there are 2 mkl_speed_tests. I am confused...
Reviewed By: salexspb
Differential Revision: D5094122
fbshipit-source-id: f6526f4de334f2245eb4c1f204a8ec9f23750d78
Summary: We will start our API migration process. Before that, I want to make sure people don't add new CNNModelHelper instances to our open source code, so I put a deprecation warning here in advance.
Reviewed By: salexspb
Differential Revision: D5093556
fbshipit-source-id: 74bf4a7782c2d882f72f202d48c72255d152b68a
* Check cuDNN version at runtime
This checks that the version from cudnn.h matches the version from
libcudnn.so.
Fixes #1476
* Only check major and minor version numbers
Summary:
The pair was still hardcoding limits on the slot numbers. In this
change those limits are lifted.
This also adds back assertions on work completion status in
handleCompletion.
Reviewed By: wesolwsk
Differential Revision: D5090457
fbshipit-source-id: 7bf884e1f31e48e8f1cdfb179a225999e28171b2
Summary: Add support for collectives over vectors of half-precision floating point values.
Reviewed By: pietern
Differential Revision: D5062938
fbshipit-source-id: 0b39fa53370393fec1edf2d852ff7f1d862b9022
Summary:
The halving/doubling algorithm had two instances where a receive
buffer was registered with a number of elements instead of a number of
bytes. This change adds the assertion that should have caught this in
the first place.
Reviewed By: wesolwsk
Differential Revision: D5089483
fbshipit-source-id: fd0f0724ef04300236c9297ee88b27e61fb1e5a0
Summary:
The original implementation created temporary buffers on the backing
context. This also meant an ordering problem when using the ibverbs
transport, as a call to send will block until the remote side has
created its receive side buffer. Since all buffers are now created
prior to using them, this is no longer an issue.
Reviewed By: romain-intel
Differential Revision: D5082352
fbshipit-source-id: 4c260f06e8f461c0336e7eec7ca891e07ff41cd3
Summary:
CUDNN dilated convolution was added to V6. This version of CUDNN does not support NHWC for dilated convolution.
Fix conv_test.py so that it does not test CUDNN for dilated convolution in NHWC format.
Closes https://github.com/caffe2/caffe2/pull/598
Reviewed By: akyrola
Differential Revision: D5084835
Pulled By: asaadaldien
fbshipit-source-id: 3c0c5ed02c5d9232fca567e387ab6260d71e5aaf
Summary: In response to https://github.com/caffe2/caffe2/issues/581 feedback, add textual "less than", "greater than" etc. to comparison operator docs, instead of just <, <=... which are hard to search on browser.
Reviewed By: asaadaldien
Differential Revision: D5085907
fbshipit-source-id: f129d94f03aff1cc919f8da843aa461f157eb144
Summary: I noticed that Sigmoid was taking an inordinate amount of time in our NMT benchmark, so I looked at the implementation and it didn't seem optimal. I replaced the implementation with an Eigen version so that when the Eigen update goes through, we will get proper AVX(2) vectorization.
Differential Revision: D5082464
fbshipit-source-id: aa951f7d730fc05198f7dd04076ec58d471b74c8
Summary: Added L1Distance Operator for CUDA, as well as tests.
Reviewed By: bwasti
Differential Revision: D5071966
fbshipit-source-id: 4c3d862605e9123d955bf091efa67d0731bd816a
Summary: Fixing a bug in the multiple algorithm test where threads were spawned repeatedly, causing collisions during rendezvous.
Reviewed By: pietern
Differential Revision: D5082945
fbshipit-source-id: 4adbbc963b1ff652f73a44cd9fd75dcd3325f182
Summary: When converting from half to float, the bytes to be returned were represented as an unsigned int. When returning, this had the effect of converting the unsigned int into a float. This is incorrect, as we want to instead take the raw data and return it as float.
Reviewed By: pietern, asaadaldien
Differential Revision: D5080335
fbshipit-source-id: 7208efc5799daccf92e1628ee326f7470b867261
Summary:
TSIA
This matches the approach in the TCP transport where all send/recv
logic is contained in the pair code.
Reviewed By: wesolwsk
Differential Revision: D5082503
fbshipit-source-id: b70886ed9aaeb381cdb45fba00704118cff62a23
Summary:
This is necessary to avoid the next iteration of the algorithm
overwriting data in recvBuf_ before it has been consumed by the
receiver of that data. If this does happen, the result of the previous
iteration for the receiving end is corrupted. This can only happen in
async mode on the TCP transport (so all incoming data is unsolicited)
when spinning on the run function.
Reviewed By: wesolwsk
Differential Revision: D5074789
fbshipit-source-id: 66668fbd885888f26266d812e78d61c6d65c2461
* Fix clang warnings
* Raise errors when unsupported ConvNd configurations are used
* Properly handle Variable indexing with LongTensors
* Support both tensors and variables in Variable.type_as
Summary:
Incorporate arbitrary dropout for encoder and decoder layers for Caffe2 NMT models using current configuration. This involves separate output processing (_prepare_output() and _prepare_output_sequence()) for the final layer in a MultiRNNCell.
Switching to using the newly introduced forward_only switch for RNN cells revealed an unrelated bug in our NetGradientChecker test, which urikz is investigating.
Reviewed By: salexspb
Differential Revision: D5031964
fbshipit-source-id: 19b49607d551aa3e2140041ef4e585f128c8f178
Summary: Add a RandomFailureOp and handling to elastic data parallel model of the status code
Reviewed By: andrewwdye
Differential Revision: D5065936
fbshipit-source-id: 24224f9ea414ee535c9e90cc28add5189354b0ef
Summary:
Migrate the experiments folder to the fb/sparse folder. Keep FunHashOp and SparseFunHashOp because they are now assumed as a default Op in depr. What I did:
1. Migrate FunHashOp and SparseFunHashOp and their unit tests to core-caffe2; make sure tests pass.
2. Migrate other Ops in the experiments folder to the fb/sparse folder. Write new TARGETS files for them. Make sure tests pass.
3. Make sure all related tests pass.
4. Fix the MKL definition btw. Make sure that FC_Sparse is not compiled when there is no MKL support.
Reviewed By: salexspb
Differential Revision: D4952993
fbshipit-source-id: 86c03676ab4e47f04d2d0dd438a4a1c849bbbff0
Summary:
Residual connections for multilayer RNN encoder/decoder for Caffe2 NMT model. Only supporting 'add' connections (the standard approach, which ves's TF experiments concluded was at least as good as other approaches), and also only implementing for residual_level >= 1 (which also fits our use case).
It is the responsibility of the config to ensure dimension compatibility: each level at and beyond residual_level (in both the encoder and decoder) should have the same number of units, with the exception that a bidirectional initial encoder layer should have half the number of units of the succeeding layer if that next layer is a residual layer.
Differential Revision: D5023160
fbshipit-source-id: f38c1b140638fee78cf3ef7d6b4602dd462484ee
Summary:
Update rnn_cell.py and char_rnn.py example with new `brew` model.
- Deprecated CNNModelHelper
- replace all helper functions with brew helper functions
- Use `model.net.<SingleOp>` format to create bare bone Operator for better clarity.
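A hedged sketch of this brew-style construction (toy dimensions and blob names, not the actual char_rnn configuration):
```
from caffe2.python import brew, model_helper

# Helper functions go through brew; bare-bones operators go straight
# through model.net.<Op> for clarity.
model = model_helper.ModelHelper(name="char_rnn_example")
fc1 = brew.fc(model, "data", "fc1", dim_in=128, dim_out=256)
relu1 = brew.relu(model, fc1, "relu1")
pred = model.net.Softmax(relu1, "pred")
```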
Reviewed By: salexspb
Differential Revision: D5062963
fbshipit-source-id: 254f7b9059a29621027d2b09e932f3f81db2e0ce
Summary:
the FC ModelLayer needs an optimizer; it also seems the catch-all
that sets a default for missing optimizers had a bug
Reviewed By: xianjiec
Differential Revision: D5048302
fbshipit-source-id: cbbf641fb9ee4f4f89c5dbb132f7837ecdbe37a5
Summary: new resnet building with brew
Reviewed By: akyrola
Differential Revision: D4945418
fbshipit-source-id: d90463834cbba2c35d625053ba8812e192df0adf
Summary:
A Single machine multi-GPU version of BMUF algorithm. BMUF is a modification to
model averaging where updates to global model is implemented as a filter:
param_t = param_(t-1) + delta_t
delta_t = beta * delta_(t-1) + alpha * (average(param_t) - param_(t-1))
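A toy numeric sketch of this update filter (hyperparameter values and parameters are made up, purely for illustration):
```
import numpy as np

alpha, beta = 1.0, 0.875
param_prev = np.array([1.0, 2.0])      # global model at t-1
avg_param = np.array([1.2, 1.8])       # average of per-GPU models at step t
delta_prev = np.zeros_like(param_prev)

delta = beta * delta_prev + alpha * (avg_param - param_prev)
param = param_prev + delta             # new global model at t
```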
Reviewed By: akyrola
Differential Revision: D4995057
fbshipit-source-id: 48176ba66d67eaf3fa4dee16d50d9589825ddba4
Summary: We need to use remapped name for param_grads to enable memonger.
Differential Revision: D5064198
fbshipit-source-id: ae54407c3362044e9bc2bff929e12da68cd6a332
* fix issue #1549, expose bitwise and
* expose C bitwise or of Tensor
* expose C bitwise xor of Tensor
* use built-in method for inplace and, or, xor
* expose C bitwise lshift(ilshift) and rshift(irshift) of Tensor
Summary: based on our discussion, we want an arg_map in ModelHelper and create arg_scope for that model within brew. Now it is realized
Reviewed By: salexspb
Differential Revision: D5042983
fbshipit-source-id: ddd2c7e9bca1be2f08a32f7252b44d3b60a57996
A module that returns a non-standard data structure currently breaks
due to checks for backwards hooks. This refactors the code slightly so
this will only break in the event of backwards hooks.
Summary:
The most recent diff from Andrey had a tiny bug that triggered an error in Android.
Closes https://github.com/caffe2/caffe2/pull/543
Differential Revision: D5040516
Pulled By: Yangqing
fbshipit-source-id: d7b11b509a20b8b5e33db74dd383b55f43608c8f
Summary:
Generalize SpatialBatchNorm CPU Op to compute Spatial batch normalization for
1D, 2D & 3D input tensors.
Reviewed By: dutran
Differential Revision: D5043563
fbshipit-source-id: 7fcb933a628dd47f13aa622f63601a87382f09cd
Summary: After a long and painful debugging session of nondeterministic behavior in the Machine Translation team's attention model, I found that in certain cases SumReduceLike will use cub::DeviceReduce, and it lacked the stream param.
Reviewed By: jamesr66a, asaadaldien
Differential Revision: D5043347
fbshipit-source-id: bb91aacfc6786cc2b85ebc4e432c67e5f876e235
Summary:
Added several features to the ImageInputOp:
- bounding box (per image as well as default for the operator). For per-image, it
only works in Caffe2 format and is passed as the third tensor in the form
(ymin, xmin, height, width). For the operator, pass bounding_xmin, bounding_ymin,
bounding_width and bounding_height as parameters.
- per-channel mean/std. You can use the usual mean/std to pass a single
value to be used for all channels or also pass mean_per_channel and std_per_channel
to specify different values per channel. Order of channels is BGR.
- A minimum size parameter that can be specified instead of the scale parameter.
The minsize parameter will only scale the image if it is smaller than required.
This differs from scale which will scale up as well as down. You can only specify
one of scale or minsize.
Added a test case to test some of the features
Differential Revision: D4874988
fbshipit-source-id: 437191052a46e9916defe8b100d7cc7864373f61
Summary: We need to also add links in ops, so that they don't require a sharp timestep boundary. This implements that.
Reviewed By: salexspb
Differential Revision: D5027046
fbshipit-source-id: e6dd59ee843fe1507fc87377b0e1e23218dbc384
Summary:
In Dper utility, add a function `load_parameters_from_model_init_options` to
allow init parameters from pretrained models
Reviewed By: xianjiec
Differential Revision: D4926075
fbshipit-source-id: 5ab563140b5b072c9ed076bbba1aca43e71c6ac5
Summary: As part of opsifying the RNN execution, we cannot do the workspace switching anymore, as it happens at the timestep boundary. But we can get the same effect by just explicitly creating the blobs in the shared workspace.
Reviewed By: salexspb
Differential Revision: D5025667
fbshipit-source-id: 921c97cb2f7941f9f9235913a60e34667badc303
Summary: Instead of explicitly accumulating the gradients in a loop, add corresponding Sum ops to the net. This will allow for better parallelism with multithreaded nets.
Reviewed By: salexspb
Differential Revision: D5011177
fbshipit-source-id: 14e2fa2a6905703322d5701c1362054c17c4e796
Summary:
`Append` & `UnPackRecords` don't handle empty tensors well. `Append` would erase the shape of an empty tensor, which breaks the invariants of the dataset.
`UnPackRecords` leaves output tensors in an undefined state: if the output tensors were initialized, they would not be cleared out; if they were not initialized, they would remain uninitialized. This diff disables unpacking an empty record if prototype tensors are not provided (since output shapes may be indeterminable if they were not initialized). The interface remains the same if empty record tensors are not used.
Reviewed By: azzolini
Differential Revision: D4956012
fbshipit-source-id: ad80527d78eb7421cd90968edb82322c289cd417
Summary: Relax requirement on token uniqueness since a few use cases broke after the uniqueness requirement was added in a previous diff.
Reviewed By: kittipatv
Differential Revision: D5034132
fbshipit-source-id: 327eb065923e6ea152a360324316f81b7fb9564b
Summary: We can avoid this extra Reshape.
Reviewed By: jamesr66a
Differential Revision: D5032874
fbshipit-source-id: 92bd568bc6bec53d7f81a64cfa96d2c610823f8c
Summary:
this is still printed in tests a lot. Let's use 1 instead of
0, as most of our RNN code does
Reviewed By: jamesr66a
Differential Revision: D5031460
fbshipit-source-id: bc07990b66c89dfbd97133493cca11929d3138e5
By default, this parameter is False -- a backwards incompatible change, but
one that follows numpy semantics, e.g. numpy.sum (numpy names the parameter
"keepdims" since you can pass multiple dims to reduction functions).
The old behavior seems desired for normalization type operations
where the tensor will immediately be expanded out again, e.g.:
probs.sum(1).expand_as(probs)
which no longer works because the dimension to expand is missing.
This can be fixed by simply passing True as "keepdim" argument
to the reduction operation, e.g:
probs.sum(1, keepdim=True).expand_as(probs)
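A minimal illustration of the keepdim behaviour described above (toy tensor, illustrative only):
```
import torch

probs = torch.rand(4, 3)
s = probs.sum(1)                      # shape (4,): the reduced dim is dropped by default
s_keep = probs.sum(1, keepdim=True)   # shape (4, 1): old behaviour, restored explicitly
normalized = probs / s_keep.expand_as(probs)
```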
Summary:
In transfer learning, parameter initialized from pretrained model might require
a different learning rate than otherwise initialized. To this end, here we
implement a python solution where `base_learning_rate` is scaled by `scale`,
which is in turn set by `scale_learning_rate`; Alternatively, we can achieve
same effect by rewriting the LearningRate operator in C++
Reviewed By: kennyhorror
Differential Revision: D4992827
fbshipit-source-id: 8d7e87a61c95b3eb8ef733ec436f4060e865c0ac
Summary:
Adds a parameter cost estimation step before the actual training starts. The costs are later used in order to better shard the parameters across instances of the parameter server.
Things I needed to modify:
- A few changes to make ModelLayerHelper picklable
- Add support for stopping a distributed job after a number of stats reporting steps.
- Refactored run_dist_job to support collocating the reader with the trainer even when PS are present.
- Option to disable dense updates (when num_dense_servers=0).
Currently there's a huge overhead posed by having to launch a child workflow. I'll try to address that in a subsequent diff.
This is WIP because the other workflows need to be migrated as well.
I can break this down into smaller diffs if reviewers would prefer it.
Reviewed By: kennyhorror
Differential Revision: D4974752
fbshipit-source-id: 04c336acb2945f8f11324a221ffc6967818c0672
By default, this parameter is False -- a backwards incompatible change, but
one that follows numpy semantics, e.g. numpy.sum (numpy names the parameter
"keepdims" since you can pass multiple dims to reduction functions).
The old behavior seems desired for normalization type operations
where the tensor will immediately be expanded out again, e.g.:
probs.sum(1).expand_as(probs)
which no longer works because the dimension to expand is missing.
This can be fixed by simply passing True as "keepdim" argument
to the reduction operation, e.g:
probs.sum(1, keepdim=True).expand_as(probs)
Summary: For distributed jobs, we were relying on the order the PythonOps were registered, which was very fragile.
Reviewed By: dzhulgakov
Differential Revision: D5016847
fbshipit-source-id: f5601467c5b0569d5e8a0efdd76abad0d703c5f5
Summary:
It is quite normal to rerun cmake to pick up new files through GLOB commands. The external command `touch CMakeLists.txt`, however, forces cmake to run every time you run make, so every make invocation takes longer than necessary. This PR removes that external command, leaving the decision of when to rerun cmake up to the user and speeding up builds when rerunning cmake is not required.
Closes https://github.com/caffe2/caffe2/pull/453
Reviewed By: Yangqing
Differential Revision: D4978919
Pulled By: bwasti
fbshipit-source-id: 0da4495b276a04f6ce46e1c8ceca0474b7573aa0
Summary:
cuDNN versions of dropout and LRN (for native fp16 support), port of Caffe's max pooling algo that uses an explicit mask to store locations (also supports fp16 storage)
Closes https://github.com/caffe2/caffe2/pull/396
Reviewed By: akyrola
Differential Revision: D4990880
Pulled By: asaadaldien
fbshipit-source-id: a716acffb656843e9b31e3e6808bd2d8aa959d03
Summary:
Added a context factory that allows you to use an existing context to
create other fully connected contexts much more cheaply (without having
to rely on a store).
Limitations:
- The backing context needs to be fully connected
Reviewed By: andrewwdye, pietern
Differential Revision: D4985121
fbshipit-source-id: 31ceabccbb679cedb18ec9927b6c166bef5989bb
Summary:
Incorporating a definition of a cell's output and illustrating its usage by adding dropout to all types of cells.
I think that we should try to get rid of aliases in RecurrentNetwork, so the output of applied_over_sequence is also always (state_1_all, state_2_all, ...). This way we can merge get_output_from_single_step, get_output_from_sequence and get_outputs_with_grads into a single method.
Let me know what you think!
Reviewed By: jhcross
Differential Revision: D4992913
fbshipit-source-id: 737939be336ad145f84e8733cd255d4f7188ef70
Summary: decoder_hidden_encoder_outputs_sum_tmp is tiny after D5010109, no need to recompute it.
Reviewed By: akyrola
Differential Revision: D5014335
fbshipit-source-id: cc9e8f91372889d10bd99c79366018cb3943a435
Summary:
At the moment serialization can take up to 3x the memory of the largest blob: the original blob, the BlobProto, and the SerializeAsString version of the blob. As a result, in certain cases serialization takes more memory than it should, and it hurts utilization / max model size per machine.
This diff adds an IOBound ThreadPool that should put a fairly strict limit on the extra memory overhead per blob.
Reviewed By: dzhulgakov
Differential Revision: D5012887
fbshipit-source-id: 12dbb9d3efab136411ddeffd519b602cf606661e
Summary:
Segment-based ops require increasing segment ids without gaps. Lengths-based ops do not have this requirement.
Other pooling methods, e.g. LogExpMean, do not have lengths-based ops available yet.
Differential Revision: D5019165
fbshipit-source-id: ab01a220e10d4ed9fa2162939579d346607f905e
Summary:
Specialized implementation of ResizeNearest for width_scale=2 and height_scale=2. This implementation doesn't use divides or calls to std::min, and is unrolled 2x over the width dimension. Also add a correctness test.
About 6x faster.
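As a rough reference for what the specialized path computes, here is an unoptimized nearest-neighbor 2x upsample in numpy (NCHW layout and shapes assumed; this is only an illustration, not the kernel from this diff):
```
import numpy as np

def resize_nearest_2x(x):
    # Each input pixel is replicated into a 2x2 block of the output.
    return x.repeat(2, axis=2).repeat(2, axis=3)

x = np.arange(2 * 3 * 4 * 5, dtype=np.float32).reshape(2, 3, 4, 5)
y = resize_nearest_2x(x)
assert y.shape == (2, 3, 8, 10)
```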
Reviewed By: ajtulloch
Differential Revision: D4928579
fbshipit-source-id: 5cc92a52bd688690fee907b4333d9c84b666f9c9
Summary: For perf, it is better to check weight0 inside the kernel and avoid the host synchronization of copying it to a stack variable. Also improved the style a bit (GitHub does not run our lint, so contributed code may not conform to our style).
Differential Revision: D5011668
fbshipit-source-id: 1eb85912f6f499acd3190cfcb59e7e39c2220d89
Summary: Since this function is declared in a header file, and is not templated and not part of a class, it will produce an ODR error if it is included in more than one file. Adding the `inline` keyword fixes this.
Reviewed By: jhcross, jamesr66a, m3rlin45
Differential Revision: D5011770
fbshipit-source-id: 50266a530da31ebfda59fcca2048355a00fe7758
Summary: External inputs must be computed before updating the _ops_output structure, otherwise if the net to be appended outputs the external input, it is not added correctly
Differential Revision: D5013496
fbshipit-source-id: 6a83d0a6f1c63ef8ae7bec4d862c0ac2a690d47b
Summary: Adding a simple video data layer which allows reading video data from frames or videos and outputs a 5D tensor. It also allows multiple labels. The current implementation is based on ffmpeg.
Differential Revision: D4801798
fbshipit-source-id: 46448e9c65fb055c2d71855447383a33ade0e444
Summary: The Split doc failed to mention important features like specifying the 'split' argument. Two questions the same day in Caffe2 Users were about how to do this.
Reviewed By: azzolini
Differential Revision: D5009503
fbshipit-source-id: 883549be891705a5c83778302d967481419f4dde
Summary:
This diff creates a generalized AttentionCell class, which will allow us to construct attention decoders out of arbitrary RNNCell components (with a particular view to using stacked, multi-layer RNNs).
In order to do this, we introduce a new optional input for RNNCell._apply which allows us to provide an additional input that is not processed by prepare_input(). Note that this is an argument only to _apply, not apply, since it is only meant to be used for additional recurrent connections to "embedded" cells, not for standalone RNNs.
Reviewed By: urikz
Differential Revision: D4998465
fbshipit-source-id: 473009ea4917e86e365f9d23aa2f11a46a94fd65
Summary: It is good practice to provide __dir__ whenever __getattr__ is defined so that tooling will work intelligently. In particular, it is hard to explore the available methods in iPython without tab completion.
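A generic illustration of the pattern (class and attribute names here are made up, not Caffe2 code): without __dir__, dynamically resolved attributes are invisible to dir() and therefore to tab completion.
```
class OpNamespace(object):
    _ops = {"Relu", "Sum", "FC"}

    def __getattr__(self, name):
        # Attributes resolved dynamically are invisible to dir() by default.
        if name in self._ops:
            return lambda *args, **kwargs: (name, args, kwargs)
        raise AttributeError(name)

    def __dir__(self):
        # Expose the dynamic attributes so tooling can discover them.
        return sorted(set(dir(type(self))) | self._ops)

ns = OpNamespace()
print("Relu" in dir(ns))  # True, so iPython tab completion can see it
```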
Reviewed By: dzhulgakov
Differential Revision: D5006545
fbshipit-source-id: 1a150d91d54637d80b292764513943ff70d971b4
Summary:
Script caffe2/caffe2/python/examples/resnet50_trainer.py can be used to train a ResNet-50 model with Imagenet data (or similar).
However, currently the script does not actually save the model, so it is kind of useless.
Task 1: After each epoch, save the model in a file "<filename>_X.mdl" where X is the epoch number and <filename> is given as a command line parameter. By default, use "resnet50_model" as the filename.
Task 2: Add a functionality to restore the model from a previous file:
- add a command line parameter "load_model", which user can use to specify a filename.
- if this parameter is set, load the model parameters from the previous file
Reviewed By: prigoyal
Differential Revision: D4984340
fbshipit-source-id: 333e92679ba52a7effe9917fdfc2d55d652b868f
Summary:
Part of the project to turn all the gradient accumulation business into ops in RecurrentNetworkGradientOp; this diff makes the accumulateInputGradients ops.
Also added a way to mark operators private so they don't appear in the docs.
Reviewed By: salexspb
Differential Revision: D5006698
fbshipit-source-id: 226d7afb473290c8d0f936d2cc87640be3e06615
Summary:
Added the possibility to provide 'tiles' and 'axis' as inputs, as opposed to arguments, for the Tile operator. If provided, the input values will override the argument values. Now with proper CUDA code.
Differential Revision: D4930347
fbshipit-source-id: b44b032b327c7d7bddfce63abf4e3289d7e74bfb
Summary: Layer for LastNWindowCollector op. We need this since it's an in-place operator.
Reviewed By: chocjy
Differential Revision: D4981772
fbshipit-source-id: ec85dbf247d0944db422ad396771fa9308650883
Summary:
Use the rnn_cell's multi-cell for the LSTM benchmark. While doing this, I had not changed the initial_states and got an inconsistent result from rnn_cell, so I added an assertion to check that the initial states length is 2 * num_layers.
+ fix division by zero error
Reviewed By: salexspb
Differential Revision: D5003177
fbshipit-source-id: a8250b825394c352428a0f067098dfcd7516ab2a
Summary: Use `CopyItems` so that it accepts any type of tensor. Also, move the cursor to input blob so that it's checkpoint friendly. Output is now also part of input so that inference can work correctly.
Reviewed By: xianjiec
Differential Revision: D4920987
fbshipit-source-id: da532736225ec27f409ff763ff69a0629235151c
Summary:
TSIA
This caused a compilation problem on gcc-6, see
https://github.com/caffe2/caffe2/issues/456.
Differential Revision: D5002823
fbshipit-source-id: 764aae1eaf78ee9918455b95a12e982597b85fdc
Summary: Set deviceId_ to -1 when CudaDevicePointer and CudaStream do not have valid data
Reviewed By: andrewwdye
Differential Revision: D4881374
fbshipit-source-id: e973a70e2e6e4519f5fdc2ad4e76f232d9593751
Summary:
Gloo added support for non-power-of-2 number of nodes in the recursive
halving/doubling allreduce algorithm by implementing the binary blocks
extension. This means we no longer have to fall back to using the ring
algorithm when the number of nodes is not a power of 2.
Reviewed By: prigoyal
Differential Revision: D4992536
fbshipit-source-id: f231aecbb46296ae3441ab818e058eb7ad6d8d64
Summary:
Otherwise compilation fails pretty far into the build, which is inconvenient.
The error reported when trying to compile with GCC 6:
CUDA 8.0 is not compatible with GCC version >= 6. Use the following
options to configure GCC version 5:
-DCMAKE_CXX_COMPILER=/usr/bin/g++-5
-DCMAKE_C_COMPILER=/usr/bin/gcc-5
-DCUDA_HOST_COMPILER:FILEPATH=/usr/bin/gcc-5
Closes https://github.com/caffe2/caffe2/pull/504
Reviewed By: akyrola
Differential Revision: D5004299
Pulled By: pietern
fbshipit-source-id: 185cd2f846f291a48e1d41ce0d87ca69e7f2c593
Summary: Allow RecurrentNetwork to accept dag as a step-net
Differential Revision: D4985747
fbshipit-source-id: ff39e0386c8f3a7364801a3011558f322d8ea669
Summary: When I added the CAFFE_ENFORCE_WITH_CALLER typedef to tag the tensor-pointer into enforce-exceptions, I only changed the most common callsites. This changes all enforces in tensor.h.
Reviewed By: salexspb
Differential Revision: D4995773
fbshipit-source-id: 90f2d277aeeb1354e72f92b2b9a75601fcbea609
Summary: Add the above operators to fbobjc and fbandroid by splitting them out to separate files and including these on the build. We are using these on mobile as part of Scout (Messenger).
Reviewed By: bwasti
Differential Revision: D4958660
fbshipit-source-id: f5cb105b4d7186a7eef705023382ec1383b6ec21
* Make sparseMask error if mask is uncoalesced.
Fixes #1447.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Add test for sparse adagrad.
Previously, the sparse codepath was not exercised at all; this commit
adds a very simple test case "sparse Rosenbrock"; the idea is to do
Rosenbrock but then knock out one of the dimensions so that the
tensor is sparse.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary:
Add a parameter dont_rebatch to data_workers. This disables re-batching of the fetcher's input into equal-sized chunks, which is not desired with RNNs, where with longer sequence lengths we might want smaller batches, etc.
For some reason the graceful-shutdown test interfered with other tests, so I removed it.
Reviewed By: jay-mahadeokar
Differential Revision: D4988549
fbshipit-source-id: cbab46d77c948f2e293e79e6eb538dde17d800ee
Summary:
We weren't handling an edge case where write(2) would return EINTR
when in sync mode. The Pair::write function would return false
indicating it didn't complete the write whereas the send function
expects it to complete when in sync mode. With this change we now
advance the cursor and retry the write when fewer than expected bytes
were written.
Also see https://github.com/facebookincubator/gloo/issues/34
Reviewed By: andrewwdye
Differential Revision: D4996949
fbshipit-source-id: 3bad4fa3d0a01517f20b64904aa71410641fa60f
Summary:
- Adding ScatterWeightedSumOp for CUDA.
- This version does not support input weight (weight0). In other words, the input weight has to be 1.0, otherwise the op exits.
- To check the value of weight0, we copy its value from device to host at: https://github.com/caffe2/caffe2/pull/443/files#diff-2a77f80797072e8443f4867cb709fb40R244
Closes https://github.com/caffe2/caffe2/pull/443
Reviewed By: akyrola
Differential Revision: D4971910
Pulled By: asaadaldien
fbshipit-source-id: 2282e968f95364f0b3b8126502b053fe7a32ba20
Fixes #1449.
For future reference, we should have a doc explaining our ref-counting
conventions; it looks like this bug slipped by because we assumed that
newTensor was taking ownership of the pointers it was passed in.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary: Add Python support for arbitrary (unidirectional) recurrent networks with MultiRNNCell abstraction. Since the combined step net for all layers is created at one time (in method _apply), this may be optimizable as-is. LSTM() function is extended to accept a list of numbers of units for the dim_out argument, producing a multi-layer LSTM in that case.
Reviewed By: salexspb
Differential Revision: D4965001
fbshipit-source-id: 39c069468d5b40bf803503cf62046a479ca83cbb
As discussed in #1441.
I also added some docs giving clear guidance about how to do coalescing in sparse tensors.
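For reference, a small example of what coalescing does (written against today's torch API names, which may differ from the ones at the time of this commit):
```
import torch

i = torch.tensor([[0, 0, 1],
                  [1, 1, 2]])              # note the duplicate index (0, 1)
v = torch.tensor([3.0, 4.0, 5.0])
s = torch.sparse_coo_tensor(i, v, (2, 3))
print(s.is_coalesced())                    # False: duplicates not merged yet
c = s.coalesce()
print(c.values())                          # tensor([7., 5.]): duplicates summed
```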
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Simplify _gen_sparse
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Randomly generate an uncoalesced tensor and test with it.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Simpler implementation of cpu_only suggested by @apaszke
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Better implementation of randn, suggested by @soumith
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Lint fix.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* Fix CUDA type error.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary: The code snippet in the added unit test is invalid, but it may or may not cause an exception. Disable the syntax so people don't accidentally use it.
Reviewed By: dzhulgakov
Differential Revision: D4985030
fbshipit-source-id: ffa2b26f7b29128b196aba1b1001a97c87e381cf
Summary:
We need a warm-up stage because otherwise the first iteration spends too much time doing all the allocations.
Reviewed By: akyrola
Differential Revision: D4986201
fbshipit-source-id: f60a75520988ff3f1540bb157cdc69634f307db4
Summary:
Layer to allow a model to follow different paths for each instantiation context and join later. Together with tagging system cleanup (this is a separate issue), this should reduce the need to write a layer to differentiate between contexts.
Re: tagging system cleanup, we should make exclusion more explicit: EXCLUDE_FROM_<CONTEXT>. This would simplify instantiation code. TRAIN_ONLY should become a set of all EXCLUDE_FROM_*, except EXCLUDE_FROM_TRAIN.
Reviewed By: kennyhorror
Differential Revision: D4964949
fbshipit-source-id: ba6453b0deb92d1989404efb9d86e1ed25297202
Summary: Previous slot offset was not added to the calculated value for the slot to be used in halving-doubling algorithms. If multiple instances were running, slot values could collide.
Reviewed By: pietern
Differential Revision: D4986618
fbshipit-source-id: 56b9220c91f31cc016d37e82907221460de70657
Summary: Make NCCL optional in data_parallel_model due to continuing reliability (deadlock) issues.
Reviewed By: pietern
Differential Revision: D4988950
fbshipit-source-id: 8a2192f01b5f3c0e847137cd37aefc69e553a56f
Summary:
RFC. This is a naive implementation of a Rebatching Queue for the MultiTask effort. Full disclaimer: I'm very new to Caffe/machine learning and I'm doing dodgy science here (under Dmytro's supervision), so please be extra tough on this review so I can learn best practices :)
Differential Revision: D4871970
fbshipit-source-id: 924820ef0fce45b5e2bdabeec9885cbafa23a880
1) Fix "kth" attr specification -- I can't get sphinx to generate `k`th,
but `k` th works with a space, unlike now where the highlighting continues
until the next attr.
2) Specify the size of the return tensors.
3) Add an example of the return tensor sizes with more than 1 dimension.
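For reference, a hedged illustration of the return-size behavior for this kind of reduction with more than one dimension (values and sizes are made up):
```
import torch

x = torch.rand(3, 5)
values, indices = torch.kthvalue(x, 2, dim=1)
print(values.size(), indices.size())   # torch.Size([3]) torch.Size([3])

values, indices = torch.kthvalue(x, 2, dim=1, keepdim=True)
print(values.size(), indices.size())   # torch.Size([3, 1]) torch.Size([3, 1])
```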
Summary: I ran into this earlier and the debug messages were not helpful enough.
Reviewed By: kennyhorror
Differential Revision: D4985754
fbshipit-source-id: b3d12b5e2cfa1b54fca9126768c84c902664ef28
Summary:
When appending net A to net B, an external input of net A should not be added as
an external input of net B if net B is outputting that blob.
Reviewed By: dzhulgakov
Differential Revision: D4975921
fbshipit-source-id: a5c0ada7b96d851e57d345244d322dd93c7be8e4
Summary:
This helps guard against programming errors where waitSend is called
before send is called. It uses a std::atomic to keep overhead low.
Reviewed By: andrewwdye
Differential Revision: D4984604
fbshipit-source-id: 04a63b1ba088e3bcba0abff40771af666deb15e5
Summary:
This returns EFAULT when passing a GPU memory pointer (for GPUDirect)
and the ibverbs driver can't map the GPUs memory. Since the error is
pretty cryptic, crash with a more useful message.
```
terminate called after throwing an instance of 'gloo::EnforceNotMet'
what(): [enforce fail at gloo/transport/ibverbs/buffer.cc:46] mr_ !=
nullptr. ibv_reg_mr: Bad address (kernel module 'nv_peer_mem' not
loaded; did you specify a GPU pointer?)
```
Reviewed By: andrewwdye
Differential Revision: D4982966
fbshipit-source-id: 72c220fe22a3bc59396cfff992ad5f0f9c5bf83a
Summary: In certain situations, like in D4907916 where we insert an additional step in the middle of a model, it's necessary to keep the blob names constant across model helpers so that the communication schema doesn't break.
Reviewed By: kennyhorror
Differential Revision: D4981527
fbshipit-source-id: 6b8d6d240279dd48f801cfacbaa1d320ba54d694
Summary: Integration of the CRF layer in DeepText word models + implementing the Viterbi decode operator in C++ instead of Python so that the CRF models can be deployed in production.
Differential Revision: D4912196
fbshipit-source-id: 64f499a1bd47e811e7a96dde839904dcd05cacb3
Summary: Calling `set()` or `set_value()` on Scalar is dangerous, as something might be holding a reference to it. This is especially true with `LayerModel`, where instantiation is delayed. The code may still run but it will produce unexpected results, i.e., values may be written to the wrong blob.
Reviewed By: kennyhorror
Differential Revision: D4955366
fbshipit-source-id: f5e8694a9a411ee319ca9f39a0fed632d180b8a5
Summary:
This is a preamble for the "diagonal executor". Instead of creating a Net for each timestep, we have a single executor for the RecurrentNetworkOp that manages ops per timestep.
This will be used if net_type='rnn', so one can still use the old way by using a net type of 'simple' or 'dag' (so there is an effective kill-switch if there are issues with this).
Did this only for the forward model. The gradient op will follow later on; it is basically similar, just in reverse order.
Reviewed By: salexspb
Differential Revision: D4979933
fbshipit-source-id: bda77918ec518cb6b29d7021ee036d59eb2dd303
* Refactor test_sparse to reduce boilerplate.
Instead of manually creating a helper function, threading an is_cuda
parameter around, and creating a test method for CUDA and non-CUDA
variants, we take a different approach:
- There is now some new member variables initialized in setUp which
control the aspects of how we carry out the test; at the moment,
it's just whether or not we are using CUDA or not. This means
you don't have to pass is_cuda around, or do a conditional to
get the triplet of constructors you need.
I'll note that I am not a big fan of member variables in test
objects, but these are (intended to be) immutable so I think
it should be OK.
- Instead of manually defining test_foo and test_foo_cuda, we now
have a new TestCudaSparse class which overrides setUp (from above)
to swap in the CUDA implementation. Way less boilerplate, and NO
metaprogramming needed.
If you need to opt out of CUDA testing, there is a new cpu_only
decorator you can use.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
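A condensed sketch of the structure described above (the class and helper names here are illustrative, not the actual test file):
```
import unittest

def cpu_only(fn):
    # Skip the test when the suite is configured for the CUDA variant.
    def wrapper(self, *args, **kwargs):
        if self.is_cuda:
            raise unittest.SkipTest("CPU-only test")
        return fn(self, *args, **kwargs)
    return wrapper

class TestSparse(unittest.TestCase):
    def setUp(self):
        # Immutable per-suite configuration instead of threading an
        # is_cuda flag through every helper.
        self.is_cuda = False

    @cpu_only
    def test_cpu_only_path(self):
        self.assertFalse(self.is_cuda)

    def test_shared_behavior(self):
        # Helpers can branch on self.is_cuda to pick constructors.
        backend = "cuda" if self.is_cuda else "cpu"
        self.assertIn(backend, ("cpu", "cuda"))

class TestCudaSparse(TestSparse):
    def setUp(self):
        self.is_cuda = True

if __name__ == "__main__":
    unittest.main()
```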
Summary: A generalized version of halving-doubling that supports non-power-of-two number of processes by breaking up execution into blocks that are powers of two and communicating interblock after the intrablock reduce-scatter. Non-power-of-two cases will have some degree of load imbalance compared to power-of-two, but cases with few large blocks (e.g. 8 + 4 or 16 + 8) should still perform relatively well.
Reviewed By: pietern
Differential Revision: D4955947
fbshipit-source-id: af4f218fedb6adf475530c38386978b81f4f2b74
Summary:
It turned out that we cannot run PackedFC on a machine that does not have AVX2 right now, as there is a known issue with MKL 2017.0.098 that produces wrong results on non-AVX2 machines.
I just moved this test out of here because this is not the purpose of this test.
Reviewed By: salexspb
Differential Revision: D4974021
fbshipit-source-id: c5b82a41021defc9946a8219f59b28abb13d3beb
Because of this, Variables can no longer appear in the graph.
Every usage of a leaf Variable will leave an AccumulateGrad
function that has no outputs, but modifies var.grad as a side
effect.
Summary:
After running the test suite many times we end up with a zillion
connections in TIME_WAIT state. Setting SO_REUSEADDR seems like it
should help binding to ports regardless of the TIME_WAIT state.
Reviewed By: andrewwdye
Differential Revision: D4979606
fbshipit-source-id: b611f9c9e11aba858dc192f6bca3d64e10100b52
Summary:
It can happen that a pair is destructed while in CONNECTING
state when some unrelated code throws an exception after the connect
function has been called. The most likely place for this to happen is
when connecting pair A is in progress while connecting pair B throws
an exception. The exception will force destruction of all references
to pair A, even if it is in the CONNECTING state.
Also see https://github.com/facebookincubator/gloo/issues/33
Reviewed By: andrewwdye
Differential Revision: D4979557
fbshipit-source-id: 0cddddd3f478106f1694603fe7f2efe15a2d9aa1
Summary: Previously, the code below would go out of bound.
Reviewed By: xianjiec
Differential Revision: D4968037
fbshipit-source-id: 3760e2cddc919c45d85ac644ac3fabf72dbaf666
Summary:
build_ios.sh now has `-fembed-bitcode` flags for cmake and passes these flags to build_host_protoc.sh (which now accepts the optional argument `--other-flags`). That allows using the output libs (libCaffe2_CPU.a, libCAFFE2_NNPACK.a, libCAFFE2_PTHREADPOOL.a and libprotobuf-lite.a, libprotobuf.a respectively) in Xcode projects with bitcode enabled.
Bitcode has been enabled by default in all projects since Xcode 7, is crucial for slicing and is mandatory for watchOS targets. Enabling bitcode for a target requires bitcode to be enabled for all dependencies as well, so a Caffe2 built without bitcode forces developers to switch off bitcode for the whole app.
Closes https://github.com/caffe2/caffe2/pull/457
Reviewed By: bwasti
Differential Revision: D4978644
Pulled By: Yangqing
fbshipit-source-id: 5165abb507fb91bc8c38f7348d6836bccf8fcc22
Summary:
Implement NormalizeOP for GPU using CUDA, and rewrite the gradient to be a function of the output so it is more efficient, especially for the CUDA implementation.
Reviewed By: akyrola
Differential Revision: D4971300
fbshipit-source-id: e0ab66462000988aaf1f26010ea550533d107167
Previously, when using the same data channel in a multi-threaded environment, there was no guarantee that there wouldn't be any deadlocks or even errors.
Summary: As in the title + added scuba logging of the results.
Reviewed By: andrewwdye
Differential Revision: D4974261
fbshipit-source-id: 3e05b97133be95ffe37c8bcafd8a5a6bf3e7da93
Summary: Only CPU impl is available at the moment. Wrote simple cuda kernels.
Reviewed By: akyrola
Differential Revision: D4577736
fbshipit-source-id: c2540aa9d332fcdeac46cc7f89aab164d107d7a8
Summary: Both SquaredL2Distance and SquaredL2DistanceGradient had bad CUDA implementations. Use proper reductions and batched kernels.
Reviewed By: asaadaldien
Differential Revision: D4968527
fbshipit-source-id: f7cf82072d38bc127c757c5751863a9439aca8b5
Summary: Implement CPU and GPU gradient for Leaky ReLU op.
Differential Revision: D4943905
fbshipit-source-id: 541f13cd5f274a18b69ecf1362722b1bc0105ad9
Summary:
Instance norm failed grad check in some cases that needed a smaller step size. Decreased step size, but also increased threshold slightly.
Related diff: D4627379
Reviewed By: kennyhorror
Differential Revision: D4941827
fbshipit-source-id: d6f565340da92af40bfee90627960a3356c69412
Summary:
This is a naive layering approach until we have a better one. It could be C++ based and support diagonal execution. Not integrating it into the main LSTM API yet as this might be revised a bit. I would like to land it so we can compare against the current implementation in the benchmark and also use this as an example of how LSTMs could be combined (as some folks are doing similar things with some variations).
Later we can make LSTM() support the API of layered_LSTM() and also change it under the hood so it stacks cells into a bigger cell instead. This way, if we make the RNN op use a kind of DAG net, the RNN op can provide more parallelism in stacked cells.
Reviewed By: urikz
Differential Revision: D4936015
fbshipit-source-id: b1e25f12d985dda582f0c67d9a02508027e5497f
Summary: Use a priority queue instead of std::partial_sort to identify the top k elements. This reduces memory usage and improves performance.
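The same idea expressed in Python (heapq is a binary min-heap; this is only an illustration of the algorithm, not the C++ code in this diff):
```
import heapq

def top_k(values, k):
    # Keep a min-heap of size k; its root is the smallest of the current
    # top-k, so each new element needs only one comparison against it.
    heap = []
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            heapq.heapreplace(heap, v)
    return sorted(heap, reverse=True)

print(top_k([5, 1, 9, 3, 7, 2], 3))  # [9, 7, 5]
```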
Differential Revision: D4963931
fbshipit-source-id: 02e75b17ffaf24a4f63c7136626bf0991ee47495
Summary:
This is useful when data has standalone sequences which are
not connected to each other by any meaningful context
Reviewed By: yqwangustc
Differential Revision: D4835164
fbshipit-source-id: f95626acc26acc3eba3bca7efb08ed1dbdb36c83
Summary: Ran into illegal memory access errors when running MSRAFill on an odd-sized tensor. curand only supports even-sized fills. To work around this limitation, we fill the last entry of the tensor manually and use curand for what remains. In this line, the intent is to get the (n-1)-th element of the tensor. r is already a T*, so we should not be multiplying by sizeof(T) to get the (n-1)-th element.
Differential Revision: D4961306
fbshipit-source-id: 587f2945abf025e28f573482a4828c09e6ae771b
Summary:
A new argument `blob_name_overrides` is added, which specifies the destination names of loaded blobs (so that they can have different names than the ones in the saved file/db).
This will be used for parameter initialization from a pretrained model in Dper 2. When loading a blob, we need to avoid name collisions by assigning the loaded blob a new (temp) name.
Reviewed By: xianjiec
Differential Revision: D4952485
fbshipit-source-id: 4ce79bf40223314bb94981c22cbe537ae3f3d27c
Summary: No need to assert on connection errors.
Reviewed By: andrewwdye
Differential Revision: D4957698
fbshipit-source-id: b47f6f0f098dbf7d212701c5cb68e34b2c1c9522
Summary:
Free scratch blobs at data worker exit. Also add a utility function that you can use to reset gradient blobs easily:
from caffe2.python import utils
grad_blobs = [b for b in workspace.Blobs() if b.endswith("_grad") or b.endswith("_shared")]
utils.ResetBlobs(grad_blobs)
Reviewed By: rpenggithub
Differential Revision: D4955531
fbshipit-source-id: d33b2bb2b5247dd2c4cff51c82b1257c871a4179
Summary: Current eval nets contain loss operators (see example: https://fburl.com/6otbe0n7), which is unnecessary. This diff removes them from the eval net.
Differential Revision: D4934589
fbshipit-source-id: 1ba96c20a3a7ef720414acb4124002fb54cabfc7
Summary: Now you can call coordinator.stop_coordinator("train") to stop the train model's data input and release its memory.
Reviewed By: rpenggithub
Differential Revision: D4955014
fbshipit-source-id: c1bc3ec67337b94aff8ea9b306c3b4158eeef42c
Summary:
The _param_init_net does not exist. All the other places reference
param_init_net instead. So far no one has encountered any problem
because all the passed params are BlobReferences. This diff makes
this assumption explicit.
Reviewed By: azzolini
Differential Revision: D4922930
fbshipit-source-id: e6dbd7a29ea640b7e62fcfec7ced3cc7d149f872
Summary: Yet another diff to improve softmax CUDA kernels. 1) Use CUB for reduction ProbCrossEntropyKernel (was sequential loop); 2) remove unnecessary inner for-loops for two other kernels.
Reviewed By: wickedfoo
Differential Revision: D4953099
fbshipit-source-id: 4a5806d450021eff84e3d7fb0e7020cb5013fd69
Summary:
My first CUDA kernel ever!
The general strategy:
1. Create a block per row, up to CAFFE_MAXIMUM_NUM_BLOCKS.
2. Create CAFFE_CUDA_NUM_THREADS threads to sum in parallel.
3. Sequentially compute the max of all inputs for a thread
4. Use CUB parallel reduce to compute the overall max.
The new version of the code is way faster than the old kernel (20x). This is
actually quite suspicious; with the assistance of ntv, we discovered that
RowMaxKernelLargeD was performing slowly on lstm because it was only ever being
parallelized over a single block (see Test Plan below for a sample trace).
It will be good to investigate this further.
Differential Revision: D4948557
fbshipit-source-id: 7f8d5c04667b948881468adb37f8ebc5c903c8da
Summary:
This PR makes cmake install the gloo CUDA headers if USE_CUDA is enabled.
Closes https://github.com/facebookincubator/gloo/pull/29
Differential Revision: D4946856
Pulled By: pietern
fbshipit-source-id: a688c3794c4a5e34b664e7bdeb4e1148f6504419
Summary:
ScaleGradient is a helper operator that does no actual numerical computation; in the gradient computation phase it scales the gradient being computed through it.
Differential Revision: D4920719
fbshipit-source-id: 0e1e0888f79594be874fdbdda5ccef7389064c50
Summary:
The issue is that AliasOp doesn't work well with the swaps that we do for
param.grad and param.accGrad. Tensors become the same if there is no
reallocation of the gradient tensor inside the backward cell net's
local workspace.
bug explanation from akyrola:
```
gpu_0/decoder/decoder_hidden_encoder_outputs_sum_grad: tensor A
on each timestap back to 0, we Alias
gpu_0/decoder/weighted_encoder_outputs_grad,
so then also
gpu_0/decoder/weighted_encoder_outputs_grad: tensor A
It's acc is:
gpu_0/decoder/weighted_encoder_outputs_grad_acc: tensor B
Now after timesteps, we swap (line 626) with _acc to get
gpu_0/decoder/weighted_encoder_outputs_grad: tensor B
gpu_0/decoder/weighted_encoder_outputs_grad_acc: tensor A
OPTION A -- batch size is same as before or smaller:
Then on next iteration, we do again the Alias to
gpu_0/decoder/decoder_hidden_encoder_outputs_sum_grad, so now
gpu_0/decoder/weighted_encoder_outputs_grad: tensor A
and also
gpu_0/decoder/weighted_encoder_outputs_grad_acc: tensor A
swapping them does nothing and they are the same
OPTION B -- batch size increases
gpu_0/decoder/decoder_hidden_encoder_outputs_sum_grad is reallocated,
becomes tensor C
gpu_0/decoder/weighted_encoder_outputs_grad becomes tensor C with
Alias
gpu_0/decoder/weighted_encoder_outputs_grad_acc: is tensor A
```
Reviewed By: urikz
Differential Revision: D4946730
Tags: rnn, caffe2
fbshipit-source-id: b52d63cb238b81d2ad40e05e70deb32a81336f47
Summary:
The new memonger (D4393909) has an option to use shape inference. When trying this on some models, I encountered a couple of issues, fixed here:
- the elementwise ops Add, Div, Mul did not have shape inference, leading to errors
- if a shape inference function throws an error, it crashes the whole thing. It is better to catch the error, log it, and keep going. Shape inference is not required, just an optimization.
- additional checks in the conv/pool shape inference function, which was segfaulting in certain cases.
Reviewed By: asaadaldien
Differential Revision: D4949994
fbshipit-source-id: d4c571e1bb20f8feeade95c49412771bb3e7bed0
Summary: Thanks to ezyang, now I know that if a CUB tempstorage is reused, a thread sync is needed. So added this to the elementwise linear gradient kernel.
Reviewed By: wickedfoo, ezyang
Differential Revision: D4949250
fbshipit-source-id: fbbbd336a962a51be43784207105cadd391a8ef2
Summary: A layer that takes raw ids as inputs and outputs the indices which can be used as labels. The mapping will be stored with the model.
Reviewed By: kittipatv
Differential Revision: D4902556
fbshipit-source-id: 647db47b0362142cdba997effa2ef7a5294c84ee
Summary:
Adding add_weight_decay and image_input to the brew module & removing `getWeights` and `getBias` from CNNModelHelper.
An fbgs search for `useWeights` shows that no one but add_weight_decay is using this function. I checked with the Oculus people; their getWeights is a different function.
kennyhorror Please notice whether this is going to affect you :)
Reviewed By: salexspb
Differential Revision: D4945392
fbshipit-source-id: 4ef350fd81dd40a91847e9f3ebc5421eb564df32
Summary: Printing the resnet training loss and accuracy for each batch so that people will have a better idea of what is going on.
Reviewed By: pietern
Differential Revision: D4945390
fbshipit-source-id: 0fcd60f4735e81641355aba6e6cbf0e57e886e38
Summary:
lengthTile goes from 1 row to multiple rows; the gradient op is simply the reverse, adding the fanned-out rows of gradients back together into 1.
Reviewed By: kittipatv
Differential Revision: D4943375
fbshipit-source-id: deae9984e849974a0d484a10b94efdb1d30941cc
Summary:
Added optional support for using activation blobs for sharing as well. Doing this change revealed a non-optimal implementation in the blob sharing: we need to prefer reusing free blobs by preferring those blobs that are already shared by many other blobs; otherwise the memory usage can increase when the pool of 'free blobs' grows.
Also, my first version only passed "free blobs" (i.e. blobs in the recycling pool) down the first branch when operators forked. Now we pass the blobs that were not used by the first branch down the second branch, and so on.
Also added support for blob size information in the heuristic. This uses the shape inference mechanism.
I had to also do some small tweaks:
- use Sum() operator as a way to match shapes of blobs that had otherwise unknown shapes. This is related to the Sum() operator that is added to combine multiple incoming gradient inputs (with _autosplit gradients).
- a couple of random shape inference fixes
This reduces the Resnet-50 memory usage on 64 batch from 9.45 Gig to 8.5 Gig.
For a 32 batch, the memory usage is 4330 MiB, down from 4800 MB, compared to Torch's 6856MiB (thanks prigoyal for checking this for me).
This is unfortunately quite a bunch to review...
Reviewed By: asaadaldien
Differential Revision: D4393909
fbshipit-source-id: 9c7c94125f96512bea80463ebcb63c215ef95ff9
Summary:
This diff contains the following changes:
- implementing __repr__ on Field types; this makes it a little easier to see what broke in the unit tests
- preserve the shape of ndarray input to schema; previously, empty and scalar arrays lose their shape, while others keep theirs.
- type-checking ndarray input; this ensures basic integrity of the schema
Reviewed By: xianjiec
Differential Revision: D4913030
fbshipit-source-id: bd0f6b8722d95bfe800edf98ba05029c5b99d2af
Summary:
It should be up to the program including Gloo to ignore SIGPIPE.
We have seen a case where the EPIPE errno is not properly handled in
an unrelated piece of code. Having SIGPIPE fire means we can get a
core and debug this further.
Reviewed By: andrewwdye
Differential Revision: D4896727
fbshipit-source-id: f6fe2d3f8dc68a9e6c2c457639b45f8aee2d7b20
* move TopK to generic
* partial genericization of kernel code
* introduce TopKTypeConfig, specialize radix type and conversion for floats
* implement topk for byte tensor
* implement for char tensor
* implement for int tensor, extend test to check indices as well
* works for longs too
* make bitfield set/get a struct, add support for 64-bit types
* extend to double tensor
* implement for half tensor
* asserts; test fix
Summary:
This PR is based on commit "977c6b3" as this version allows MKL to use all the cores available.
All MKL related files are added here after incorporating review comments, major changes include
1. usage of Clang-format(Linter) with --style = Google
2. usage of macros for checking input and filter dimension in the mkl operators
3. merged Max and Average pooling functions
4. created a new folder for mkl related python scripts in Python folder and moved them there
5. there is no mkl_alexnet_test.py as that was redundant while convnet_benchmark.py does the same thing
Closes https://github.com/caffe2/caffe2/pull/270
Differential Revision: D4905219
Pulled By: Yangqing
fbshipit-source-id: e5f5b189714a835b93b9ebda24c52e09572dfca7
Summary:
If an exception is thrown inside the namescope, the scope won't be reset to its previous value. This diff changes this behavior to the expected one.
Reviewed By: kittipatv
Differential Revision: D4928621
fbshipit-source-id: 1d3579f2093ca60901b0d37ae3f2108deb2333ea
Summary:
In its current form, common_rtc.h can only be included in a file where
```
using namespace std;
```
comes before the include
Closes https://github.com/caffe2/caffe2/pull/398
Differential Revision: D4943125
Pulled By: Yangqing
fbshipit-source-id: 3ef15c9353e6dd7326fc5f60322049c9f594ee6c
Summary:
Mac does not support thread_local, and Caffe supports mac, so we will have to
temporarily disable this on mac.
(Note: this ignores all push blocking failures!)
Reviewed By: marksantaniello
Differential Revision: D4945019
fbshipit-source-id: 6d1d828a96459a85e1ae4fb5394eabdd9e610723
Summary: Use a proper reduction in the gradient kernel. This gives about 25% speedup with the n, D I tried (see P57333872), but with larger N, the improvement can be much more sizeable.
Reviewed By: stephenyan1231
Differential Revision: D4941218
fbshipit-source-id: 627eaf26fc20a81f1ef449f39eda0d2191b8c746
Summary: Instead of requiring gradient updates on GPU, this change allows the case where loss computation happens on GPU while all grad updates happen on CPU.
Reviewed By: jhcross
Differential Revision: D4943996
fbshipit-source-id: 1f2144c4277dfdb865877e0d0216ca1ac7dd7309
Summary:
Add a pointwise `IsMemberOf` operator to Caffe2.
The original idea was `In` but I think this is not so clear.
I used `UnaryElementwiseWithArgsOp` at some point, but it was making the code a bit more difficult to read without bringing any benefit.
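A hedged usage sketch: the operator name comes from this diff, but the argument name ("value") and the exact input/output dtypes are assumptions on my part.
```
from caffe2.python import core, workspace
import numpy as np

workspace.FeedBlob("ids", np.array([1, 5, 7, 2], dtype=np.int64))
# "value" holds the membership set; treat this argument name as hypothetical.
op = core.CreateOperator("IsMemberOf", ["ids"], ["mask"], value=[2, 5])
workspace.RunOperatorOnce(op)
print(workspace.FetchBlob("mask"))  # expected: [False  True False  True]
```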
Reviewed By: ender-wieczorek
Differential Revision: D4912655
fbshipit-source-id: 716b66bb51468dd59db5f76f23d78cda85961b58
Summary:
Cannot guarantee Gloo will build on 32-bit systems as we don't run continuous build/test for this.
Verified this works by changing 8 to 7 and observing USE_GLOO defaulting to OFF.
Closes https://github.com/caffe2/caffe2/pull/401
Differential Revision: D4943135
Pulled By: pietern
fbshipit-source-id: 1972658afe819951e24ffbec76eb615c36ab0cc2
Summary:
When trying to build caffe2 with the python provided by homebrew, I found out there are some errors in the build scripts. The "get_python_cmake_flags.py" script is supposed to find the correct python library and header file locations. However, due to these errors, this script does not function correctly. After building, caffe2 is linked against the default python library provided by Apple, which causes a crash when trying to validate whether or not the installation was successful:
```shell
python -c 'from caffe2.python import core' 2>/dev/null && echo "Success" || echo "Failure"
```
The fix is as simple as follows:
- Add "shell" so that command substitution could work under Makefile.
- Add blank spaces between -D options so that they are treated as options not makefile targets.
- Print the "flags" variable without the newline character so that they could be utilized by command substitution correctly.
Closes https://github.com/caffe2/caffe2/pull/391
Differential Revision: D4943212
Pulled By: Yangqing
fbshipit-source-id: 04d3595fa2d89fe57aed5b6a7a91a95114a82a1b
Summary:
Two new operators to pack and unpack a dataset. This is so that we can
re-use other operators that do not understand the schema format. The immediate
use-case is to use it with a partition operator.
Packing works by splitting the input into separate tensors, putting them in a
vector and wrapping in a shared_ptr (as opposed to a unique_ptr, so we can
copy).
Unpack takes the packed input and concatenates it back to the original.
I also had a hard time understanding the iteration, so I created a TreeWalker that hides the complexity of operating on all the arrays and provides short, purpose-specific functions that, at least for me, are easier to understand.
Reviewed By: dzhulgakov
Differential Revision: D4918002
fbshipit-source-id: ecbf9196ed25e886a94383961176b8c84dde2d2f
Summary:
Added a forward_only option to recurrent_net and the RNNCells. If this is set, the backward_step_net is not passed to the operator.
When backward_step_net is not available, the operator knows it is in forward_only mode and does not create a workspace for each step, but cycles through only one private workspace.
Note: we could avoid doing a lot of work in the recurrent.py:recurrent_network call when the backward step is not needed, but doing that nicely requires more refactoring than I wanted to do now. Thus, we still create the backward step nets etc., but just don't pass them to the op.
This can be used to create more efficient inference models. You can also sanitize existing inference nets and remove the backward_step_net argument to
get the benefits.
Reviewed By: salexspb
Differential Revision: D4916482
fbshipit-source-id: c99b93c9cb897c32b0f449253f7f6d6a942618ad
Summary:
This is needed to have a stateful PythonOp (such as the PyTorch op in the following diff) where computing f will produce a state (not tensors) that's consumed by grad_f.
python_func_type is a type that is constructed as python_func_type(f) and provides forward and backward methods (delegated to f and f_grad). We construct this object at op registration time to have it as thread local.
Differential Revision: D4900963
fbshipit-source-id: 00a6a55fa372e2244048921914e22e710d11f7ce
Summary:
As per request moving elsewhere and using the Dispatcher. The reason
why I didn't put it into tensor.h is because the dispatcher lives in operator.h
and operator.h includes tensor.h. I also didn't want to do any codemods. If
this turns out to be useful it can be changed. Also the name is not super great
but the TensorPrinter is already taken so that's what first came to mind.
Reviewed By: dzhulgakov
Differential Revision: D4893325
fbshipit-source-id: 7d4e56c4e57164c3cd3748f4a705a4ffe6b932d9
Summary:
rename model_helpers to brew. This is a big diff now. I did these things:
1. replace model_helpers with brew:
find . -type f -exec sed -i 's/model_helpers/brew/g' {} +
2. rename model_helpers.py and model_helpers_test.py
3. rename ModelHelpersTest to BrewTest
4. lowercase all the helper functions to distinguish them from single op
5. run my unittests
6. run converge tests
Reviewed By: salexspb
Differential Revision: D4930465
fbshipit-source-id: f420a1b03238df1cbe9f4426e0b9c43a12119661
Summary:
rename ModelHelperBase to ModelHelper.
This is the result of running:
find . -type f -exec sed -i 's/ModelHelperBase/ModelHelper/g' {} +
fbgs found 19 results for ModelHelperBase. There are 20 instances here because I added 1 test in model_helpers_test.py.
Reviewed By: salexspb
Differential Revision: D4928337
fbshipit-source-id: bc4c12b60b90c167e717de50ea9fe17521e142e3
Summary: Instead of calling math::Axpby in a loop, we can do it in one kernel much more efficiently.
Reviewed By: asaadaldien, jamesr66a
Differential Revision: D4935893
fbshipit-source-id: 33497784604d1779723d578ea5400e87803851f0
Summary: jamesr66a noticed that the ScaleKernelAlphaDevice kernel was showing up in a profiler a lot. This was because it is called in a loop in ReduceFrontSumGradientOp. This was easy to replace by one kernel that scales in a "striped" manner.
Reviewed By: asaadaldien, jamesr66a
Differential Revision: D4935888
fbshipit-source-id: bc7bfd8c94988074ace6fbf3fdfb85905027f272
Summary:
This is getting too messy again, so cleaning it up even more. One thing I added here: not calling random to generate the input sequence. Ideally we do this for all other inputs too; this was reported to be an issue when hypothesis finds bad examples, as it can make the run very long.
Also I tuned the ranges a bit so the test finishes faster. On my devgpu the whole test took 600 seconds before and now takes 39 seconds.
One more important thing: we want to test all combinations of the things that are in the for loop, while the things provided by hypothesis are just random tensor inputs.
Differential Revision: D4902956
fbshipit-source-id: ceb02d6761406b3192101d3b255abe90b2866770
Summary:
CUDA version of PRelu and its gradient. The forward pass is straightforward; the backward pass requires a reduction over the weights.
tsaizhenling, please patch this and test.
Differential Revision: D4931630
fbshipit-source-id: 1238e7d536e41480713865ced91aaef88f4feef5
Summary: To expose operator execution statistics in Python, the profiling measurements collected in the ProfDAGNet class are leveraged. In the current implementation, a new operator is defined that outputs the statistics in a protobuf message. On the frontend, OperatorStatsContainer works as a wrapper to print ProfDAGNet statistics.
Differential Revision: D4923009
fbshipit-source-id: 18a6d76a405ef277a3fca7a312609051cf943207
Summary:
When installing on systems such as Arch Linux, where the default python version is 3, the build will fail. To fix this, instead of changing the python link in the shell it is more efficient to set the default python version allowed by cmake.
Closes https://github.com/caffe2/caffe2/pull/361
Differential Revision: D4932214
Pulled By: Yangqing
fbshipit-source-id: 06997d2df68b8e4037d72fd49813f6f74ca7591b
Summary:
Simple FindOp for CPU and GPU which searches a list of unordered needles from an unordered index. CPU version might be faster if first sorting the index / needles, but we can get back to that later.
CUDA op is also kind of brutish, but pretty parallel. Since the index and the queries are smallish at least in the use case currently in mind (Machine Translation's team word candidate search), I think this is a sufficient start.
Note that this is much simpler than the Index-class of ops which allow modifying the index etc. Since CUDA ops are more complex to implement for the full Index functionality, I decided to make a separate op with this very simple functionality.
Differential Revision: D4910131
fbshipit-source-id: 6df35c9e3c71d5392a500d5b98fd708ab0c8e587
Summary:
arg_scope module for model_helpers.
Some coding examples with it:
with model_helpers.arg_scope([model_helpers.FC], kwargs):
    model_helpers.FC(model, "x", "out_1", n, n)
with model_helpers.arg_scope([myhelper], n=-3):
    with model_helpers.arg_scope([myhelper], n=-2):
        with model_helpers.arg_scope([myhelper], n=n):
            res = model_helpers.myhelper(None)
with model_helpers.arg_scope([myhelper], n=-3), \
     model_helpers.arg_scope([myhelper], n=-2), \
     model_helpers.arg_scope([myhelper], n=n):
    res = model_helpers.myhelper(None)
Reviewed By: salexspb
Differential Revision: D4837180
fbshipit-source-id: 2cbd81681779d6cd1e61ee189edcc1cf3bb07d15
Summary: Insufferable Apple fanboys have burned this into my brain.
Reviewed By: Yangqing
Differential Revision: D4913772
fbshipit-source-id: 486c20e9c921
Summary: This file was left over after a recent refactoring but is not used.
Reviewed By: andrewwdye
Differential Revision: D4940265
fbshipit-source-id: 01f8c5fbc73dd0ca0a92306dbfef22ff28133750
Summary:
While it is theoretically possible to make Gloo work on 32-bit systems, it's unlikely anybody would ever use it on 32-bit systems. This removes the expectation that it should work...
Fixes #28
Closes https://github.com/facebookincubator/gloo/pull/31
Differential Revision: D4939073
Pulled By: pietern
fbshipit-source-id: 8c60804f7ae5cf835332871a424aefa2c498e8a4
Fixes #1267
This fixes a number of issues when PyTorch was compiled with CUDA
support but run on a machine without any GPUs. Now, we treat all errors
from cudaGetDeviceCount() as if the machine has no devices.
This saves an extra memory copy, which speeds up data loading a bit
(5-10% with accimage).
As part of this change:
* torch.cat accepts keyword argument out
* specifying out=None is treated like not specifying out (see the short example below)
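A short example of the new keyword form (tensor sizes are made up):
```
import torch

a, b = torch.rand(2, 3), torch.rand(4, 3)
out = torch.empty(6, 3)
torch.cat([a, b], dim=0, out=out)   # fills the preallocated `out` tensor
torch.cat([a, b], dim=0, out=None)  # same as not passing `out` at all
```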
Summary:
Changed _Android_ to _iOS_ in the comments in scripts/build_ios.sh.
Closes https://github.com/caffe2/caffe2/pull/364
Differential Revision: D4930101
Pulled By: Yangqing
fbshipit-source-id: 8f0a6aa1b43fd57c2f71f1c667c61d1f69b1e061
Summary: Work in progress for improving the performance of the TransposeOp on CPU. This is used extensively for inference in several neural MT systems, so optimizing this function is worthwhile and will reduce request latency.
Differential Revision: D4913075
fbshipit-source-id: fa2742829291d91f3eba00fdfe7d6c0dae83e206
Summary: Better to use the standard library tanh(), because otherwise there can be numerical differences relative to other systems.
Reviewed By: urikz
Differential Revision: D4910421
fbshipit-source-id: 3a1e63cd20a6b8e3720a1deafea227652b38205e
Summary: CuDNN LSTM weights were incorrectly sized for layers > 0: there was an assumption that the input size to the middle layers is the same as for the first layer, but actually a middle layer gets its input from the layer below, which has dimension equal to the output dimension (hidden dimension). This worked fine when input_dim and hidden_dim were equal, as they are in the default params for lstm_benchmark.
Reviewed By: salexspb
Differential Revision: D4922824
fbshipit-source-id: 3ed05529dcb0a4e66ad440084a55df1c5932fd33
Summary:
downloaded_size needs to be incremented by the length of the returned data_chunk.
When the last block's size is less than the chunk size, the percentage could otherwise exceed 100%.
Closes https://github.com/caffe2/caffe2/pull/329
Differential Revision: D4922227
Pulled By: Yangqing
fbshipit-source-id: 7d05d9bbf2dad0a9d330be96b60e658908185a46
Summary: Fixes unit test test_seq2seq_caffe2_model_cnn_one_stack_encoder, broken by D4905003. (Also some commas.)
Differential Revision: D4920699
fbshipit-source-id: 2fe501095e3e26a475d666afcae8e48c953f2eef
Summary: This would allow us to pin the size of lengths tensor to the batch size. I'll use this in a follow up diff.
Reviewed By: kennyhorror
Differential Revision: D4906634
fbshipit-source-id: 8d3d151f33fd99547d9940e7c663779810283eb6
Summary: Set pooling mode to exclude padding values and match the CPU & CUDA implementations.
Differential Revision: D4920476
fbshipit-source-id: 26ce656cc792061f706e2acb37e72cec46ac77c8
Summary: salexspb recognized that my diff fixing num_layers > 1 cudnn lstm made it run much slower. It turns out this was caused by adding the dropout states to the gradient op (which it was missing; that was a bug). But since we use dropout=1.0, we don't need to initialize the dropout states, and it turns out this improves the perf of CuDNN LSTM very significantly, at least when hidden_dim is small (2.5x increase with hidden_dim=40). With large hidden_dim, the improvement is more modest.
Reviewed By: salexspb
Differential Revision: D4920543
fbshipit-source-id: 860c9d4c61793252f658dc5e3390bab571476be5
Summary:
Top-level makefile had `make` hardcoded, resulting in slow build and the following message when following installation instructions:
warning: jobserver unavailable: using -j1. Add `+' to parent make rule.
Replacing this recursive make command with the variable MAKE fixes the issue.
Closes https://github.com/caffe2/caffe2/pull/324
Differential Revision: D4920978
Pulled By: Yangqing
fbshipit-source-id: 1e75ab41786e52d1b7abcc2c46ad1088880d8c1d
Summary: `not field` calls `__len__()`, causing the field to appear to be missing even when it's not
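A generic illustration of the bug class (the Field class here is a stand-in, not the schema code):
```
class Field(object):
    def __init__(self, items):
        self._items = items

    def __len__(self):
        return len(self._items)

f = Field([])        # a present field that happens to be empty
print(not f)         # True: `not` falls back to __len__(), so the field
                     # looks missing even though it exists
print(f is None)     # False: an explicit None check tells the cases apart
```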
Differential Revision: D4910587
fbshipit-source-id: bc2b2fadab96571ae43c4af97b30e50c084437af
Summary: We had to disable the keep_on_shrink flag for inference and some training workloads; this change limits the memory allowed to be kept around when we are allocating a smaller blob after a bigger one.
Differential Revision: D4889366
fbshipit-source-id: 87412cc1c0bf2c43ea1f3f19e31afc178bc1b9db
Summary: PrefixStore::wait() uses a default timeout if unspecified. This is incompatible when using PrefixStore to wrap a Store implementation that does not support timeout. Instead the base Store::wait(keys, timeout) implementation is called, throwing an exception. This change modifies the base implementation to ignore the timeout.
Differential Revision: D4916517
fbshipit-source-id: 3cdd83bd209bf938b58442d82f3fc245e68019ad
Summary:
1. add net gradient check to dper2 model unittest framework
2. add net gradient check to mtml model
3. refactor the code setting defaults to namedtuple.
Reviewed By: kittipatv
Differential Revision: D4897169
fbshipit-source-id: 4f17dd06ee169aa1158f12f5156614d45d7d97c1
Summary: This is needed for the completeness of random negative sampling. When the pool size is 0, we want to generate empty indices tensor.
Reviewed By: xianjiec
Differential Revision: D4906866
fbshipit-source-id: 75d66a92d15d60bb37bcd1075d324f28069c4fa0
Summary: This diff resolved some issues in the reverted PR 246.
Differential Revision: D4911821
fbshipit-source-id: 0a6fa47f4c2405475697e40fb926758c534f8ef7
Summary: Fixes for corner cases with small element counts. Fixed problems include (1) calling range on out of bounds pointers, (2) failing to allocate send or receive buffers in cases where they correspond to out of bounds indices for reduce-scatter, but are needed in the allgather, (3) not allocating enough receive buffer space (more than count_ bytes may be needed in some cases)
Reviewed By: pietern
Differential Revision: D4912656
fbshipit-source-id: 0409d01894ff9c93ef1a1fdf8021c9ecf62f9b57
Summary:
Similar to SafeDequeueBlobsOp, but adds weight-based sampling for reading from multiple input BlobsQueues.
WeightedSampleDequeueBlobsOp takes a vector of weights (each weight is mapped to one input blob queue).
Based on these probabilities, we choose which BlobsQueue to dequeue from.
WeightedSampleDequeueBlobsOp stops when any of the input BlobsQueues is empty.
Reviewed By: dzhulgakov
Differential Revision: D4905160
fbshipit-source-id: 5b1551e2250569f933a6c01ed04442843c5e0cb6
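As a rough Python illustration of the sampling idea only (the function and queue names below are made up, not the operator's code):
```
import numpy as np

def pick_queue(queues, weights, rng=np.random):
    """Choose one input queue with probability proportional to its weight."""
    probs = np.asarray(weights, dtype=np.float64)
    probs = probs / probs.sum()
    return queues[rng.choice(len(queues), p=probs)]

# Example: two queues, the first dequeued from ~75% of the time.
print(pick_queue(["queue_a", "queue_b"], [3.0, 1.0]))
```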
* updated ubuntu instructions
* updated ubuntu notes and troubleshooting
* updated tutorials using local files
* added doxygen python blocks for docs generation
* doxygen related files for generating docs
* removing Mac and Windows build status while those are in beta
* inference lookup is local now
* launch updates
* moved to docs folder, updating paths
* updated ubuntu instructions
* updated ubuntu notes and troubleshooting
* updated tutorials using local files
* added doxygen python blocks for docs generation
* doxygen related files for generating docs
* removing Mac and Windows build status while those are in beta
* inference lookup is local now
* launch updates
Summary:
When compiling Caffe2 on a Jetson TX2 using JetPack 3.0, the compilation with the Tegra X1 build script runs through perfectly fine. However, when running
from caffe2.python import workspace
the following error shows up:
> ImportError: No module named six
After installing `six` manually using
sudo pip install six
this works fine. I thus added the `six` module to the install script.
I assume this will also be required for the `build_raspbian.sh` script; however, as I could not test this, I didn't add it (yet).
Closes https://github.com/caffe2/caffe2/pull/293
Differential Revision: D4914121
Pulled By: Yangqing
fbshipit-source-id: 75947e8c295e1f5ad3f480a025fe8518dd91a957
Summary:
This tiny patch fixes missing ```CUDA_NVCC_FLAGS``` & ```CUDA_HOST_ARCH``` from ```caffe_detect_installed_gpus()```.
-----------------
People may want to define custom flags or compilers that are more CUDA compatible. The automatic GPU arch detection ignores these flags and fails. Example of such custom flags:
```
cmake . \
-DCUDA_ARCH_NAME="Auto" \
-DCUDA_HOST_COMPILER="/usr/bin/gcc5"
```
* The autodetection part fails regardless of whether proper compiler flags are passed, because the system gcc 7.0 doesn't work with CUDA, so all arches get enabled:
```
-- The C compiler identification is GNU 7.0.1
-- The CXX compiler identification is GNU 7.0.1
...//\\...
-- CUDA detected: 8.0
...//\\...
-- Automatic GPU detection failed. Building for all known architectures.
-- Added CUDA NVCC flags for: sm_20 sm_21 sm_30 sm_35 sm_50 sm_60 sm_61
```
* The patch fixes the autodetection, as expected:
```
$ cmake ../ -DCUDA_NVCC_FLAGS="-Xcompiler=-std=c++03 -I/usr/include/cuda/"
-- The C compiler identification is
Closes https://github.com/caffe2/caffe2/pull/288
Differential Revision: D4914215
Pulled By: Yangqing
fbshipit-source-id: c407a750e03cb163f9d57f9f6403042704046014
Summary:
If the command line flag caffe2_gpu_memory_tracking is enabled, CUDAContext will keep track of the total memory allocated on each GPU. This requires keeping track of the sizes of the pointers, which might add some overhead, so it is optional. In practice the overhead is minimal, since we usually don't do allocations after the first iterations.
Added an op GetGPUMemoryUsage() to fetch this data programmatically, and a python function utils.GetGPUMemoryUsageStats() to call this op and package the results. Modified the LSTM benchmark to report these stats.
This tracking is GPU-only for now. CPU allocations are less organized.
Reviewed By: asaadaldien
Differential Revision: D4877451
fbshipit-source-id: 857798fe499d8c78cc590783052cbb2d4db56ea0
Summary:
memcpy comes from cstring
See https://github.com/caffe2/caffe2/issues/286
Reviewed By: Yangqing
Differential Revision: D4914228
fbshipit-source-id: de60c2a98feb4228546a8f1fe237a090101f50e4
Summary:
Due to the massive dependencies I did not update the version number - under
the same big version number (2017) the API is compatible so no need to
rebuild all the dependencies.
This will unblock the Caffe2 Intel pull request on MKLDNN.
Differential Revision: D4906463
fbshipit-source-id: 0f74436ac3a05605e35b8b649c3e8b5c1c69b500
Summary: Add a default 60s timeout to RedisStore::wait() to avoid blocking indefinitely when peer machines are unavailable.
Reviewed By: pietern
Differential Revision: D4908699
fbshipit-source-id: 39de9066633e8b0c8d1ee198b6bf3f70d3961196
Summary:
as desc.
small fix in the feature_proc layer for the case when we only have one preproc type
Reviewed By: chocjy
Differential Revision: D4908933
fbshipit-source-id: 1338048fc395f85c3724721a9996ad1ee51f0f20
Summary: added a new context to layers.py
Reviewed By: kennyhorror
Differential Revision: D4817124
fbshipit-source-id: 36f08964b86092e81df24c1b9d4b167293a7ffb8
Summary: unit test using hypothesis for unmask operator
Reviewed By: ender-wieczorek
Differential Revision: D4904075
fbshipit-source-id: 874d3756ec703ab2cc82f24f7160b4254bf791f1
Summary:
Found while browsing the code. Cool stuff in here!
Closes https://github.com/caffe2/caffe2/pull/276
Differential Revision: D4911421
Pulled By: Yangqing
fbshipit-source-id: 3bef10a4001a6b4d4527c054519d69131799a0e2
Summary:
It's possible the pair is in the listening state when it is
destructed. The fd will not have been cleaned up in that case, so we
shouldn't assert that being the case.
Reviewed By: andrewwdye
Differential Revision: D4909964
fbshipit-source-id: 7103d74910e3bcf5de9f4658d8f1f682b6c8a70c
Summary: Make it convenient to test a model where we don't care about the backward pass, e.g., when the backward pass won't be run anyway.
Reviewed By: xianjiec
Differential Revision: D4906890
fbshipit-source-id: 9da51a9de4422474ce780e180b1ca95d6bc8c46d
Summary:
Currently, the functional layer infers the output types and shapes by running the operator once.
But in cases where special input data are needed to run the operator, the inference may fail.
This diff allows the caller to manually specify the output types and shapes when the automatic inference would fail.
Reviewed By: kennyhorror
Differential Revision: D4864003
fbshipit-source-id: ba242586ea384f76d745b29a450497135717bdcc
Summary: This will be used to generate random indices input to `Gather`
Reviewed By: xianjiec
Differential Revision: D4904591
fbshipit-source-id: 8d858631e3d640be2cec12f1566cbf195e6aad4b
Summary:
Two new operators to pack and unpack a dataset. This is so that we can
re-use other operators that do not understand the schema format. The immediate
use-case is to use it with a partition operator.
Packing works by splitting the input into separate tensors, putting them in a
vector and wrapping in a shared_ptr (as opposed to a unique_ptr, so we can
copy).
Unpack takes the packed input and concatenates it back to the original.
I also had a hard time understanding the iteration, so I created a TreeWalker
that hides the complexity of operating with all the arrays and provides
short, purpose-specific functions that at least for me are easier to
understand.
Reviewed By: dzhulgakov
Differential Revision: D4870606
fbshipit-source-id: dc29428de5c96cc3039af2885d9e4b026d9f482d
Summary: Gather should work when both DATA and INDICES are empty
Reviewed By: xianjiec
Differential Revision: D4906878
fbshipit-source-id: 23585afbe618656d7f5831c56d360a03e3cb2584
Summary: The scale_ tensor was resized incorrectly in the CUDA version of SoftmaxOp. For some reason this has not triggered more crashes. I was using rowmax_ in-place with scale_, which was then also incorrectly sized. Usually D>N, so this was not an issue, but perhaps there were cases with attention where that did not hold. The problem is also order-sensitive: if we once had an input with large D, the buffer ended up with the correct size.
Reviewed By: jamesr66a
Differential Revision: D4904989
fbshipit-source-id: 244b6d308d1fc08be885c641440cbacad3b0dbce
Summary: Add AllgatherRing and CudaBroadcastOneToAll to benchmark. Add host info and algorithm sweep to chronos script.
Reviewed By: pietern
Differential Revision: D4901111
fbshipit-source-id: 1421025d39b914b14e857f21c43eac30c9c9dd2f
Summary: CuDNN RecurrentNet GradientOp did not pass the DROPOUT information to the initializer, causing an incorrect scratch space size to be estimated. We have an assertion enforcing that the scratch space is the same for forward and backward ops, so this failed that assertion. We currently hard-code dropout to 1.0, so this has had no effect on correctness in our tests. For some reason there wasn't an issue with num_layers=1, but with num_layers>=2 the scratch space size was different.
Reviewed By: salexspb
Differential Revision: D4904715
fbshipit-source-id: 780266c5ecf1f7a32387edcb6fc498a13ac782ac
Summary: This is the nice way to re-use RNN layers for training and for inference.
Reviewed By: salexspb
Differential Revision: D4825894
fbshipit-source-id: 779c69758cee8caca6f36bc507e3ea0566f7652a
Summary: This may help tell different allreduce operations apart during debugging/tracing.
Reviewed By: prigoyal
Differential Revision: D4897921
fbshipit-source-id: bbb2ce02a3e1f467ad54f8a3aed6a4e2b26a9fe4
Summary:
The common worlds can be reused without performance impact as long as
there is a guarantee that no two algorithm instances are using it at
any given time. Since we know the ordering and the maximum
parallelism, we can cycle through common worlds, and reuse them
accordingly.
Differential Revision: D4896779
fbshipit-source-id: 164e1727692eab904fa6879a9f91a3e8332a2e30
Summary:
This is from discussion with dzhulgakov : as a step towards revisiting the
core.Net autonaming, we will first guard against accidental overwrites of
existing networks in the workspace.
ajtulloch since we are doing Predictors in mobile, this should be safe right?
azzolini - I assume this would be safe, but would love to get your approval.
akyrola - would this hurt xray?
Reviewed By: dzhulgakov
Differential Revision: D4897725
fbshipit-source-id: aa41271927ad6671f07a53b9505283623f8c49e5
Summary: Having to pack the input to schema doesn't make much sense since the structure is not recognized by operators anyway.
Differential Revision: D4895686
fbshipit-source-id: df78884ed331f7bd0c69db4f86c682c52829ec76
Summary:
The MT team, with urikz, found out that their convergence discrepancy against another version of the model was caused by numerical stability issues in softmax. Our implementation was missing the optimization that avoids computing exp(log(x)) for softmax-crossentropy. This diff fixes that.
This does not require any changes to the current models, since the output of SoftmaxWithLoss is still the exponentiated items.
I also did a little bit of cleanup on the code; for some reason we were passing tensors to SoftmaxCPU() instead of pointers.
Reviewed By: urikz
Differential Revision: D4901888
fbshipit-source-id: 62e785ecdd87e33742292b191e91b4f43912e4c0
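A small numpy sketch of the stability idea (an illustration of the shift-and-stay-in-log-space trick, not the actual kernel): compute the loss directly from the shifted logits instead of exponentiating and then taking the log again.
```
import numpy as np

def softmax_xent_naive(logits, label):
    # exp followed by log of the selected probability: an exp(log(x)) round trip
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[label])

def softmax_xent_stable(logits, label):
    # subtract the row max and keep everything in log space
    shifted = logits - logits.max()
    log_z = np.log(np.exp(shifted).sum())
    return log_z - shifted[label]

logits = np.array([1000.0, 1001.0, 999.0])
print(softmax_xent_naive(logits, 1))   # overflows: nan
print(softmax_xent_stable(logits, 1))  # ~0.408, finite and correct
```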
Summary:
Added the possibility to pass 'tiles' and 'axis' as inputs,
as opposed to arguments, to the Tile operator. If provided, the input
values will override the argument values.
Differential Revision: D4794432
fbshipit-source-id: a7e38f4f925a4cedf530924bd426c3bb08b5aad8
Summary:
Add conv helpers. The migration of these functions assumes that people do not do
cnn_model = CNNModelHelper(use_cudnn=True)
cnn_model.Conv(..., use_cudnn=False, ...)
Reviewed By: salexspb
Differential Revision: D4884974
fbshipit-source-id: 12af6e2a5863eba789232cd4a4771f95d05f9227
Summary:
A workspace may add a suffix such as "_1" to the net name if other nets
have been added to the workspace with the same name. This is true even
if the previous nets have been removed or if the workspace has been
reset.
Closes https://github.com/caffe2/caffe2/pull/213
Differential Revision: D4899877
Pulled By: Yangqing
fbshipit-source-id: b89b196df815dceff49a3ec76d7f658cdc4b0a38
Summary:
Implement a new op ElementwiseLinear.
Given inputs X of size (N x D), a of size D and b of size D,
the op computes Y of size (N x D) where Y_{nd} = X_{nd} * a_d + b_d.
Typically, this op is followed by SigmoidCrossEntropyWithLogits op for multi-label classification problem.
Differential Revision: D4892220
fbshipit-source-id: 77bffc5fbe03d48b3d83ab785f7c24a71c952aec
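A numpy sketch of the computation for reference (not the op's implementation):
```
import numpy as np

N, D = 4, 3
X = np.random.randn(N, D)
a = np.random.randn(D)
b = np.random.randn(D)

# Y_{nd} = X_{nd} * a_d + b_d, via broadcasting over the first axis.
Y = X * a + b
assert Y.shape == (N, D)
```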
Summary: This diff allows to export a model partially, filtering layers by tags.
Reviewed By: kittipatv
Differential Revision: D4885610
fbshipit-source-id: 65394c5c9119d57a4d0703aa67ad8e79e4370e3b
Summary: Output peer address on network failures. This change will help in root causing network failures.
Differential Revision: D4899129
fbshipit-source-id: 60a762c6551a726081d5335ab478da8dd7f6dad7
* Fix group-convolution w/o biases on CPU.
Not having this guard will cause a crash further down in the `cat`
function when it uses the first element in the passed list to create a
new tensor. (And even after that, cat doesn't handle nulls well.)
* Added test for groupconv w/o bias on CPU.
Summary:
Based on a discussion with Yangqing, optionally disables the calculation of dX for a convolution op (e.g. conv1 in AlexNet), where the data gradient is not needed.
Closes https://github.com/caffe2/caffe2/pull/242
Differential Revision: D4844013
Pulled By: bwasti
fbshipit-source-id: 202d2410ed6c66671e83e8e49a1383883c6ab29e
Summary:
1. Adds a function to return auxiliary parameters for each optimizer. This function can be used to serialize the optimizers so that they can be recovered.
2. Fixes the bug that the iteration blob is not incremented by one in each iteration. Suppose there are k parameters using the Adam learning rate optimizer; with the original implementation, the iteration blob was incremented by k per iteration.
Reviewed By: azzolini
Differential Revision: D4872397
fbshipit-source-id: d86711feedda2ba83af5f2a18141b06a6a473733
Summary:
A CPU implementation of the unmask operator in caffe2.
There's also a small bug in the mask operator; fix it as well.
Reviewed By: ender-wieczorek
Differential Revision: D4896351
fbshipit-source-id: 887d1beb66fe93ea2da1c4e165fce2e026907726
* updated ubuntu instructions
* updated ubuntu notes and troubleshooting
* updated tutorials using local files
* added doxygen python blocks for docs generation
* doxygen related files for generating docs
* removing Mac and Windows build status while those are in beta
* inference lookup is local now
Summary:
The halving/doubling algorithm is faster than both ring and chunked
ring up to 5M elements, but it only works with a power-of-two number of
contexts right now. So use it whenever the context size is a power of two.
Differential Revision: D4890065
fbshipit-source-id: 09ff82b375cbd3d0626e0255dcf9b9f4873fff54
Summary:
Newly trained models pass kernels=2*[kernel]; old inference code
will not work with them because the (kernels) argument isn't supported there and
we are no longer passing kernel.
Reviewed By: salexspb
Differential Revision: D4888795
fbshipit-source-id: 1649b073c4e1da1d59da9cea581b4dcab6dbaf5c
Summary:
This is the hardware limit set by NVidia. Basically, on Amazon P2 machines that
have 16 gpus, the previous setting will trigger an error. This fixes the issue
but is pending verification from Amazon.
Differential Revision: D4888402
fbshipit-source-id: 8d26a24d6e0476f895b9afdb979144eb8e6b9321
Summary: Memonger's inference optimization is very efficient, but does not work if a multi-threaded DAG net is used. So I added this alternative that shares code with the gradient memonger and does the blob recycling by traversing the DAG and ensuring that blobs do not pass parallel branches.
Reviewed By: viswanathgs
Differential Revision: D4884303
fbshipit-source-id: dfd0a6ecdb91f4edbb0b743729c92f4cd015602e
Summary:
This allows us to do in-place relu and also corrects the previous error of
inconsistency between the cudnn impl and the non-cudnn impl.
This implementation butchers the cudnn interface, in the sense that we pass
in the output instead of the input for the gradient pass. We do have a
gradient checker to guard this situation, so we should be safe.
Reviewed By: asaadaldien
Differential Revision: D4889426
fbshipit-source-id: 081f8fe06de78413b5786086bfd5ae6c8128cd6e
Summary: Add an option to bias the forget gate one way or another by adding in some float value before the sigmoid is applied.
Differential Revision: D4880712
fbshipit-source-id: 1306a97c29fb31630838b2f96597a46e952d940a
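In equation form, this is the usual LSTM forget gate with an extra constant added before the sigmoid (the symbol beta below is mine, not from the diff):
```
f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f + \beta\right)
```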
Summary: This allows to check what's the real cost of each PS request for each parameter, and hopefully will allow to improve the sharding logic.
Reviewed By: dzhulgakov
Differential Revision: D4799210
fbshipit-source-id: d18effc671f3f7a611e535e09bde360ef0102a33
Summary: This diff enables sparse gradient synchronization between GPUs. The test case is now a bit too convoluted, but once D4871680 is landed, we can simplify it a bit.
Reviewed By: dzhulgakov
Differential Revision: D4877087
fbshipit-source-id: 37bbb07051cbaf3a6e3c54b0eead97f3e02337d5
Summary:
CopyCPUToGPU and CopyGPUToCPU need to handle gradients that arrive in sparse form. Added a unit test and fixed the gradient makers to create copies for both values and indices.
This becomes less important once the GPU sparse parameter update ops land, but it is nevertheless good to fix.
Reviewed By: dzhulgakov
Differential Revision: D4882327
fbshipit-source-id: aafd2df46b3e1bcb30b52b1edf40fad8271f1f88
Summary: Device reduce is more efficient for large buffer sizes. For smaller buffers, host reduce may be more efficient in some cases and frees up the GPU for other work.
Reviewed By: andrewwdye
Differential Revision: D4885855
fbshipit-source-id: 7dc522e8c93e1a94427730aca6af03b7e93e660d
Summary: Perform gather on the whole record. This will be used for negative random sampling.
Reviewed By: kennyhorror
Differential Revision: D4882430
fbshipit-source-id: 19e20f7307064755dc4140afb5ba47a699260289
Summary:
These GPU paths are probably even buggier than the CPU paths for sparse gradients with duplicate indices. Both paths cause multiple momentum updates in a single iteration, but only the GPU path is non-deterministic. Depending on how we decide to address the issues on the CPU path, pooyadavoodi has a good idea for how to match dense behavior with the sparse GPU ops.
Closes https://github.com/caffe2/caffe2/pull/254
Reviewed By: bwasti
Differential Revision: D4871680
Pulled By: dzhulgakov
fbshipit-source-id: 220be57a0f699a22ea85ed4f7022d92d362d06b3
Summary:
Instantiate nccl type templates for gloo (minus half).
half requires at a minimum ifdefing CUDA_HAS_HALF and likely requires
more work given that operators aren't defined on it, so skipping it
for now.
Reviewed By: pietern
Differential Revision: D4876217
fbshipit-source-id: 833d2aec12789cbaf9e0a201b979a420fbe6732f
Summary: Added a field caller_ to caffe2::EnforceNotMet and modified the operator Run() exception handler to add the input/output name of the blob being accessed to the error message. Note that this cannot distinguish the case when a blob occurs in both input and output, but I believe this is still helpful.
Reviewed By: salexspb
Differential Revision: D4863982
fbshipit-source-id: f6a872fb07f8957dc2d3366d9f106fa81bffbd72
Summary: making the name a bit clearer
Reviewed By: xianjiec
Differential Revision: D4866940
fbshipit-source-id: 3e0f7067a9d3ba89cb038d85c1991e541f1e439c
Summary: Added a pipelined version of cuda halving/doubling algorithm. Half the buffer is reduced prior to first send and the other half prior to reducing the result from first receive. Broadcasts are started asynchronously as soon as each new message is received. New code was added as a new algorithm, as pipelining makes performance worse for small buffer sizes.
Reviewed By: pietern
Differential Revision: D4847109
fbshipit-source-id: 5aa55de95f8c94069380af7396f2b5b6297dcbea
Summary:
A few fixes in this commit: the epoch size is now rounded
down to the closest integer multiple of the global batch size (batch
per GPU * GPUs per hosts * hosts per run). The num_shards and shard_id
parameters are now passed to CreateDB so multiple processes actually
train on different subsets of data. The LR step size is scaled by the
number of hosts in the run. The test accuracy is only determined after
each epoch instead of after every so many iterations.
Differential Revision: D4871505
fbshipit-source-id: d2703dc7cf1e0f76710d9d7c09cd362a42fe0598
Summary:
Length-aware gather operator. This will be used for random negative sampling. See the task for details.
This should be equivalent to:
LengthsToRange + Gather + Reshape + GatherRanges
That's pretty complicated.
Differential Revision: D4846023
fbshipit-source-id: 8d9b7ff3eddc75a7ab147cd1c2a12f377652df93
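My guess at the semantics in numpy form, based only on the description above (the function name and argument layout are assumptions, not the operator's contract):
```
import numpy as np

def lengths_gather(items, lengths, indices):
    """Gather whole segments of `items`; segment i spans lengths[i] elements."""
    offsets = np.concatenate(([0], np.cumsum(lengths)))
    picked = [items[offsets[i]:offsets[i] + lengths[i]] for i in indices]
    return np.concatenate(picked), np.asarray([lengths[i] for i in indices])

items = np.array([1, 2, 3, 4, 5, 6])
lengths = np.array([2, 3, 1])                   # segments: [1,2], [3,4,5], [6]
print(lengths_gather(items, lengths, [2, 0]))   # ([6, 1, 2], [1, 2])
```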
Summary:
prof_dag in step net is not supported
(Note: this ignores all push blocking failures!)
Differential Revision: D4876551
fbshipit-source-id: 4003e60908e51ef052f8656bf527b326676c298c
Summary: To help dgponinath, and people in general: check that params don't have duplicate entries.
Differential Revision: D4872132
fbshipit-source-id: 1cca1237fda771eb270227f452ecae0f912d7a33
Summary: Add Algebra and train helpers and proxy them to CNNMH
Reviewed By: salexspb
Differential Revision: D4855040
fbshipit-source-id: d948ea913f674a6e47c4b72629a2d33253cb3130
Summary:
Fix an issue that amyzhang encountered. She was using ConstantFill to create a blob of the same size as another blob. This interrupted the gradient computation flow through the ConstantFill, since the gradient for the input blob was set to None (although it already had another gradient set). The correct solution is to avoid overwriting gradient assignments with None if they already have a gradient, UNLESS that blob is an output of the same op, as with the StopGradient op. (Note that Amy's problem was fixed by instead using a fixed-shape ConstantFill and Add with broadcast=1, which is a better solution anyway.)
Not sure if I explained this well, but see the new unit tests. Before this change, the testAddAndDynamicConstant failed but the testAddAndStaticConstant succeeded.
Reviewed By: dzhulgakov
Differential Revision: D4861176
fbshipit-source-id: 3b53621bfaba2e36786a5e4664145038995f6616
Summary:
To evaluate on checkpoints, we often need to load from multiple checkpoints.
However, it is inconvenient if we always need to check the existence of
a checkpoint manually. Adds interfaces to check the existence of a DB
so that we can find available checkpoints automatically.
Reviewed By: azzolini
Differential Revision: D4823876
fbshipit-source-id: e5a65b736ac2addd0447c4add81dbd0986f422e7
Summary:
This diff adds an option to recurrent_net to define some cell blobs to be recomputed on the backward step, so that they don't need to be stored in the step workspace. This is done by modifying the backward step to automatically include all operators that are needed to produce the output that is to be recomputed, and by storing those blobs in a shared workspace. To enable the shared workspace, I had to modify the stepworkspaces blob to also store a forward shared workspace. Making it a class field won't work since the lifecycle of the blob does not match the lifecycle of the operator.
For basic LSTM, the performance hit is quite modest (about 15% with one setting, but your mileage may vary). For attention models, I am sure this is beneficial, as computing the attention blobs is not expensive.
For basic LSTM, the memory saving is wonderful: each forward workspace only has 4 bytes (for timestep).
I also modified the neural_mt LSTM Cells, but there is no test available, so I am not 100% sure I did it correctly. Please have a look.
Added options to LSTM, MILSTM and LSTMAttention to enable memory mode.
Reviewed By: urikz
Differential Revision: D4853890
fbshipit-source-id: d8d0e0e75a5330d174fbfa39b96d8e4e8c446baa
Summary:
The basic idea of bucket-based calibration:
1. given a model and a calibration data set
2. apply the model to the calibration data set and sort the prediction scores
3. bucketize the prediction scores
4. for the samples in each bucket, compute the proportion of positive samples
5. build a set of piecewise linear functions that map from the bucket range to the proportion
6. append a piecewise linear transform operator to the prediction net, which is supposed to calibrate the raw predictions
7. to support calibration in realtime training, we create a new type of Net -- bucket calibration net. This needs a new Context to add_calibration_ops(), to export and load the new Net.
This includes a series of diffs.
This diff implements a layer that adds different operators for train/cali/eval for bucket based calibration.
Reviewed By: dragonxlwang
Differential Revision: D4817119
fbshipit-source-id: 44f8fcad2a94f40f7439cc1ad47e7bae5e17397d
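A tiny numpy illustration of steps 2-4 above (sort the scores, bucketize them, and compute the positive proportion per bucket); it sketches the idea only and is not the layer's code:
```
import numpy as np

def bucket_positive_rates(scores, labels, num_buckets=4):
    """Split sorted predictions into equal-sized buckets and return
    (bucket upper bound, fraction of positives) per bucket."""
    order = np.argsort(scores)
    score_chunks = np.array_split(scores[order], num_buckets)
    label_chunks = np.array_split(labels[order], num_buckets)
    return [(s.max(), lab.mean()) for s, lab in zip(score_chunks, label_chunks)]

scores = np.random.rand(1000)
labels = (np.random.rand(1000) < scores).astype(float)  # synthetic, roughly calibrated data
for upper, pos_rate in bucket_positive_rates(scores, labels):
    print("bucket <= %.2f -> positive rate %.2f" % (upper, pos_rate))
```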
Summary:
Adds support for fp16 to main cuDNN ops (conv, relu, pool, BatchNorm).
Done via runtime dispatch, not using DispatchHelper at this point, to allow for more complex dispatch logic in the future if necessary. Using a separate template parameter for each input / output type is for the same reason: it's easier to add the functionality now and never use it than to need to add it later.
Closes https://github.com/caffe2/caffe2/pull/241
Differential Revision: D4831264
Pulled By: asaadaldien
fbshipit-source-id: ad2ffdb13c031d8eb20552ffbf83c05c278252f7
Summary:
add necessary ops for feature processing
* logit op
* replace nan
* batch one hot op
Reviewed By: kittipatv
Differential Revision: D4840869
fbshipit-source-id: 197123ea5608d54f0b5ac7899973a077a6a86775
Summary: Having the utils directory broke the open source build :(. Removing its contents, as this utility is not really needed.
Differential Revision: D4866228
fbshipit-source-id: 1eae4580ebac5b60e52e2e8553e0ffd919152228
Summary:
Added SumSqrElements, since with it we can avoid the large temporary blob that is needed when doing Sqr + SumElements.
Also moved it to reduction_ops, because utility_ops has grown too big.
Reviewed By: jamesr66a
Differential Revision: D4844172
fbshipit-source-id: 032eec45e24d6724f0d5fb83f4ec1c771d1146e5
Summary:
The code already asserted, but only on the reply type, so it didn't
include the actual error message. This makes debugging much
easier when people have problems running the benchmark suite.
Differential Revision: D4860022
fbshipit-source-id: 659bc461a724603375bff18eac90eca658492b05
Summary: This is cheaper than doing getaddrinfo for every pair.
Reviewed By: andrewwdye
Differential Revision: D4850102
fbshipit-source-id: e77f468f099f63860b52fdd0dcc57a8a7a91a448
Summary:
Part of this change is to perform a getaddrinfo in the TCP device
class so we can figure out the interface and subsequently PCI bus ID
of the NIC used for its traffic. This information can be used in a
later diff to avoid doing getaddrinfo calls in the TCP pairs and have
them reuse the information that is resolved by the device.
The PCI bus ID can be used to compute distance between NICs and GPUs
and make informed decisions on where to allocate scratch buffers.
Reviewed By: andrewwdye
Differential Revision: D4850035
fbshipit-source-id: 575e401a9273300bc720c814fef8971846ec748c
* Add IndexLinear
* Fixes to IndexLinear
- Fix IndexLinear test
- make it better for multithreaded case
- fix a glitch in the C code
- improve the reset() method
- fix the weight allocation.
- remove "fakeBatch" possibility as it's not used
- clamp normalized values at evaluation time instead of just dividing by max.
- add assert on the keys/values dimensions in IndexLinear.
- invert order of weightDecay in the case of output dim > 1.
* Changes required to support IndexLinear in CUDA
* Adding support for flattened inputs for IndexLinear
* Doc for IndexLinear + fix for when the input format changes from one batch to another.
* Cleaning up IndexLinear documentation
* Changes required to build with latest torch
* Adding benchmark script for IndexLinear
* Bugfixes and cleanup of IndexLinear.lua
- Fixed bug that occurs when performing multiple accGradParams +
updateParams
- All the data required for the updates is put in a single table
- Added :parameters method
Summary:
The PiecewiseLinearTransformOp passes the transform parameters (bounds, slopes, intercepts) via operator arg. This diff supports to pass these parameters through input blobs.
The purpose is to allow us to create a model calibration net that can be exported when saving model.
Reviewed By: dragonxlwang
Differential Revision: D4777086
fbshipit-source-id: 0d157154860f61ec6ecfab95aea80beed54aa5c6
Summary: This is like LengthsToSegmentIds + Gather w/o the intermediate segment IDs blob. I only realized that after I wrote the whole thing. That combination is not obvious, so just check this in?
Reviewed By: xianjiec
Differential Revision: D4847591
fbshipit-source-id: a1c480f16b317763866af13c83b3aaaeb6a60751
Summary: As said in the title. This should save a lot of memory if using both train and test workflows.
Reviewed By: jhcross
Differential Revision: D4855436
fbshipit-source-id: 9eeca548eee118e07bd587c46f40e7beb138318e
Summary: Instead of reporting the total number of elements of a tensor, report the number of bytes. Report the capacity of the tensor, though, not the current number of bytes.
Reviewed By: jamesr66a, salexspb
Differential Revision: D4851633
fbshipit-source-id: 464d552f41f1b5f25753b0e7001d299b6dac1966
* added dataset downloader from s3 func; leveldb creator func; refactored to use both of these
* working version for squeezenet only
* using fb.me link for mnist dataset
* ubuntu installation instructions for v0.6.0
* removing non-functional tutorials
* updated model download info
* model download updates
* new tutorial
* bump version to v0.6.1
* tutorial helper functions
Summary:
1. CPU/GPU implementation of SumReduceLikeOp.
[SRLOp](matrix A, matrix B) -> C
where C has the same shape as B, and its values are the reduce-sums of the corresponding elements of A.
2. Make SumReduceLikeOp (part of) the gradient of Add/Mul/Sub and provide unittests
===Update for Translation Team===
3. Passed Tests:
$ buck test caffe2/caffe2/python/operator_test:recurrent_network_test
$ buck test fblearner/flow/tests/langtech/translation/neural_mt:seq2seq_model_caffe2
$ buck test fblearner/flow/tests/langtech/translation/neural_mt:seq2seq_ensemble_beam_model_caffe2
Reviewed By: Yangqing
Differential Revision: D4711302
fbshipit-source-id: 0865abde871b3046b367599731593dae03f0775a
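For one simple broadcast shape, the reduce-sum semantics in numpy (illustration only):
```
import numpy as np

A = np.random.randn(5, 3)   # e.g. the gradient of a broadcast Add output
B = np.random.randn(3)      # the operand the gradient must be reduced back to

# C has B's shape; each entry is the sum of the A entries that broadcast against it.
C = A.sum(axis=0)
assert C.shape == B.shape
```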
Summary: Put the size of the input tensor vector into the output blob
Reviewed By: xianjiec
Differential Revision: D4849556
fbshipit-source-id: 0929319e1705b027874d41a90a9159b335d93545
Summary: The check for the old model style seems wrong. It fails with a model I tried to run.
Differential Revision: D4847970
fbshipit-source-id: f28c5bb635c5e8b4dcfcc5c52a434d91a89217e8
Summary:
This fixes some bugs in the downloader. TODO: fix the URL
Closes https://github.com/caffe2/caffe2/pull/255
Reviewed By: Yangqing
Differential Revision: D4851555
Pulled By: bwasti
fbshipit-source-id: 56d01617ccaddcd40b0fb8e4be137cb4c7a52e91
Summary:
Added a DP + recursion algorithm for finding blob assignments based on blob sizes. This algorithm gives optimal assignments. See comments for details.
The algorithm is not used by default; set algo=memonger.AssignmentAlgorithm.DYNAMIC_PROGRAMMING and provide blob_sizes in optimize_interference() to use it. The blob sizes can be retrieved by running the net once and then calling blob_sizes = memonger.collect_blob_sizes(net). All blob sizes are assumed to be 1 if blob_sizes is not provided; in that case, using algo=memonger.AssignmentAlgorithm.GREEDY may be better.
Testing on the segmentation model, the memory usage is reduced by 19% (14.96MB to 12.08MB) compared to using the greedy algorithm (without considering the conv share buffer). The algorithm runs in 15s for the model with 55 sharable blobs.
Reviewed By: ajtulloch
Differential Revision: D4818476
fbshipit-source-id: 606936f4cf2715408d60b9a5cf3bcaf1985a0fec
Summary:
Added the Caffe2 cmd line option --caffe2_print_blob_sizes_at_exit=1, which, when enabled, will print all tensor sizes in the workspace destructor. Handy especially when using sub-workspaces like with RNNs. Note that the sizes are numbers of elements, not bytes. Output is designed to be easily excel-copypasteable.
TODO: add sorting
Reviewed By: jamesr66a
Differential Revision: D4844628
fbshipit-source-id: 11608a1710ae5c89bbd741edb506d25496606185
Summary:
This is not a super-elegant but a working solution to fix the Newsfeed team's problem of extracting a predictor net from a net that has a "side chain" they want to cut from the middle.
This adds an argument to ExtractPredictorModel that allows one to define "disabled inputs". These are inputs that we want to switch off, so that all operators depending on such an input are removed from the model.
Differential Revision: D4839953
fbshipit-source-id: 5d16df6f0ec4aac6670e6917efc77abde5d75c95
Summary:
Forgot to include these in a previous commit.
Closes https://github.com/facebookincubator/gloo/pull/23
Differential Revision: D4847072
Pulled By: pietern
fbshipit-source-id: 08aa9e8fa47377eb8c7747bd577eec7e615789f1
Summary: Add CAFFE_ENFORCE to make sure the protobuf parsing is successful.
Reviewed By: salexspb
Differential Revision: D4843662
fbshipit-source-id: 20cab7180e6b0e5afb5e29ff3333591659e41f7a
Summary:
With this we can compute the best GPU device to reduce on. It is not
always the one CUDA indicates as GPU 0.
Reviewed By: andrewwdye
Differential Revision: D4845581
fbshipit-source-id: 13e0500f54fd507899646f781a97c09abcd3b056
Summary: When only_loss=True is enabled, the softmax output buffer is shared with the gradient buffer (which is of same size). Added tests for this. Only for GPU version for now.
Reviewed By: salexspb
Differential Revision: D4843991
fbshipit-source-id: 834d2a1b357d784e4d64efe484f893442201ad6a
Summary: Used blob sizes for finding assignments in a greedy way.
Reviewed By: ajtulloch
Differential Revision: D4818159
fbshipit-source-id: 89180a6117ba5be058e1d2f9488b06d618e91917
Summary:
Added an ordering function (topological_sort_traversal_longest_path()) to reduce the live spans of computed blobs. The idea is to sort the ops based on the length of the execution path, so that ops on longer paths are executed first.
Tested on the segmentation model with on-the-fly decoder and reduced memory usage from 21.7MB to 14MB (original size is 33MB with compressed parameters and without considering the conv buffer), compared to using topological_sort_traversal() as the ordering function.
It is a general ordering function so I put it in memonger.py directly.
Reviewed By: ajtulloch
Differential Revision: D4790135
fbshipit-source-id: e661b45c1640de44ce1a9fdd009a4fba38f8e042
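A rough Python sketch of the longest-path idea (not the memonger implementation): order ops by the length of the chain still hanging off them, which is also a valid topological order because a producer's longest path is strictly longer than any of its consumers'.
```
from collections import defaultdict

def longest_path_order(num_ops, edges):
    """Order a DAG of ops so that ops on longer downstream paths run first.
    `edges` is a list of (producer, consumer) pairs."""
    children = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)

    depth = {}
    def longest_from(op):
        if op not in depth:
            depth[op] = 1 + max((longest_from(c) for c in children[op]), default=0)
        return depth[op]

    return sorted(range(num_ops), key=longest_from, reverse=True)

# A fork: op 0 feeds a long chain (1 -> 2 -> 3) and a short branch (4).
print(longest_path_order(5, [(0, 1), (1, 2), (2, 3), (0, 4)]))
# [0, 1, 2, 3, 4]: the long branch is scheduled before the short one.
```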
Summary:
This makes it easier to capture, compare, contrast results with
different parameters.
Reviewed By: andrewwdye
Differential Revision: D4843715
fbshipit-source-id: ba6916dcd5f8bcc615d6edce1a54657241357c31
Summary:
Instead of having every CudaDevicePointer "own" a stream, this change
moves to using CudaStream as first class object. It was pretty clunky
to use the copy{To,From}* functions on the CUDA pointer classes to
copy stuff around. For example it was not clear whether the stream
belonging to the source or destination was used to execute the copy
on. There is no longer such ambiguity after this change.
To make this work the CudaBroadcastOneToAll algorithm was changed to
include the workspace template argument, but only has the
CudaHostWorkspace implementation. The CudaDeviceWorkspace
implementation is left to be done for another change (that's not the
purpose of this change).
Reviewed By: andrewwdye
Differential Revision: D4841615
fbshipit-source-id: d0c1b9ba948ff6167832515afa7bdd2b32b48064
Summary: This is moving predictor exporter's code to open-source.
Differential Revision: D4815409
fbshipit-source-id: ce1508a2b6b973c91b0420928d2b4c3953f26e6c
Summary: Make timeout a device attribute. Now the pair will configure timeout when connecting based on device timeout settings, instead of needing to be set explicitly on each pair. Set default tcp timeout to 30 sec.
Reviewed By: pietern
Differential Revision: D4838918
fbshipit-source-id: e6e6ee36c662eb5e7ba5354c904e50f9dcac258f
Summary: cuda_allreduce_halving_doubling was not properly handling the case where buffers are allocated in GPU memory, trying to reduce and copy from them as if they were in system memory.
Reviewed By: pietern
Differential Revision: D4840259
fbshipit-source-id: 2615360cd2f1d9c7a37fb0bcdf33ff35528b2c75
Summary:
Removes the need for all the Copy calls; in one of our apps this reduced time from ~40ms to < 200us.
Closes https://github.com/caffe2/caffe2/pull/250
Differential Revision: D4828825
Pulled By: pietern
fbshipit-source-id: 656bd0edc4ffbaa3f89ccbe045e28a7aae49ceab
Summary: Softmax was not in the model helper, so added it there so we can set the CUDNN engine, as it is the preferred version.
Reviewed By: asaadaldien
Differential Revision: D4835624
fbshipit-source-id: 7f0c84b7a73653119901795782709a6a617345c5
Summary:
Quite large diff to make cuDNN LSTM and our LSTM produce same results and provide python API for the cuDNN LSTM.
* Added operators RecurrentParamGet and RecurrentParamSet to access weights and biases for the different gates, input/recurrent.
* Removed RecurrentInit as not needed
* recurrent.cudnn_LSTM() returns a special net and mapping that can be used to retrieve the parameters from the LSTM
* recurrent.cudnn_LSTM() can be passed blobs that have the parameters for the individual gate weights and biases
* recurrent.InitFromLSTMParams() can be used to initialize our own LSTM from CUDNN params. This way we can test if cuDNN and our own implementation produce the same result.
recurrent_test.py tests for the equivalency
Reviewed By: salexspb
Differential Revision: D4654988
fbshipit-source-id: 6c1547d873cadcf33e03b0e0110248f0a7ab8cb0
Summary: Added the support of axis for cudnn version of softmax + added cudnn tests to the softmax_ops_test
Reviewed By: urikz
Differential Revision: D4835409
fbshipit-source-id: 9150b969237e38daebff961fee3c36759f834ac4
Summary: NanCheck is an in-place operator for GPU that checks the input for any NaN or inf values. The operator fails and prints diagnostic information (input tensor dims and values) if it detects these erroneous values. This should help us to narrow down our numerical instability issues in the NMT models, and it might help others as well.
Differential Revision: D4818141
fbshipit-source-id: e5aa9762089c58ce160270446007c7a91a7a85e5
Summary:
Clarify that Redis Cluster is not supported. Also see #21.
Closes https://github.com/facebookincubator/gloo/pull/22
Differential Revision: D4837375
Pulled By: pietern
fbshipit-source-id: 6e3575b3b8dae6ca62beb765da15d8506da4abdb
Summary: Basic port of the CPU halving/doubling algorithm. No pipelining is done between reduce/broadcast and communication.
Reviewed By: pietern
Differential Revision: D4823693
fbshipit-source-id: b18045d64edf90361bf7713f4ccb2e074757780f
Summary:
Following jamesr66a's brilliant observation, this diff fixes the non-CUDNN versions of Softmax. The op did not take into account that blocks can run in parallel, and thus could overwrite each other's values, particularly the "row max" that is important for numerical stability.
So in this diff:
1) SoftmaxOp now shares all the code with SoftmaxWithLoss, which had a better implementation.
+ Strengthened the test case and renamed the file.
Reviewed By: jamesr66a
Differential Revision: D4832929
fbshipit-source-id: 4a1bfa2106ceb65ec75f5b868323ee1e7a3457fb
Summary:
This diff enables support of recurrent networks for memonger:
1. Memonger descends into the step-nets and renames the blobs accordingly
2. Memonger tells the gradient op about the renamed blobs by adding a parameter "paramname.renamed=<new name>"
3. RecurrentNetworkGradientOp applies remapping to links and gradient blobs.
I first thought of refactoring the whole gradient blob management of the recurrent network, but that looks to be very hard without a major revise of the code.
Note, I did not enable memonger for neural_mt, since I think the team should do more testing before enabling this.
Reviewed By: salexspb
Differential Revision: D4812823
fbshipit-source-id: 1ffdf3cfb4fcd00eec5bb0ece3bf416aa6d3e26b
Summary:
Description.
We kinda have our hands tied here: we can't reference context_gpu since it needs to run under the _gpu TARGET to pick up the correct headers, and we can't change the interface of deserialize blob to return the size since not all blobs are tensors.
If this works then let's ship it.
Reviewed By: urikz
Differential Revision: D4826034
fbshipit-source-id: 631ba56386ccb91d9b19d780a3e012d0ceea2422
Summary:
Required for D4821763
Based on targets from https://fb.facebook.com/groups/fbcode/permalink/1304073246296178/ (I also excluded those targets which do not depend on folly:singleton).
Reviewed By: meyering
Differential Revision: D4832492
fbshipit-source-id: fcb4ce42e9e5359d4752769f77d7271e550201fe
Summary: The caffe2 implementation of bare Softmax() has a race condition that wipes out the numerical stability trick. Use the CUDNN implementation instead
Reviewed By: urikz
Differential Revision: D4831298
fbshipit-source-id: d11b1de700e3954629e7ed43225a2416c27b3252
Summary:
Two new features for RecurrentNetwork:
1. Ability to specify longer (for a few steps) initial state
2. Ability to link more than one step of external blob to internal one.
Some motivation for these changes is provided in the unit test
Reviewed By: salexspb
Differential Revision: D4816230
fbshipit-source-id: 5ae6fed53b3b08a6ce4547ff1d0cb773dab42af0
Summary: Refactor AllgatherRing algorithm to remove all memcpy in the communication rounds by using outPtrs as send/receive buffer + remote buffer offset.
Reviewed By: pietern
Differential Revision: D4793186
fbshipit-source-id: 645d0758d246fd0b493e3fe312a8441d86f6d169
Summary: To make the predictor open source, move the constants that are generated from Thrift to Protobuf.
Reviewed By: salexspb
Differential Revision: D4656884
fbshipit-source-id: d4dbb3416e8396185e0981fcd9a090fbb054a18a
Summary:
Actually adds stuff on duplicated indices. I didn't use UnorderedSegmentSum because it'd need more modifications for figuring out the first dimension, and I don't want to make that function more complex than it already is :)
We theoretically can have a version that does CopyItems and fails on duplicate indices as a fallback. But I haven't implemented it yet as it wouldn't be that useful for now.
Also fixes the hypothesis test - doing rand() inside the test body is not cool, as it makes hypothesis run forever.
Differential Revision: D4814574
fbshipit-source-id: 1851ec5f5df8fc4bf4844585076b8af23a06b0b2
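A generic numpy illustration of why duplicated indices need accumulation rather than plain assignment (not the diff's code):
```
import numpy as np

param = np.zeros(4)
indices = np.array([1, 1, 3])          # index 1 appears twice
updates = np.array([1.0, 2.0, 5.0])

wrong = param.copy()
wrong[indices] = updates               # the second write to slot 1 silently wins
print(wrong)                           # [0. 2. 0. 5.]

right = param.copy()
np.add.at(right, indices, updates)     # accumulate the duplicates instead
print(right)                           # [0. 3. 0. 5.]
```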
Summary:
Combines the top level common.h with algorithm.h. With algorithm.h in
the common package, CUDA algorithms only need a dependency on that
package. CudaBroadcastOneToAll still depended on broadcast.h so this
change also removes that dependency and has it subclass the Algorithm
class.
Reviewed By: andrewwdye
Differential Revision: D4826885
fbshipit-source-id: 930037e39f7a2c941868e53f0bbc54e3f2e0b184
Summary:
GPUDirect support for CudaAllreduceRingChunked by adding a workspace
template parameter and adding workspace specific init functions.
To support this change the CUDA LocalOp classes had to be changed a
bit to take an extra destination/source pointer. This allows reduction
of 1-N pointers into a target pointer, where the target may live on
device or live on host. If it lives on the host, the NCCL operation
that executes the reduction is followed by a D-to-H memory copy. If
there is only a single input pointer, no reduction needs to happen and
the class just executes the D-to-H memory copy. The net result is that
we can interchangeably use device or host pointers as the target for
reduction or the source for broadcast, and these LocalOps do what you would
expect them to do.
Reviewed By: andrewwdye
Differential Revision: D4825236
fbshipit-source-id: 048ec6cbc5a0500bafbe1b3f6abe1e2e5f3a2675
Summary: Fixes for handling errors and timeouts in blocking and polling sync paths. Add test coverage for errors and timeouts.
Reviewed By: pietern
Differential Revision: D4823498
fbshipit-source-id: 93721947a6404ca9cea6a4869f4156f8d270a981
Summary:
Any number of elements below this always fits in a single packet
and will yield ~identical results.
Differential Revision: D4825190
fbshipit-source-id: 71ac77456049e991da5059d5a029c5e9d2a67ed7
Summary: The PadImage op supports cropping along the H/W dimensions by using negative pads; but currently passing negative values for pad attributes throws an error in ConvPoolOpBase, which PadImage inherits from. Modify ConvPoolOpBase to accept negative pad values for non-conv, non-pool ops. Also add a python operator test for cropping
Reviewed By: ajtulloch
Differential Revision: D4817118
fbshipit-source-id: 5ea5203e8072cc34fe14938e534b157d0ad55f6b
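A numpy sketch of the intended semantics, assuming NCHW layout: a negative pad on the H/W dimensions acts as a crop.
```
import numpy as np

x = np.arange(2 * 1 * 5 * 5, dtype=np.float32).reshape(2, 1, 5, 5)  # NCHW

# A pad of -1 on every side of H and W behaves like a center crop.
pad = 1
cropped = x[:, :, pad:-pad, pad:-pad]
print(cropped.shape)  # (2, 1, 3, 3)
```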
Summary:
The existing CudaAllreduceRing with a CudaDeviceWorkspace
template parameter now has the same effect.
Reviewed By: andrewwdye
Differential Revision: D4823393
fbshipit-source-id: 88fe497a983b26a281a3a74fe3bdc02c0c87c523
Summary: Somehow, feed-non-ranking training data usually have this type of column. Add option to support it.
Reviewed By: xianjiec, kennyhorror
Differential Revision: D4773960
fbshipit-source-id: 5a7ef4618a070e04f3cd8ddfcbf2b7441c00d92d
Summary:
Implement a file store for multi-process transport failure testing. Add test cases to spawn multi-process tcp communication, and verify that all processes throw the expected IoException.
A future diff will add coverage for connectivity failures, sync modes, and ibverbs.
Reviewed By: pietern
Differential Revision: D4807794
fbshipit-source-id: 35212719d46e6d875eacb341fae25681f39053bc
Summary:
Allreduce using recursive halving and doubling algorithm. Algorithm is described in http://www.mcs.anl.gov/~thakur/papers/ijhpca-coll.pdf (see top diagram on page 12). Algorithm consists of 2 lg P stages, the first log P performing a reduce-scatter and the second log P the allgather. Message size is variable across steps. The early stages of the reduce-scatter and the late stages of allgather send the largest messages. The communication is structured such that the largest messages are sent between nearby ranks, which could be useful if elements are ranked in locality-aware fashion.
So far this supports only a power-of-two number of processing elements.
I have attempted to minimize the amount of synchronization/hand-shaking. Messages are received at different offsets of the output buffer for each communication step. Send offsets in the reduce-scatter steps become receive offsets in the allgather and vice versa. The reuse of buffers across reduce-scatter and allgather steps requires synchronization. Right now the algorithm is inefficient in terms of memory use, requiring 3x memory. This can be reduced, but would require additional synchronization.
Reviewed By: pietern
Differential Revision: D4795878
fbshipit-source-id: fcc6597ef6a99cd102fce2b8e4562d93088d39dc
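A toy Python sketch of the communication schedule for a power-of-two number of ranks (peer selection only; buffers, offsets, and synchronization are omitted, and this is not the actual gloo code):
```
import math

def halving_doubling_peers(rank, world_size):
    """Peers for the reduce-scatter steps; the allgather replays them in reverse."""
    steps = int(math.log2(world_size))
    return [rank ^ (1 << step) for step in range(steps)]

# With 8 ranks, rank 3 exchanges with ranks 2, 1, and then 7.
print(halving_doubling_peers(3, 8))   # [2, 1, 7]
```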
Summary:
Uses the cudnnTransformTensor function. It works by shuffling the strides according to the transpose axes. Significant speedup over the current GPU version.
+ moves the transpose test under utility_ops, because hypothesis_test is too big
Reviewed By: jamesr66a
Differential Revision: D4810993
fbshipit-source-id: 82577c4ced1389e70bd5992820ae4d8297a3817f
Summary:
This is just by analogy with GetSingleArgument, which already
has default_value support.
Reviewed By: Yangqing
Differential Revision: D4819789
fbshipit-source-id: cf271d9f345f14f3e373186365726c738c1c26f3
Summary:
Didn't provide enough value now that ReductionFunction and
CudaReductionFunction are no longer related.
Reviewed By: andrewwdye
Differential Revision: D4819295
fbshipit-source-id: e6479769af7f78d486bee7d9c31f049430cdc775
Summary:
To bring the GPUDirect and non-GPUDirect implementations of CUDA aware
algorithms closer together this change introduces CUDA workspaces.
There's an implementation for a host side workspace and a device side
workspace. The former is used for transports that don't support
GPUDirect and the latter for ones that do. CUDA algorithms will take
an extra template parameter for this workspace and this will determine
whether they can be used for GPUDirect or not.
The workspaces only define their respective pointer types right now
but may contain local operation construction functions at a later
point in time.
Reviewed By: andrewwdye
Differential Revision: D4802826
fbshipit-source-id: cb1d71a224ce0165afd07fb9092ad54d3e07c8cf
Summary:
This was found necessary on some CentOS. aaronmarkham
Closes https://github.com/caffe2/caffe2/pull/240
Differential Revision: D4819591
Pulled By: Yangqing
fbshipit-source-id: 40161cd484a2c8d43f26077919ad2762440dde13
Summary:
multiple places broken, blocking the push :(
- fix the weighted training for ads and feeds
- fix the publishing if no exporter model is selected
- fix the feeds retrieval evaluation
- added the default config for retrieval workflows. plan to use for flow test (in next diff)
- clean up not used code
- smaller hash size for faster canary test
Reviewed By: chocjy
Differential Revision: D4817829
fbshipit-source-id: e3d407314268b6487c22b1ee91f158532dda8807
Summary:
This diff does the followings:
1. Add optimization options to model options in the UI for all workflows.
2. Allow different parameters to use different optimizers (or same optimizer with different settings, eg, learning rate).
3. Remove the default values for the `sparseDedupAggregator` field in the thrift file as the default value for that should just be `None` instead of 'sum'.
4. `fb/dper/layer_models/mlp_sparse.py` is deprecated.
5. Add calibration to two tower workflows.
Reviewed By: kittipatv
Differential Revision: D4767004
fbshipit-source-id: de92ea63fb0ff33f8581b1693479b723a68cd2d1
Summary:
- Fixed loading params into ensemble model
- Small fix for beam decoder
Differential Revision: D4807595
fbshipit-source-id: 0187fda7eb469401f1acd8e6108de54ab67ae922
Summary:
The cublasSgemmStridedBatched is only supported by cuda 8+. Luckily we can
always fall back.
https://devblogs.nvidia.com/parallelforall/cublas-strided-batched-matrix-multiply/
aaronmarkham found this in the centos build on the oss side.
Differential Revision: D4808822
fbshipit-source-id: 1657c139b57158e633074e06787c48302e0df142
Summary:
This is an initial (read: unoptimized) implementation of GatherOp on GPU.
Closes https://github.com/caffe2/caffe2/pull/209
Differential Revision: D4809676
Pulled By: Yangqing
fbshipit-source-id: bc36fa02e9964370ca845e9cc13344e5f3dbf176
Summary: Minor fix to the C1 model translator.
Reviewed By: Yangqing
Differential Revision: D4807165
fbshipit-source-id: 0149e2655d2901b23a37e92f61d9dd678cf6ee69
Summary:
This makes ConvTransposeMobileOp inline with other implementations,
allows us to account for these buffers in the workspace, and is generally a good
thing to do.
Differential Revision: D4767431
fbshipit-source-id: b14a96a089136e305ab42680772272f4e5f16f53
Summary:
The initialization phase of each checkpoint object simply loads the names of
the blobs in the checkpoints. When we load from the checkpoints, the names of
the blobs are given. We can skip this init step.
Reviewed By: azzolini
Differential Revision: D4808114
fbshipit-source-id: 4c740049c1014f3e93b4b87f43e3937afdefa25a
Summary:
The weighted LabelCrossEntropyGradientKernel had a clowny loop over D. Since the operation is completely linear, we can just do it all in one parallel loop. Massive speedup: in my benchmark, from 4s to 20ms.
+ added weights to the lstm_benchmark
Reviewed By: jamesr66a
Differential Revision: D4800889
fbshipit-source-id: f9850bcc56ce34d5d7a613419cd172256633a894
Summary:
Add distributed training to dper2 and keep the dper1 working.
* Created a ModelDelegator to wrap ModelHelper and LayerModelHelper to mitigate the difference.
* To get the average length for sparse feature, I extracted some information in feature_processor. There should be some better way to do it after we have new compute_meta.
* metric right now only runs on the first trainer.
* The model is saved correctly for evaluation. But I'm still not sure how to handle the weights for adagrad.
Reviewed By: kennyhorror
Differential Revision: D4767745
fbshipit-source-id: 0559d264827a7fd9327071e8367d1e84a936bea9
Summary:
We did not parallelize over D, which can be very large, especially in RNN models. This speeds up significantly, with my quick test in lstm_benchmark and nvprof, the time of RowMaxKernel dropped from 1.2s total to 0.28s total.
+ addded softmaxwithloss to the lstm_benchmark
Reviewed By: jamesr66a
Differential Revision: D4800629
fbshipit-source-id: 3400ea1064b1eb2793bc403df2c1b68801d545e5
Summary:
The CUDA algorithms all had their own version of local reduction and
broadcast. This commit consolidates them and allows all CUDA
algorithms to work with CudaDevicePointer instances.
Reviewed By: andrewwdye
Differential Revision: D4797968
fbshipit-source-id: cccef39fce01905a2cd757ccbcffd29803411409
Summary:
(Also, exposed the macros that we use during build time via the macros.h header file)
Closes https://github.com/caffe2/caffe2/pull/233
Differential Revision: D4803311
Pulled By: Yangqing
fbshipit-source-id: 9f8ce57692f81f7a8994344846d3c90aa2c7070a
Summary: Verification was sometimes failing for allreduce halving-doubling. Pieter noticed that it is due to the verification step racing with the regular iterations.
Reviewed By: pietern
Differential Revision: D4804558
fbshipit-source-id: f645cb2e332e449a993a634c5bdb42c2dcb8613b
Summary: Instead of issuing batch-size many math::Add calls, added a new function that does a batch of additions. For CPU there is no difference, but for CUDA we do everything in one kernel. I don't think this has a huge performance impact, but at least it makes the CUDA profiling look better with fewer kernel launches.
Reviewed By: jamesr66a
Differential Revision: D4798411
fbshipit-source-id: 44ac65b2da5a615971219809b9298b4e122085cd
Summary: Added SparseMomentumSGDUpdate to NMT training pipeline. Also surfaced and fixed out-of-bounds error in operator stemming from the implicit assumption that gradient slice input would be 2D. Now it is compatible with any dimensions, with indices indexing into the first dimension of param. Added internal checks to ensure that indices are valid.
Differential Revision: D4799697
fbshipit-source-id: 91ea23a6e743cc5337b46fae2821e773067d911e
Summary:
This is a copy of CudaAllreduceRing that doesn't stage the locally
reduced buffer in host memory but uses the GPU side buffers directly.
Eventually I would like this to be absorbed back into
CudaAllreduceRing, but for now it's a good place to compare the two
implementations and abstract the parts that make sense, until they are
identical again.
Reviewed By: andrewwdye
Differential Revision: D4791629
fbshipit-source-id: 5ad065cb94adb968aeee2379327be313638f2161
Summary:
Somehow the stress-runs flag does not work as I expected.
Now the test finally passes.
Reviewed By: azzolini
Differential Revision: D4797559
fbshipit-source-id: 1e46844e9ae55c331c2e265a59dc550983274213
Summary:
Adding support for multilabel in multiclass workflow. `input_feature_schema` and `trainer_extra_schema` are now a function taking in the preprocessor option and output the schema. This allows dynamic schema definition based on the option.
Changing default value will be in the next diff.
Reviewed By: xianjiec
Differential Revision: D4750064
fbshipit-source-id: 896143f432e963bc1723c0153749efeb39a83bec
Summary:
The main idea is that on the backward pass we don't need to store all the backward outputs in memory. This diff addresses only the ones used internally in each private workspace, by creating a shared workspace that shares them all within the backward pass.
Another thing we can do - get rid of state_grad blobs, but this would be a different effort.
See comments for more detailed description.
Reviewed By: urikz
Differential Revision: D4784900
fbshipit-source-id: 2dd8fe1b1215217ce92c09d918582d76c3051630
Summary: This layer will be used to sample negative labels for sampled softmax.
Differential Revision: D4773444
fbshipit-source-id: 605a979c09d07531293dd9472da9d2fa7439c619
Summary:
All of these tests fail with some variant of `Cannot create operator of type 'X' on the device 'CUDA'` (see commit messages).
Closes https://github.com/caffe2/caffe2/pull/227
Differential Revision: D4797060
Pulled By: Yangqing
fbshipit-source-id: 5feaa8e949098bfc1254d4c7449a2744e552f925
Summary:
Blob fits the semantics of a noexcept movable object well, since its semantics are equivalent to a unique_ptr's.
This allows, for example, having a std::vector<Blob>.
Reviewed By: pietern
Differential Revision: D4760079
fbshipit-source-id: d652d89be91a90c70651936ff694e0449a2c406b
Summary: 1) allow nets other than the simple net for recurrent net steps;
Reviewed By: urikz
Differential Revision: D4789889
fbshipit-source-id: f30f0e7268a353134db0fe21fc5c456f21fce969
Summary: To prevent others from making the same mistake as I did, check that no op has an is_test=0 argument when ExtractPredictorNet is called.
Reviewed By: viswanathgs
Differential Revision: D4796425
fbshipit-source-id: 38c14df6bcc767ec2e6a6e35ee79596a5dab531c
Summary: Add a setTimeout() API to the Pair interface. Implement in the tcp transport for connect, read, and write, and across blocking, polling, and async configurations. Ibverbs implementation to come later.
Reviewed By: pietern
Differential Revision: D4787932
fbshipit-source-id: 6072dc0c0add1700f84a72b83e4388b29b044ec1
Summary:
@public
This has no functionality changes yet; it only cleans up the sequence_op file
so that the header is context-independent, and I will implement the GPU parts
separately.
Reviewed By: pietern
Differential Revision: D4777140
fbshipit-source-id: 9b4aea6c36f06a64a53e235a125cd3477d54a045
Summary:
This diff is adding eval nets to layer model helper. It should be useful for
the cases when train/eval nets need some extra input (usually some supervision)
for train/eval. For example various sampled layers, etc.
Differential Revision: D4769453
fbshipit-source-id: 7a8ec7024051eab73b8869ec21e20b5f10fd9acb
Summary:
We should resize the workspace-vector only when it increases. Otherwise we end up destroying and recreating workspaces constantly if sequence length varies.
Modified the lstm_benchmark test to randomize sequence length.
This provides big perf improvement to machine translation pipeline. Look at the recurrent network op runtimes.
WITH:
I0328 12:17:54.073976 492094 prof_dag_net.cc:156] 136.271 ms/iter ( 120.987 ms/iter) RecurrentNetwork
I0328 12:17:54.073982 492094 prof_dag_net.cc:156] 190.074 ms/iter ( 156.828 ms/iter) RecurrentNetworkGradient
WITHOUT:
I0328 12:25:17.658206 518884 prof_dag_net.cc:156] 375.369 ms/iter ( 249.268 ms/iter) RecurrentNetwork
I0328 12:25:17.658211 518884 prof_dag_net.cc:156] 278.892 ms/iter ( 227.29 ms/iter) RecurrentNetworkGradient
With the LSTM benchmark, we get about a 2x speedup.
Reviewed By: jamesr66a
Differential Revision: D4789354
fbshipit-source-id: ad72f61974e35b0474abcacdc466ae9c6b4eb0ff
Summary: PadImage has no kernel parameters, resulting in the pads_ parameters not being set (0). I added a test case too.
Differential Revision: D4785230
fbshipit-source-id: fd475e7c41208e07fa7a363def9a45c6f82cddfe
Summary: this is useful to test rnn cells
Reviewed By: dzhulgakov
Differential Revision: D4720641
fbshipit-source-id: baa7df43357ed8af72ede64be3e0a642a40472df
Summary:
Instead of doing gemms in a for-loop (which is not parallelized), it is much better to do the batched matmuls using CUDA 8's new batched, strided version of gemm.
With the MT team's test, we get a 5-10% improvement in overall walltime, so it is a significant improvement (an illustrative sketch follows the numbers below):
----
Without batched gemm:
I0328 10:46:48.118605 58068 prof_dag_net.cc:136] 424.757 ms/iter ( 283.878 ms/iter) RecurrentNetwork
I0328 10:46:48.118609 58068 prof_dag_net.cc:136] 352.603 ms/iter ( 265.85 ms/iter) RecurrentNetworkGradient
With batched gemm:
I0328 10:53:48.169996 85617 prof_dag_net.cc:136] 407.438 ms/iter ( 269.564 ms/iter) RecurrentNetwork
I0328 10:53:48.169999 85617 prof_dag_net.cc:136] 322.393 ms/iter ( 287.625 ms/iter) RecurrentNetworkGradient
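As a rough illustration of the batched-gemm idea above (a NumPy sketch with made-up shapes, not the Caffe2/cuBLAS code): a loop of per-timestep gemms against a shared weight is equivalent to a single batched matmul over the leading dimension.
```python
import numpy as np

# Hypothetical shapes: T timesteps, each multiplying an (M, K) slice by a shared (K, N) weight.
T, M, K, N = 4, 8, 16, 32
x = np.random.randn(T, M, K)
w = np.random.randn(K, N)

# For-loop of individual gemms (one kernel launch per step on GPU).
looped = np.stack([x[t] @ w for t in range(T)])

# Single batched matmul over the leading dimension (one strided-batched gemm on GPU).
batched = x @ w

assert np.allclose(looped, batched)
```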
Reviewed By: jamesr66a
Differential Revision: D4788272
fbshipit-source-id: 210e8b94c1e036b6ef0f039ce000d455258651f4
Summary:
The header already contained an analysis of required completion queue
depth but the queue pair was still initialized with a maximum queue
depth of kMaxBuffers. This change fixes that and updates the analysis
to talk separately about receive and send completion queues.
Reviewed By: andrewwdye
Differential Revision: D4785786
fbshipit-source-id: 4dc302d523a3b7162dc261d14cfcc755681febf8
Summary:
This is pretty tricky to explain, but we can just use
backward_links. This way the whole cell would use a blob from the
states_grad tensor instead of having its own blob. This also should
save on memory a bit
Differential Revision: D4770798
fbshipit-source-id: 673f85b2c2fdf42c47feeaa24d1e2bf086f012f9
Summary: Creates SparseMomentumSGDUpdate, a sparse version of MomentumSGDUpdate, to make that optimization method (via in-place updating operator) compatible with GradientSlices.
Differential Revision: D4784973
fbshipit-source-id: e6330f471a4d5f53589a6ac245e38f256ca7f354
Summary: These are system headers and so should be included via `<>`.
Reviewed By: yfeldblum
Differential Revision: D4783480
fbshipit-source-id: 979670b594859b45560cead34f615442dfcc9f8b
Summary:
`SamplingTrain` layer is a wrapper around another layer subclassing `SamplingTrainableMixin`. When instantiated in the training context, `SamplingTrain` produces the sparse output of the wrapped layer. The output can be paired with `indices` to create a Map schema. When instantiated in the prediction context, the full output of the wrapped layer is produced.
This is like the SampledFC function in the model helper, https://fburl.com/gi9g1awh, with the ability to be instantiated in both training and prediction contexts.
I'd like to get consensus on whether we should introduce the `SamplingTrain` layer and the accompanying mixin. This can probably be accomplished in some other way, but I think this is not too bad.
Reviewed By: xianjiec
Differential Revision: D4689887
fbshipit-source-id: 7be8a52d82f3a09a053378146262df1047ab26a8
Summary: We actually copy items inside, so no need to limit this to POD types.
Reviewed By: dzhulgakov
Differential Revision: D4768652
fbshipit-source-id: 98f71b78a7c1dd4a2a2e1bff096d6bf63a0c8f50
Summary:
Use data_parallel_model for seq2seq multi-gpu training. The main reason for complexity here is that GatherOp hasn't yet been implemented on GPU.
This diff also adds a better clipping procedure - clip by global norm rather than by absolute value (a reference sketch follows).
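For reference, clipping by global norm rescales every gradient by one shared factor computed from the norm over all gradients, instead of thresholding each value; a minimal NumPy sketch with made-up names:
```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    # One scale factor derived from the norm over all gradient blobs.
    global_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    scale = clip_norm / max(global_norm, clip_norm)  # <= 1, leaves small gradients untouched
    return [g * scale for g in grads], global_norm
```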
Differential Revision: D4778691
fbshipit-source-id: bff184dae02ecc227413fef51f48a4726e5d3825
Summary: This allows tensors to borrow external buffers and return them once the tensor data is reallocated or freed. This is similar to folly::IOBuf's takeOwnership and ZMQ's message constructor taking a deleter as an argument.
Reviewed By: dzhulgakov
Differential Revision: D4760188
fbshipit-source-id: 6989678ad66af2e58487173174d5327bd5ae0515
Summary:
Predefining the reduction functions makes it easy to provide a set of
fast implementations. Eigen is used to implement them if it is found.
Reviewed By: andrewwdye
Differential Revision: D4780868
fbshipit-source-id: e825cf2e5cfe8ec27d587c5aff4002534b1c670d
Summary: This makes it possible to write to any offset in a remote buffer.
Reviewed By: andrewwdye
Differential Revision: D4779776
fbshipit-source-id: f5a44cc705df5141bd720ff4e3fec8697f707a70
Summary:
To evaluate from checkpoints, we need to load a model from the checkpoints.
However, the checkpoints store way more blobs than the blobs needed by the
model. This function enables the model builder to load only the blobs
associated with the model to the workspace. After that, the model builder
can evaluate the model from the populated workspace.
Reviewed By: azzolini
Differential Revision: D4751414
fbshipit-source-id: a7a420228d681fc2dcfd8573cf69a97b1abc2ef3
Summary:
All operations supported by NCCL are now available through the Gloo
wrappers. Algorithm wrappers for them are forthcoming so that they
can be used interchangeably with other implementations.
Since not all of them require same-sized source and destination
pointers, I moved assertions on number of elements to the op
constructors.
Reviewed By: andrewwdye
Differential Revision: D4771292
fbshipit-source-id: 2f34629507b5e1cb9ae8d6d2f02de0a7f641a341
Summary: Currently, we cannot have layer constants because layer params are required to have a gradient and an optimizer. Global constants don't cut it here because they can only be added once; therefore, a layer that adds any global constant can only be used once.
Differential Revision: D4773212
fbshipit-source-id: 5b60d31f3c1602afb04b61f6d30b8e3e06ed2de3
Summary:
D4690225 added support for nested field name lookup in nested
`schema.Struct`s. It would throw a KeyError when trying to access a nested
`List`'s field. Writing the lookup recursively avoids the need to enumerate
all complex field types in the lookup.
Differential Revision: D4719755
fbshipit-source-id: 37c87a32d730f0f45f72fb20894da3e32f820999
Summary: Creating PackSegments and UnpackSegments GPU operators using GPUFallbackOp for now. The op does mainly copying of blobs and this is a reasonable solution until we have a CUDA op.
Reviewed By: pietern
Differential Revision: D4761589
fbshipit-source-id: dd483b9e34ecb6b53925405e5b4c24859c549606
Summary: Allow drilling down on data throughput overall and per field.
Reviewed By: dzhulgakov
Differential Revision: D4622168
fbshipit-source-id: 1462bb2fac05824fda0c02f4f5f0b8713893e650
Summary:
- Allow capturing averageable stats such as bytes and time per request
- Allow capturing time elapsed.
Reviewed By: pietern
Differential Revision: D4622101
fbshipit-source-id: f08e422ecdfda83b13a4ed8ab9c6d5c2a5d5718d
Summary:
Use AddNet and AddBlobs to add net and blobs to meta_net_def.
This is a codemod and does not change the functionality.
It is for preparation of the protobuf change.
Depends on: D4770648
Reviewed By: salexspb
Differential Revision: D4771110
fbshipit-source-id: 00cecb2105f2c332bd50c3c51b9a10e1004fa90f
Summary:
This was a nasty one to track down. This was the error message:
```
E0323 14:47:46.138900 2870 context_gpu.h:126] Encountered CUDA error: an illegal memory access was encountered
F0323 14:47:46.139143 2870 operator.h:176] Computation on device returned error in operator
input: "x_gpu_2" output: "loss" name: "" type: "AveragedLoss" device_option { device_type: 1 cuda_gpu_id: 1 }
```
Closes https://github.com/caffe2/caffe2/pull/220
Differential Revision: D4771086
Pulled By: Yangqing
fbshipit-source-id: f2d0f39f1647c84d97d9745f8a0305a389bfbc41
Summary:
Codemod to use a separate function, for the protobuf change later on.
It does not change the functionality.
Reviewed By: salexspb
Differential Revision: D4770648
fbshipit-source-id: d8090f45d31ffa5ca1dca47297fb7c196f34d8a6
Summary:
Changed the windows python extension name to ".pyd" and did a manual copy from the {Debug,Release} folder to the main folder for easier automatic build.
Closes https://github.com/caffe2/caffe2/pull/222
Differential Revision: D4771065
Pulled By: Yangqing
fbshipit-source-id: 4a89d409fa66f0979cf4ecf502189b2f9cc11504
Summary: Allgather ring CPU implementation. It does |buffers| x |contextSize| passes.
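Conceptually (a toy single-process simulation with one chunk per rank, not the Gloo code), a ring allgather forwards each rank's chunk around the ring until every rank holds every chunk:
```python
def ring_allgather(chunks):
    # chunks[i] is rank i's local data; returns the full list as seen by each rank.
    n = len(chunks)
    gathered = [[None] * n for _ in range(n)]
    for rank in range(n):
        gathered[rank][rank] = chunks[rank]
    # In each of the n-1 steps, every rank forwards the chunk it received in the
    # previous step to its right-hand neighbor.
    for step in range(n - 1):
        for rank in range(n):
            src = (rank - step) % n
            gathered[(rank + 1) % n][src] = gathered[rank][src]
    return gathered

print(ring_allgather(["a", "b", "c", "d"]))  # every rank ends up with ['a', 'b', 'c', 'd']
```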
Reviewed By: pietern
Differential Revision: D4723809
fbshipit-source-id: ffd8366ac7e1746555474e173143d33cee497822
Summary:
This also requires a change to cmake/External/nccl.cmake to use the
static NCCL binary instead of the shared object. When the Caffe2/Gloo
build uses the bundled NCCL version it should be packaged up in the
resulting libraries and not cause another runtime dependency on a
library that has to be installed separately.
Closes https://github.com/caffe2/caffe2/pull/218
Differential Revision: D4769926
Pulled By: pietern
fbshipit-source-id: 5c85559992c200d874f4218724823815ffb5adb5
Summary: We accumulate the values of this blob (param_grad) in another special internal blob anyway.
Differential Revision: D4768643
fbshipit-source-id: a9d08b7eafd25f278a8db722f9cdb1d0064b852a
Currently in-place and out-of-place updateGradOutput will produce different results for input=max_val or input=min_val: in-place won't backprop the gradient where input=max_val or input=min_val, while out-of-place will backprop the gradient in this case.
Summary: Apart from copying gradient blobs for inputs with initial_cell_input, we needed to perform a similar operation for external parameters used by the step net
Reviewed By: salexspb
Differential Revision: D4752259
fbshipit-source-id: 13ee48cf583ed86221a4cc1cc9f57f5c3a7d2450
Summary:
Currently the output schema and blobs are named "field_i", which is
bad for debugging. This diff allows us to specify output names.
Reviewed By: kennyhorror
Differential Revision: D4744949
fbshipit-source-id: 8ac4d3c75cacbb4c9b5f55793ac969fe1cf20467
Summary:
This makes it possible to embed Gloo in a project without CMake
installing Gloo headers and/or libraries, or having a runtime
dependency (and statically link to it).
Also:
* Install benchmark tools
* Statically link to NCCL if the bundled version is used
Closes https://github.com/facebookincubator/gloo/pull/19
Differential Revision: D4762432
Pulled By: pietern
fbshipit-source-id: cf38903e6c51f2480fba4ff18cbdc0c9080df0c4
Summary: This allows to gather stats on how much raw and compressed data is being transferred across queues and network.
Reviewed By: dzhulgakov
Differential Revision: D4622049
fbshipit-source-id: 27c0c0df9e5a705f91256b20a29c7f8f988085da
Summary:
Add a ConvNd interface for Nd convolution and keep Conv for 2d convolution.
I added _BaseConv to share code between ConvNd and Conv.
Reviewed By: Yangqing
Differential Revision: D4660822
fbshipit-source-id: 8339421351ce9a36ce5a165f7fa455cfcc61733d
Summary:
This completes the fix that viswanathgs started in an earlier diff but did not
cover the full Caffe convention. It should have proper guards for all the stuff
that Caffe implies, either supporting it or throwing an explicit exception.
Reviewed By: viswanathgs
Differential Revision: D4751751
fbshipit-source-id: 474e921c33840cff333a631b7b19f881b39ebccd
Summary:
This may be the case when the Gloo CMake files are sourced from a
parent project that has already imported CMake CUDA support. If these
checks are not performed then CUDA_NVCC_FLAGS might contain
conflicting options.
Verified this works while working on Gloo for Caffe2.
Closes https://github.com/facebookincubator/gloo/pull/18
Differential Revision: D4756179
Pulled By: pietern
fbshipit-source-id: 32fc39ec2322cce5899a2398ebbf8395d3917502
Summary:
These new ops allow you to initialize, start, and stop the CUDA
profiler. This makes it possible to profile CUDA code without running
the application through nvprof.
Reviewed By: jamesr66a
Differential Revision: D4747863
fbshipit-source-id: b439e8f28d1d62db19524fee0458523414cb79e3
Summary:
Some small MPI-related changes:
1) Instead of making an object copy of the MPI_Comm, call MPI_Comm_dup;
because the (passed-in) communicator is used later via the call to
connectFullMesh this guarantees that the communicator will not have been
freed by user before connectFullMesh is called.
2) Allreduce for maxLength is done on an unsigned long type; use the
corresponding MPI type.
Closes https://github.com/facebookincubator/gloo/pull/17
Differential Revision: D4754195
Pulled By: pietern
fbshipit-source-id: 863fd33c726f88120f8f5ee61964c3525babbf97
Summary:
This change solidifies IO error handling between threads and successive transport API calls. When an IO exception occurs, signal all buffers of the error, propagating the exception from the device thread or single user thread onto all user threads. Store the exception in the pair and check on future API calls or device events. Swallow all IO exceptions in the device loop.
Right now IO exceptions during portions of the listen/connect phase will result in an indefinite wait in the peer. I will address this with a configurable timeout (t16205269).
Reviewed By: pietern
Differential Revision: D4749248
fbshipit-source-id: c75ee3b20875d561bf84631e5384e28015dabad3
Summary: This didn't work for a reason specified in comments. Also some cleanup in the unit tests, now inference uses a custom workspace to run cell net on
Reviewed By: urikz
Differential Revision: D4742670
fbshipit-source-id: 04165c029fddec5ae31b20b207faf06d2fa20816
Summary: This popped up during the debugging with intel folks.
Reviewed By: salexspb
Differential Revision: D4745176
fbshipit-source-id: 88ce91e565b45253d60588ab35ed4b8e5b8d4947
Summary:
So you can just run `BUILD_CUDA=ON .travis/install.sh` on a 16.04 machine and have it install the right packages.
Closes https://github.com/caffe2/caffe2/pull/212
Differential Revision: D4748670
Pulled By: Yangqing
fbshipit-source-id: 2015613e4d5ca6bcd1c9320c6c4cba071463c120
Summary: Seems like a lot of confusion in the group lately has been about missing CUDA operators. Let's make it clearer in the error message.
Reviewed By: azzolini
Differential Revision: D4737037
fbshipit-source-id: 56c7819df909bf954510296703bff5f221fa8ae7
Summary:
aaronmarkham this solves your Windows build issue. Basically:
(1) VS 2017 does not have CUDA support yet, and we will be waiting on NVidia to do so.
(2) VS 2015 and 2017 need different cmake generator strings.
This PR shows how to determine those and also updates appveyor to do contbuild guard for the following 3 settings:
- VS2015 without cuda
- VS2017 without cuda
- VS2015 with cuda
Closes https://github.com/caffe2/caffe2/pull/210
Differential Revision: D4745007
Pulled By: Yangqing
fbshipit-source-id: 50952552843abd0eb6f4145d9f132daeee3a6794
Summary: Created `BatchDistillLRLoss` layer and added support for it in DPer2.
Differential Revision: D4718333
fbshipit-source-id: b873954ea704daafed94ac65fef47a20d56858e2
Summary:
Bubble up gloo configuration and network errors as exceptions. The caller may be able to recover. Other unexpected failures continue to be handled as fatal with GLOO_ENFORCE
Modify ibverb API validation to check for != 0 instead of -1 to conform with API definition.
Still need to convert some errors in the rendezvous code and add documentation.
Will pass device loop errors onto the calling thread in a future diff
Reviewed By: pietern
Differential Revision: D4730362
fbshipit-source-id: c801adb353013e7f541ab01ac16a0cc71c1c36b2
Summary: D4734505 part 2. Remove more instances of the batch_size parameter
Reviewed By: urikz
Differential Revision: D4736906
fbshipit-source-id: fc9d374e9308017d61c427890364c5ab9cec2edf
Summary: Reshape based on tensor shapes in the graph rather than based on a passed-in batch_size parameter
Reviewed By: urikz
Differential Revision: D4734505
fbshipit-source-id: d9c23d85be84f61124106e752ef2b4f6945e2a07
Summary: This is a somewhat simpler version of what Aapo did before, as that one had some weird crashes in some of the training pipelines.
Reviewed By: urikz
Differential Revision: D4734934
fbshipit-source-id: f9ecff2a0d68a8cbc0858658f38be34d616fa100
Summary: We don't use this one any more except in a few tests.
Reviewed By: urikz
Differential Revision: D4731401
fbshipit-source-id: c5c28b7594e3251f501fc28455dfc9bd2093a836
- Add additional timeouts to test_multiprocessing to reduce chances of
hanging indefinitely on failure
- Add missing header guards
- Fix typo
- Check that torch_shm_manager exists in torch/__init__.py
Summary: This has been subsumed by gloo.
Reviewed By: andrewwdye
Differential Revision: D4729216
fbshipit-source-id: aa4f0637ee70dd03e85a6a0e7ffda68e5e9505be
Summary:
This can happen when the tensors are changed/resized. The cached
algorithm instance won't be valid in that case. I think for now it's
best to fail hard and require the net to be reinitialized if this
happens. If instead we were to always reinitialize when this condition is
detected, then frequent resets could lead to poor performance and go
undetected.
I spoke about the generality of this problem with YQ. The pattern used
here of updating a representation of the op's parameters is far from
ideal. Instead, it would be much better to have the core framework use
some kind of versioning on tensors/blobs (can be as simple as a single
integer) to make it much easier to detect a change in inputs/outputs.
If there are more places that would benefit from such a facility, we
should consider adding it. As right now Gloo is the only place where
we need it, it doesn't make sense to immediately add it to core.
Reviewed By: Yangqing
Differential Revision: D4728121
fbshipit-source-id: 69a8a620aecc961a3f7a27e8c53e22945d9a258e
Summary: Adding synchronous optimization on GPUs to the translation training pipeline, via data_parallel_model.Parallelize_GPU, which needs to be updated so there is some way of performing sparse parameter updates (e.g., on embedding tables), whether on GPU or CPU.
Reviewed By: urikz
Differential Revision: D4631914
fbshipit-source-id: 9cdd655f7dbda3f9b2733d459228b3e097892441
Summary: This adds a nearest neighbor interpolation resizing operator to caffe2. CPU only, NCHW only, no gradients. Also adds torch2caffe support. This is probably not optimal in terms of performance, but it works.
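As a rough reference for what nearest-neighbor resizing does (a naive NumPy sketch for NCHW input; the function name and the exact rounding convention are assumptions, not the operator's):
```python
import numpy as np

def nn_resize_nchw(x, out_h, out_w):
    # x: (N, C, H, W); pick the nearest source pixel for every output location.
    n, c, h, w = x.shape
    rows = np.minimum((np.arange(out_h) * h / out_h).astype(int), h - 1)
    cols = np.minimum((np.arange(out_w) * w / out_w).astype(int), w - 1)
    return x[:, :, rows[:, None], cols[None, :]]  # shape (N, C, out_h, out_w)
```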
Reviewed By: ajtulloch
Differential Revision: D4724244
fbshipit-source-id: b8295061141fb513da84acf91fdfd67264119059
Summary:
1. migrate the basic mtml model to dper 2
2. test dper 2 mtml model
3. test all optimizers
Reviewed By: kittipatv
Differential Revision: D4680215
fbshipit-source-id: 7aac5c59bdac22fcad8ed869b98e9e62dca1d337
Summary: Layer that takes a (label, prediction) pair and outputs the L2 loss.
Reviewed By: kittipatv
Differential Revision: D4702111
fbshipit-source-id: 09f2ede44d1b548e61096de741f1b2aa0b66bbcb
Summary:
Setting up a caffe2 versioning number per popular request.
The plan is to periodically update the version, with the current plan being
every other week. As a result I am setting the initial number to minor version
5 (since this is the 11th week of the year).
Reviewed By: salexspb
Differential Revision: D4725945
fbshipit-source-id: 9ff4c7e4a6341e22a5f1d4e25740705988cae84b
Summary:
Currently if all samples in a batch miss labels, the task customized layers have no data.
In that case, the EnsureDense op does not compute the gradient correctly. To avoid that, we switch
back to letting Gather generate dense gradients.
Why does the EnsureDense op not compute the gradient correctly?
Because when EnsureDense computes gradients, it does not know the actual data batch size, so its output gradients may have the wrong batch size.
Reviewed By: xianjiec
Differential Revision: D4712463
fbshipit-source-id: 736f63273e7fbc4348f37fa3a5a696f855b7c3ad
Summary: Useful for restoring after a conditional block where we want to disable threading.
Reviewed By: jamorton
Differential Revision: D4638648
fbshipit-source-id: 29695284f7c427caa6b80a9bca0cbd1406543a44
Summary:
it was broken in trunk and I fixed it locally, then had a
wrong merge in D4672026. This is just a revert of those changes.
Reviewed By: ajtulloch
Differential Revision: D4723138
fbshipit-source-id: 14757d9c8ae5135bd7c084003a64e25efc74b54f
This ensures that we use the same library at the C++ level and with
Python ctypes. It moves the searching for the correct library from
run-time to compile-time.
Summary: Reshape based on tensor shapes in the graph rather than based on a passed-in batch_size parameter
Reviewed By: urikz
Differential Revision: D4702086
fbshipit-source-id: c4c1d8425cd36c1e86695918eaba2667c27e9601
Summary:
/cc akyrola
I basically just copied all the `ShapeCall` stuff as `TypeCall`. Is there a better way?
Closes https://github.com/caffe2/caffe2/pull/187
Differential Revision: D4699312
Pulled By: Yangqing
fbshipit-source-id: 92f736ffe4127b00b5821acb1eb359771975fdd7
- make each test in test_autograd have a unique name ignoring case
- assemble all tests when test_legacy_nn is imported
- import Python.h in PtrWrapper.h
Summary: For some embedding tasks, we don't want to include a bias term in the embedding computation.
Reviewed By: xianjiec
Differential Revision: D4689620
fbshipit-source-id: 4168584681d30c0eaa1d17ceaf68edda11924644
Summary: Initializing ncclComm_t is expensive. Allocate a set of ncclComm_t for each unique device set and cache for reuse. With this change the CudaAllreduceChunked tests runtime improved from ~170 sec -> ~10 sec on my machine. There is no improvement in the benchmark numbers because the algorithm instance is only allocated once.
Reviewed By: pietern
Differential Revision: D4708943
fbshipit-source-id: 85b85070586d6683a762b8282df593ca831e7bc7
Summary:
This change includes CMake changes to compile the MPI assets when the USE_MPI flag is enabled. If so, the benchmark tool can now be launched through mpirun.
Includes the changes done in #11.
Closes https://github.com/facebookincubator/gloo/pull/12
Reviewed By: Yangqing
Differential Revision: D4712060
Pulled By: pietern
fbshipit-source-id: 0d0e93882f5822583f59304d4256dbdf5dea7483
Summary:
Make it use Gloo and optionally use Redis for rendezvous (where a
shared filesystem is not available).
Differential Revision: D4709943
fbshipit-source-id: 59cc7a14316c7b634417ea5161a75fab3c19f2fa
Summary:
We have more and more nested Struct schemas. There is an increasing need to get/add a field by nested name, e.g., for the following nested Struct schema:
st = Struct(
    ('a', Scalar()),
    ('b', Struct(
        ('c', Scalar()),
    )),
)
We may want to get the field "b:c" and/or insert a new field "b:x". The immediate need is for dper2 metrics.
This diff is to achieve this.
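For illustration only (the exact accessor spelling is an assumption based on the "b:c" naming above), nested lookup by colon-separated name might be exercised like this:
```python
from caffe2.python import schema

st = schema.Struct(
    ('a', schema.Scalar()),
    ('b', schema.Struct(
        ('c', schema.Scalar()),
    )),
)
# Nested lookup by colon-separated field name, per the "b:c" convention above.
nested_field = st['b:c']
```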
Reviewed By: kittipatv
Differential Revision: D4690225
fbshipit-source-id: 71d4a74b36bd1228a2fefd901db2f200602152b7
Summary: For example, test and train nets could have shared workspaces, leading to a race condition. This adds an assertion and adds a running counter to the workspace-blob name.
Reviewed By: jhcross
Differential Revision: D4712152
fbshipit-source-id: 808d7069095bac24ebfe0c9d31ebd134f4cf0956
Summary:
This should fix mkl contbuild per the most recent bugfix from Intel.
Closes https://github.com/caffe2/caffe2/pull/189
Differential Revision: D4711448
Pulled By: Yangqing
fbshipit-source-id: 70d1b35fa4fe6cc9b4d36ec0fcfbd6d33f313182
Summary:
No longer need GPU to CPU copies. The allreduce operator no longer
uses 'local allreduce - global allreduce - local broadcast' sequence
when Gloo is used, but passes all input blobs directly.
Depends on D4708860.
Differential Revision: D4709897
fbshipit-source-id: 4d745d5d8bac9c2fcca081dd5d812c902808c3b6
Summary:
This is going to allow experimenting with various training-from-scratch / fine-tuning techniques. The code itself for the new model is not intended to be used as is. Instead, one could train a full precision model first, then add quantization for the last layer, then for the next one, and so on.
In my experiments I tried taking a pretrained model and then quantizing all inception layers with 4 bits. This restored the original accuracy after several dozen iterations.
Also in this diff I added a common prefix to the model checkpoint and added this prefix to git / hg ignore.
And also some extra logs which are useful for quickly seeing how things changed right after enabling quantization.
Differential Revision: D4672026
fbshipit-source-id: b022c8ccf11dd8a2af1a7b2e92673483bc741a11
Summary: D4704547 caused stuff to crash with various memory corruption errors. The problem appears to be in calling sharedWorkspaces->resize(), although I don't completely understand why. Something to do with moving the shared_ptrs around? Anyway, first clearing and then resizing (only needed when seqLen is bigger than what we have allocated) fixes the issue.
Reviewed By: jhcross, Yangqing
Differential Revision: D4711675
fbshipit-source-id: 35c70e8258555fcb6d403df35e0d391aebe96485
Summary: NCCLOp::runNCCL is mistakenly recording an event in the source pointer after the NCCL op. This results in NCCLOp::wait() returning without synchronizing with the output buffer. The synchronous tests using NCCL fail.
Reviewed By: pietern
Differential Revision: D4708860
fbshipit-source-id: 0c36511e260b587d410e5c9604552ceedd06d988
Summary:
Necessary if CXX isn't set when cmake is called. The CXX variable will then be
empty which prevents make from using its own default.
Closes https://github.com/caffe2/caffe2/pull/202
Differential Revision: D4711113
Pulled By: Yangqing
fbshipit-source-id: 895c07044b263ba9b5440453978248506d7ac225
Summary:
These are all essentially no-op changes which allow for nose-style (or pytest-style) test discovery.
With this patch, you can use any of these methods to discover and run tests under `caffe2/python`:
```
python -m unittest discover -p '*test*.py' caffe2/python/
python -m nose caffe2/python/
python -m pytest caffe2/python/
```
Future work:
* Get all of the tests to pass
* Some seem to be testing operations which don't have GPU implementations
* I get a segfault unless I set `CUDA_VISIBLE_DEVICES=0`
* Some tests are flaky
* Allow test discovery throughout the whole project (e.g. the `experiments/` dir)
Closes https://github.com/caffe2/caffe2/pull/199
Reviewed By: pietern
Differential Revision: D4704504
Pulled By: Yangqing
fbshipit-source-id: 8f5687ec9c8aa873dfaff30dbf44272bc38a206b
Summary:
RecurrentNetOp created workspaces at every run, which is very wasteful, as it had to also recreate the stepnets (forward and backward!).
Reviewed By: salexspb
Differential Revision: D4704547
fbshipit-source-id: 460028d912d6a735448c445cb83c0c4d03286351
Summary:
First, this diff includes a full test of data-parallel LSTM, which confirms it works correctly. To make it work, some changes had to be made:
- cell net/step net external inputs must be namespace scoped
- prevent double-namescoping of cellnet inputs
- make data parallel model understand recurrentnets so the device-mapping works
Reviewed By: salexspb
Differential Revision: D4708840
fbshipit-source-id: 4b0ddc43642d449076a2b6f67ad1c47f84138ff4
Summary: Some operators, e.g., SoftmaxWithLoss, return a scalar-typed tensor. This would allow us to use those ops without having to write a layer manually.
Reviewed By: xianjiec, kennyhorror
Differential Revision: D4703982
fbshipit-source-id: f33969971c57fc037c9b44adb37af1caba4084b6
Summary: When cloning a recurrent net op, we do a remapping of the lengths-blobs. But if they don't exist (like with CRF), we should not do that.
Differential Revision: D4702123
fbshipit-source-id: 37a22d11e709011b8b98b2cc3d9f08eb9fda06c4
Summary:
Central cropping during test phase, similar to Caffe's behavior
Closes https://github.com/caffe2/caffe2/pull/195
Differential Revision: D4704506
Pulled By: Yangqing
fbshipit-source-id: cf7d457dc2acfe8ff5a225ebfd5f8cd0f9d92a07
Summary:
Yield better throughput since full ring allreduce is cheaper for
smaller blobs (fewer communication steps).
Reviewed By: andrewwdye
Differential Revision: D4704850
fbshipit-source-id: 338addd919f454c94412ea145e1280492f765c72
Summary:
TSIA
For the broadcast op the first input tensor on the process with the
specified rank is broadcast to all other processes and outputs.
For the allreduce op all inputs are considered for the reduction.
Reviewed By: andrewwdye
Differential Revision: D4704540
fbshipit-source-id: e6879ca0a9adffe0bc61bf74a333c4052bc8bd92
Summary: These Python helpers are going to provide sufficient bookkeeping when adding quantization for conv layers.
Reviewed By: Yangqing
Differential Revision: D4671478
fbshipit-source-id: 292e2f633dd30969c0afbe7a8075b340ce9a6d12
Summary: UNK needs to be indexed in the vocabulary for validation to work. Default args now result in training loss decreasing.
Reviewed By: urikz
Differential Revision: D4703393
fbshipit-source-id: e4d6ad100daf8392f8ba1e502f9ecf39bb8ce24a
Summary:
Context:
https://fb.facebook.com/groups/1405155842844877/permalink/1677762748917517/.
DropoutOp and DropoutGradientOp already handle input of size 0 gracefully. The
CHECK isn't needed. I think this should fix the crash in xray detection models
where num region proposals are zero.
Differential Revision: D4697254
fbshipit-source-id: afd06975f2ad4b2e59f15d12b0aa332f6eb3f1af
Summary:
Allows `nose` or `pytest` to collect all tests.
```sh
$ cd build
$ nosetests --collect-only
..............................................................................................................................................................................................................................
----------------------------------------------------------------------
Ran 222 tests in 0.430s
OK
```
Closes https://github.com/caffe2/caffe2/pull/198
Differential Revision: D4700783
Pulled By: Yangqing
fbshipit-source-id: 97504f6b14537669aa150f6a71283e851829ac5e
Our extension library links against cudart and pulls in the symbols. Use
LoadLibrary(None) to use the same symbols as the _C extension.
This fixes the PyTorch wheel when you don't have system CUDA installed.
Summary:
It has been a pain to save predictor-compatible models from Caffe2. This diff adds a function, ExtractPredictorNet, that takes a training model and outputs a predictor model by removing all operators that are not relevant for prediction, such as the backward pass and dequeue ops for input loading (since in the predictor, the input data is an external input).
We can also consider including this directly in the predictor exporter for FB usage.
Reviewed By: rpenggithub
Differential Revision: D4693264
fbshipit-source-id: e81abbbec0bd4d717159cf36488d0baaf0130090
Summary:
Implement ReduceBackSum & ReduceBackMean with gradients for CPU & GPU contexts.
The reduction happens over the last dimensions; for example, if the input is an
M x N matrix, ReduceBackSum will produce a vector of dim M x 1 containing the
row-wise sums.
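A NumPy reference for the M x N case described above (a sketch of the math, not the operator code):
```python
import numpy as np

x = np.arange(12, dtype=np.float32).reshape(3, 4)  # M=3, N=4
reduce_back_sum = x.sum(axis=-1)    # row-wise sums, shape (3,)
reduce_back_mean = x.mean(axis=-1)  # row-wise means, shape (3,)
```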
Differential Revision: D4689768
fbshipit-source-id: 5b0482d4341867ecf23526dc6c4d544420e7d8f7
Summary: Add shape inference for reshape. Because it cannot do shape inference for reshaped tensor with runtime tensor data, set `out[0].set_unknown_shape(true)` if no `shape` argument is used.
Differential Revision: D4671125
fbshipit-source-id: 685a9198f9b08e3336014c792f20051b381d8619
Summary: We should be using the vocabulary built on the training data, and corpus_eval as data for the evaluation phase.
Reviewed By: urikz
Differential Revision: D4700382
fbshipit-source-id: ca1dd043a28f9bb585faad050c82fb12c1cdf6cc
Summary:
This is the minimum required CMake version (also the version that is available on Ubuntu Trusty (14.04)).
Closes https://github.com/facebookincubator/gloo/pull/9
Reviewed By: Yangqing
Differential Revision: D4698659
Pulled By: pietern
fbshipit-source-id: bf01541fe485c03e7c665f175c2887feaf9516a3
Summary: Fixed a bug (AttributeError: ModelTrainerLog instance has no attribute 'external_loggers', at File "caffe2/python/experiment_util.py", line 101) when no external_loggers is passed to ModelTrainerLog().
Differential Revision: D4697197
fbshipit-source-id: 1c770c366d87ea474bcf40ab289b67c76648d48b
Summary:
Allocate a set of per-device streams used to serialize NCCL op scheduling. These ensure concurrent NCCL ops are not interleaved across devices (e.g., through priority scheduling), which could result in deadlock.
Synchronize source and destination streams with NCCL streams.
Reviewed By: pietern
Differential Revision: D4685360
fbshipit-source-id: 3c228b195b0a0d9d7cccc720163898d344a5ed4c
Summary:
Otherwise the blob will be in a different namescope, e.g., `_nested`: https://fburl.com/ntlsaezv.
This makes TensorBoard ugly.
Reviewed By: dzhulgakov
Differential Revision: D4696946
fbshipit-source-id: 73627feccd7c4896964e6c549b7241bcce4f49a7
Summary:
TSIA
This change also fixes an undefined attribute error after running 20
iterations of the resnet50 example trainer.
Differential Revision: D4692794
fbshipit-source-id: b98efdfeb078c5ba89d2a86837f3c672e1eade5f
Samples elements from `[0,..,len(weights)-1]` with given probabilities (weights). So far there is no way to introduce sample weights either in loss functions or while sampling from a dataset. This is an attempt to add the functionality for the latter issue.
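Conceptually (a NumPy sketch with made-up names, not the submitted sampler), drawing indices in proportion to a weight vector looks like this:
```python
import numpy as np

def weighted_indices(weights, num_samples, replacement=True):
    w = np.asarray(weights, dtype=np.float64)
    p = w / w.sum()  # normalize weights into probabilities
    return np.random.choice(len(w), size=num_samples, replace=replacement, p=p)

print(weighted_indices([0.1, 0.1, 0.8], num_samples=5))  # index 2 dominates
```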
Summary: A lot of people get confused if the file can't be loaded.
Reviewed By: rpenggithub
Differential Revision: D4686572
fbshipit-source-id: 519ff68a3d4f04cf8ce893f255f7814e043383b6
Summary: We did the InferToDeviceMapping too early; we should have done it also after running the parameter update function, since that can create new blobs like the momentum blobs. This fix is maybe not optimal, but it works and is fast enough.
Differential Revision: D4693450
fbshipit-source-id: 4c4cc2396dad371b3fbcd1d8da51133ea09a57e0
Summary:
Before we didn't propagate the 'out-of-data' signal if splits_per_epoch wasn't specified.
Right now it's a hacky fix (just reuse ReaderWithLimit). azzolini - any suggestions for a more elegant solution? I can create an extra reader that just exports an "is empty" signal.
Overall, I guess we need to turn global_queue into a more sustainable unittest that verifies all possible combinations - I'm still not sure it's correct :-\
Reviewed By: xianjiec
Differential Revision: D4665677
fbshipit-source-id: fe44d10ee82c3383145635e67dea1d9b666e061f
Summary: When debugging using LayerModelHelper, adding Print to the model will trigger this assert.
Reviewed By: xianjiec
Differential Revision: D4687859
fbshipit-source-id: 6932e38f8dd17ba0b80da18a20943ecdb2e8af0a
Summary: Thanks to shenpan for detecting this bug. The problem is that FinalizeAfterCheckpoint() can be passed a list of strings, not blob references, and that fails in stripParam() after the assertion I added in D4649208. It is OK to pass strings to that function as well.
Reviewed By: jhcross
Differential Revision: D4691028
fbshipit-source-id: 0bca80d44a5ab641438cc5b26482bca0b1527d69
Summary: Chatted with pietern today, figured it is an easy change.
Reviewed By: pietern
Differential Revision: D4688275
fbshipit-source-id: a2751f1ff9f192ba6f2bd961be6ad1c693c8b5c6
Summary: Following krp's suggestion, check if the shape parameter is empty.
Reviewed By: dzhulgakov
Differential Revision: D4686698
fbshipit-source-id: 3f9fb1e3215dd2a4a726442531201eeb18224bc6
Summary:
This makes it easy to use Gloo transports and algorithms in existing
MPI environments.
Reviewed By: andrewwdye
Differential Revision: D4685999
fbshipit-source-id: cfc7d0e445893512b4e4ed2abe1bb280d83b9c70
Summary:
How pairs are setup and connected to one another is specific to
whatever underlying rendezvous mechanism is used. This change moves
the `connectFullMesh` function into a subclass in the `rendezvous`
directory. This prepares for a separate MPI context that can setup
pairs between processes using an existing MPI communicator.
Reviewed By: andrewwdye
Differential Revision: D4684755
fbshipit-source-id: 9eb643b8ba545b3e6f9a36b65642b3b04a5f0077
Summary:
Created a new function with specifics related to MI LSTM implementation in caffe2
See https://arxiv.org/pdf/1606.06630.pdf for details.
See D4478877 for the implementation of the same in tensorflow
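For context, the multiplicative-integration idea from the linked paper replaces the additive gate pre-activation Wx + Uh + b with a gated multiplicative term; a conceptual NumPy sketch (symbol names are mine, this is not the Caffe2 code):
```python
import numpy as np

def mi_preactivation(Wx, Uh, alpha, beta1, beta2, b):
    # Vanilla gate pre-activation: Wx + Uh + b.
    # Multiplicative integration (Wu et al., arXiv:1606.06630), all element-wise:
    return alpha * Wx * Uh + beta1 * Uh + beta2 * Wx + b
```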
Reviewed By: jhcross
Differential Revision: D4669882
fbshipit-source-id: 095bbcf187dbdac2cd79558ff0c8f9f67d8af639
Summary:
OSS implementation of the seq2seq model in Caffe2. The script uses the Seq2SeqModelCaffe2 class to build and run the model. It takes in training data in the form of a text file with one sentence on each line, builds a vocabulary, generates batches based on batch size, and runs the net for a configurable number of epochs. It prints the total scalar loss at the end of each epoch.
All FBLearner and neural_mt type system dependencies have been removed. Unimplemented and unnecessary methods have been removed to make the script simpler.
fblearner/flow/projects/langtech/translation/neural_mt/model_util_caffe2.py has been moved to caffe2/caffe2/python/examples/seq2seq_util.py and remains unchanged
Potential TODOs:
- Get the model running in GPU. Only GatherOp does not have a corresponding GPU implementation. Try adding CopyGPUToCPU before and CopyCPUToGPU after Gather, and use CUDA DeviceOption.
- Add evaluation on test data with suitable metric (perplexity? bleu?)
Reviewed By: urikz
Differential Revision: D4653333
fbshipit-source-id: 1c7d970ebc86afe23fad4d48854296bf54eb0f77
Summary: ReversePackedSegs operator for CUDA. Input "lengths" (static integers) required to be in CPU memory.
Differential Revision: D4661281
fbshipit-source-id: c800c316c34015ba8e732dcbcaa8c4edaffdfeab
Summary:
Data parallel model did not support sparse operations, nor gradients computed on CPU ops.
Currently sparse operations are done on the CPU, so there is no point in "data parallelizing" them. I had to make a few changes to data_parallel_model to support this:
1. Model can have params that are added prior to adding the data parallel part. For example, a lookup table of word vectors would be a parameter that is non-parallel.
2. Thus, when data parallel model is called, it will separate the non-parallel params and avoid working on them. Note: when we add distributed version, we need to explicitly handle them with AllGather!
This works nicely since Caffe2 automatically adds the backward concat-operator when multiple ops gather from the same blob.
I also added support for data parallel CPU ops, which might be necessary in cases when we don't have a GPU implementation of some ops.
The test in data_parallel_model_test validates the correctness of the code by running the same trainer on different numbers of GPUs and checking that the end result is the same.
Reviewed By: jhcross
Differential Revision: D4649208
fbshipit-source-id: e3b7ae701ead468dc94c52a976eafec5c9831097
Summary: CudaDevicePointer has the information we need for a NCCL op. Refactor NCCLElement as a composition of src and dst CudaDevicePointers. This allows for separate streams for src and dst, and will simplify a future change to use a static set of streams for all NCCL ops.
Reviewed By: pietern
Differential Revision: D4679483
fbshipit-source-id: 75656cc2fa5b5e2a6c096d914d2111769a47291b
* add momentum and centered options
Add two options :
- Momentum (like SGD's momentum)
- Centered RMSprop, as in Graves 2013 ( https://arxiv.org/abs/1308.0850 ): the gradient is normalized by a running estimate of its variance
* some PEP8
* bug in default
* bug2
* sign mistake
* alloc of momentum & centered only if needed
* add link to docstring
* some pep8 on docstring
* implement __setstate__() for backward compatibility
* correct grammar mistake
* multiply by lr when adding delta to params
* rename momentum variables
* change __init__ params order
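For reference, the momentum and centered variants listed above boil down to the following update (a NumPy sketch of the math with my own variable names, not the torch.optim code):
```python
import numpy as np

def rmsprop_step(param, grad, state, lr=1e-2, alpha=0.99, eps=1e-8,
                 momentum=0.0, centered=False):
    sq = state['square_avg'] = (alpha * state.get('square_avg', 0.0)
                                + (1 - alpha) * grad ** 2)
    if centered:
        # Centered variant (Graves 2013): subtract the squared running mean of the
        # gradient, i.e. normalize by an estimate of the gradient's variance.
        g_avg = state['grad_avg'] = (alpha * state.get('grad_avg', 0.0)
                                     + (1 - alpha) * grad)
        denom = np.sqrt(sq - g_avg ** 2) + eps
    else:
        denom = np.sqrt(sq) + eps
    if momentum > 0:
        buf = state['momentum_buf'] = (momentum * state.get('momentum_buf', 0.0)
                                       + grad / denom)
        return param - lr * buf  # lr multiplies the momentum buffer, per the list above
    return param - lr * grad / denom
```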
Summary: This diff is getting rid of old metrics interface in realtime training.
Reviewed By: xianjiec
Differential Revision: D4649734
fbshipit-source-id: de4af85eb5476df9790ebd3915625bf8beee65af
Summary:
When the execution step is representing things like:
for loop
execution_step
net1
execution_step
net2
net3
the preparation cost for the execution step is too high.
This diff moves most of the shared information into the CompiledExecutionStep to save time.
After the change the benchmark result for parameter server handler is as following: (be aware that the first two have some variance)
INFO:__main__:==Summary==
INFO:__main__:Time <function case_if at 0x7f7160c32938> 0.0752924203873
INFO:__main__:Time <function case_loop at 0x7f7160c329b0> 0.0677666187286
INFO:__main__:Time <function case_simple_net at 0x7f7160c32a28> 0.0605396509171
INFO:__main__:Time <function case_one_loop at 0x7f7160c32aa0> 0.0611681699753
Before the change:
INFO:main:==Summary==
INFO:main:Time <function case_if at 0x7f19d079f848> 0.100815701485
INFO:main:Time <function case_loop at 0x7f19d079f8c0> 0.0864136457443
INFO:main:Time <function case_simple_net at 0x7f19d079f938> 0.0614696979523
INFO:main:Time <function case_one_loop at 0x7f19d079f9b0> 0.0598972082138
Reviewed By: azzolini
Differential Revision: D4643926
fbshipit-source-id: 5a4b97230ba778e0ff5cbafc8a216335a191068a
Summary: The sum processor and sqrt pooling are to mimic the DoubleHelix model.
Differential Revision: D4678413
fbshipit-source-id: fc1ccfe3c92c540ce5914dfd8ff1a040805c48db
Summary:
For the MSC compiler, the binary flag needs to be specified.
Closes https://github.com/caffe2/caffe2/pull/191
Differential Revision: D4677511
Pulled By: Yangqing
fbshipit-source-id: 4f80f09bd4bf9b6b6eff352cc67a62163255334f
Summary: AccumulateHistogramOp, for computing the histogram of all values in input tensors
Differential Revision: D4654417
fbshipit-source-id: dea92346004c772af16e1eb41306287d81dc5a02
This is an important clarification to make, as otherwise users are misled as to where they may need to add dropout, and to clarify the situation they would need to delve into the backend implementation.
4647f753bc/torch/nn/_functions/rnn.py (L73)
Summary: Take user inputs for the introspection visualization: convolutions output layer activations, filters using containing phrases, and number of samples
Reviewed By: Mortimerp9
Differential Revision: D4603797
fbshipit-source-id: dc972dcb8ad36e30defab266d710e047b11cff73
Summary:
modified load_save_op to work with my training script
- SaveOp now correctly strips the specified prefix of the form 'gpu_0/' when saving model blob names to the DB
- when translating DB blob names to model blob names, LoadOp can now optionally add a prefix of the same form
Reviewed By: Yangqing
Differential Revision: D4664134
fbshipit-source-id: a2512e79f0c5172c5111af3e9b6fd161f268f4df
Summary: Super rough implementation of recurrent attention. Planning to factor out the common code between the two functions as well as train and eval. I want to get this out and get eyes on it sooner rather than later
Differential Revision: D4647837
fbshipit-source-id: 54bc4e8ed0df6f04c86c425926decbe89f73b068
Summary: In the case of a distributed task, load_from_db() loads to the wrong workspace (when used inside a Python op). Passing which workspace to use explicitly so that it loads to the one the Python op is being run in.
Reviewed By: kennyhorror
Differential Revision: D4653692
fbshipit-source-id: 94585c012b05ee38b9ce5e8ef0efdd50aa41dd2b
Summary:
Add a nextSlot() function to the context that increments and
returns a slot number. This enables multiple algorithms sharing the
pairs part of a context. The slot numbers were hardcoded before this
change, which prevented reuse.
After this change, some of the tests can be changed to run multiple
times (or do a parameter sweep) without respawning a new threadpool or
allocating new fixtures.
Also change some internally used variable names for more consistency.
Reviewed By: andrewwdye
Differential Revision: D4668268
fbshipit-source-id: 65cbc8f2666f0b7d2f1c72574b86d913f5855d62
Summary: The evaluation part of the two tower workflow is missing. This diff completes it. Some of the newly added functions can be used for other workflows, e.g., feed. As the eval workflow overlaps across different workflows, a generic eval workflow will be added in a separate diff.
Reviewed By: kennyhorror
Differential Revision: D4646880
fbshipit-source-id: 4d6eb35df10f6f613533d442f2a04dc0332386f8
Summary: Add gradient support for Caffe2 operator SumElements (for use in Translation RNN training pipeline).
Differential Revision: D4669036
fbshipit-source-id: 502760a2a624b20b3241e83a2f208f450b6ff36f
Summary:
The current optimizer code in c2/python has the following issues:
(1) the optimizers in sgd.py cannot configure a per-param-blob optimizer;
(2) sgd.py is a bad file name; optimizer.py is a better name;
(3) layer_model_helper.py has another set of optimizer code (which supports per-param-blob optimizers).
This diff did the following:
(1) create optimizer objects so that we can configure a per-param-blob optimizer and that are also compatible with the existing optimizer code
(2) the new optimizer code is much more modularized
(3) move the optimizer code to a file with a better name (optimizer.py)
(4) replace the optimizer imports in the existing code
Will do in next diffs:
(1) optimizers with structured parameters for dper2
(2) get rid of the optimizer code in layer_model_helper.py
Reviewed By: salexspb
Differential Revision: D4609013
fbshipit-source-id: 2e2d6dfa8685d10498f89069157453d9feca3f27
Summary:
Fun on the plane. This basically reveals the per-platform build status on the README.md file.
Closes https://github.com/caffe2/caffe2/pull/188
Differential Revision: D4668460
Pulled By: Yangqing
fbshipit-source-id: 242b916cca0a46f8d797c6430c1875d6ffaae7ce
Summary:
1. Allow the EnsureDense op to do either an in-place pass or a copy
2. In MTML, add an EnsureDense op before gather
3. Change the unittest values (adding another operator changes the random seed,
which causes the model initialization to also change)
Reviewed By: xianjiec
Differential Revision: D4625219
fbshipit-source-id: b3c748c3651d1dedd75420912a9698b7e46187c5
Summary: This diff is migrating existing DPER workflows to use new metric abstractions in training.
Reviewed By: xianjiec
Differential Revision: D4656576
fbshipit-source-id: 1b3b16b390fc0757480e41df1c4214c11cd76e8a
Summary:
(Note: previous revert was due to a race condition between D4657831 and
D4659953 that I failed to catch.)
After this, we should have contbuild guarding the Windows build both with
and without CUDA.
This includes a series of changes that are needed to make Windows build,
specifically:
(1) Various flags that are needed in the cmake system, specially dealing
with /MD, /MT, cuda, cudnn, whole static linking, etc.
(2) Contbuild scripts based on appveyo.
(3) For Windows build, note that one will need to use "cmake --build" to
build stuff so that the build type is consistent between configuration and
actual build. see scripts\build_windows.bat for details.
(4) In logging.h, ERROR is already defined by Windows. I don't have a good
solution now, and as a result, LOG(ERROR) on windows is going to be
LOG(INFO).
(5) variable length array is not supported by MSVC (and it is not part of
C++ standard). As a result I replaced them with vectors.
(6) sched.h is not available on Windows, so akyrola 's awesome simple
async net might encounter some slowdown due to no affinity setting on
Windows.
(7) MSVC has a bug that does not work very well with template calls inside
a templated function call, which is a known issue that should be fixed in
MSVC 2017. However for now this means changes to conv_op_impl.h and
recurrent_net_op.h. No actual functionalities are changed.
(8) std host function calls are not supported in CUDA8+MSVC, so I changed
lp_pool (and maybe a few others) to use cuda device functions.
(9) The current Scale and Axpy has heavy templating that does not work
well with MSVC. As a result I reverted azzolini 's changes to the Scale
and Axpy interface, moved the fixed-length version to ScaleFixedSize and
AxpyFixedSize.
(10) CUDA + MSVC does not deal with Eigen well, so I guarded all Eigen
parts to only the non-CUDA part.
(11) In conclusion, it is fun but painful to deal with visual c++.
Differential Revision: D4666745
fbshipit-source-id: 3c9035083067bdb19a16d9c345c1ce66b6a86600
Summary: Renamed ElementwisePower to Pow for better discoverability. Added CUDA version and Gradient + tests.
Reviewed By: kennyhorror
Differential Revision: D4665550
fbshipit-source-id: dd33d8ad3917d71504e363ab397af50d38a63b1f
Summary: Add a simple op to sum the elements, with optional averaging. This is basically a copy of AverageLossOp, which we should alias to this. And maybe develop this towards a generic norm op.
Reviewed By: jhcross
Differential Revision: D4664591
fbshipit-source-id: 0e0c0efe9e415e2ad2feecfa42b03db2c83bee70
Summary: Due to popular demand, added an op to compute element-wise square + gradient for it (just for the fun of it).
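For reference (a trivial NumPy sketch of the math, not the operator code), the gradient of the element-wise square is 2·x scaled by the upstream gradient:
```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])
dy = np.ones_like(x)   # upstream gradient
y = x ** 2             # forward: element-wise square
dx = 2.0 * x * dy      # backward: d(x^2)/dx = 2x
```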
Reviewed By: Yangqing
Differential Revision: D4664797
fbshipit-source-id: 0a29c7c249fdc72f51412bebd6ae352a7801cf05
Summary:
Taking ownership of a std::unique_ptr is a bit awkward. It's actually
useful to reuse the underlying store and create multiple prefix stores
against it.
Reviewed By: andrewwdye
Differential Revision: D4662354
fbshipit-source-id: eaf62f7d5a97d6ee848252ff3124c28da349f6f2
Summary:
This changes the constructor prototype of the broadcast algorithms.
They now take the rank of the root process and the rank of the root
pointer. The root process now also broadcasts locally, among the
specified pointers, in addition to broadcasting to its peer processes.
The broadcast tests are made more robust to use a different value at
every index for every buffer, like the allreduce tests. To accomodate
multiple input buffers for CPU side algorithms, I added a Fixture
helper, and renamed the existing Fixture class to CudaFixture.
The broadcast tests contain a few TODOs since they don't vary the root
process or root pointer yet. I anecdotally verified this does work,
but didn't want to include the necessary changes to do so in this
commit (it requires some changes in rendezvous and NCCL code). A fix
for this is forthcoming.
Reviewed By: andrewwdye
Differential Revision: D4661635
fbshipit-source-id: c069e0d4e8f676a63efd74b15ea1156adcc09477
Summary:
After this, we should have contbuild guarding the Windows build both with
and without CUDA.
This includes a series of changes that are needed to make Windows build,
specifically:
(1) Various flags that are needed in the cmake system, specially dealing
with /MD, /MT, cuda, cudnn, whole static linking, etc.
(2) Contbuild scripts based on appveyo.
(3) For Windows build, note that one will need to use "cmake --build" to
build stuff so that the build type is consistent between configuration and
actual build. see scripts\build_windows.bat for details.
(4) In logging.h, ERROR is already defined by Windows. I don't have a good
solution now, and as a result, LOG(ERROR) on windows is going to be
LOG(INFO).
(5) variable length array is not supported by MSVC (and it is not part of
C++ standard). As a result I replaced them with vectors.
(6) sched.h is not available on Windows, so akyrola 's awesome simple
async net might encounter some slowdown due to no affinity setting on
Windows.
(7) MSVC has a
Closes https://github.com/caffe2/caffe2/pull/183
Reviewed By: ajtulloch
Differential Revision: D4657831
Pulled By: Yangqing
fbshipit-source-id: 070ded372ed78a7e3e3919fdffa1d337640f146e
Summary: Simple elementwise Max implementation for CUDA. Given N inputs, it will do N-1 pairwise maxes. I am not sure if it would be much better to iterate through all the inputs in the kernel, since this has better locality. We can also optimize later.
Reviewed By: Yangqing
Differential Revision: D4659953
fbshipit-source-id: 3a23b7fb3dbdf1d43bf3134ece03af4a791844dd
Summary:
This diff modifies the way we specify metrics: from a reporter that should know in advance all the blobs it should access, to a reporter that is connected through schema.
This diff also reports an arbitrary number of learning curves to Flow and provides a really flexible way to specify all the metrics we care about.
TODO: Modify model helper to allow providing intermediate results for reporting
TODO: Add evaluation net (instead of prediction net).
TODO: Move all other places in DPER 2.0 to use that abstractions instead.
TODO: Get rid of LogScoreEstimator in favor of metric that is going to be really suiting our needs.
Reviewed By: azzolini, dzhulgakov, kittipatv
Differential Revision: D4577548
fbshipit-source-id: 3515bd41e0f92263ff90ce2f7207abf65d01b1f7
Summary: so that the utils can be used by a wider audience.
Reviewed By: xianjiec
Differential Revision: D4637462
fbshipit-source-id: f0695f430902aef26360efa511069b3755eaf52a
We were keying hooks by RemovableHandle id. However, we don't hold onto
handles, and ids of dead objects can be reused. This replaces id(handle)
with a global counter.
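A minimal sketch of the idea (names are made up, this is not the actual torch code): key hooks by a monotonically increasing counter instead of id(handle), since Python can reuse the id of a garbage-collected handle.
```python
import itertools

_hook_id_counter = itertools.count()

class ToyRemovableHandle(object):
    """Each handle gets a never-reused key into the hooks dict."""
    def __init__(self, hooks_dict):
        self.hooks_dict = hooks_dict
        self.id = next(_hook_id_counter)  # unique for the lifetime of the process

    def remove(self):
        self.hooks_dict.pop(self.id, None)
```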
This is similar to THCCachingHostAllocator_recordEvent() but on CUDA
allocations. It's useful for overlapping copies with computation. The
workflow is approximately:
0. allocate dst tensor on copy stream
1. copy from CPU to GPU on copy stream
2. synchronize the main stream with the copy stream via
cudaStreamWaitEvent
3. THCCachingAllocator_recordStream(dst, main_stream)
The recordStream() call is necessary to prevent the dst tensor from
begin reused on the copy stream before the main stream finishes work.
Previously, you would need to insert a second cudaStreamWaitEvent before
dst is freed to force the copy stream to wait on the main stream.
Summary:
To avoid the NumPy warning: "using a non-integer number instead of an integer will result in an error in the future".
Closes https://github.com/caffe2/caffe2/pull/64
Differential Revision: D4658348
Pulled By: Yangqing
fbshipit-source-id: 3a1b33cbb27849bc167b08147d078e8d487567f4
Summary: Added validation for load op when doing load_all by refactoring validation logic for loading specific blobs.
Reviewed By: kennyhorror
Differential Revision: D4641986
fbshipit-source-id: e0075a12188ca09d7628add72c143b40d5d9f382
Summary:
In the past we have moved most of the CHECKs to CAFFE_ENFORCE (exceptions).
However, we kept the name "*_CHECK" for cuda calls, and that caused some
confusion especially in the destructor calls: while our destructors are not
written to handle exceptions, these CUDA_CHECKs could actually throw some
exceptions.
As a result, this diff
(1) Renames all cuda related "*_CHECK" to "*_ENFORCE"
(2) Explicitly marked the destructor of core Caffe2 classes as noexcept
(3) Added proper, really-CHECK cuda check macros, and used those in the
corresponding destructors.
This should not change any of existing functionality.
Reviewed By: dzhulgakov
Differential Revision: D4656368
fbshipit-source-id: 32e3056e66c0400156c5ca0187b6151cf3d52404
Summary:
On Windows, it is necessary to use `_aligned_free` instead of `free` when `_aligned_malloc` was used before.
Closes https://github.com/caffe2/caffe2/pull/184
Differential Revision: D4657929
Pulled By: Yangqing
fbshipit-source-id: 476a9b702a1ee37d5e16483087be2ccdc7bf4259
Summary:
Our internal update of gflags in b0e325ce69 called for this change.
Closes https://github.com/caffe2/caffe2/pull/185
Differential Revision: D4657928
Pulled By: Yangqing
fbshipit-source-id: bdf9fdc63a16dafc28b690598463ec72e3c50f40
Summary:
- Replaces the strip_regex implementation in SaveOp. It deletes the prefix of blob names up to a given substring.
- Adds the same functionality to LoadOp. This is needed for loading checkpoints that were stored using the strip_prefix feature.
Closes https://github.com/caffe2/caffe2/pull/129
Differential Revision: D4512234
Pulled By: Yangqing
fbshipit-source-id: d926c1c5adcc7a711365cede11f21421bb7d4138
Summary:
This allows one to report the CPU memory allocation over a Caffe2 run.
To enable it, pass --caffe2_report_cpu_memory_usage on the command line.
This has to happen before any Caffe2 allocation has taken place.
Reviewed By: salexspb
Differential Revision: D4641353
fbshipit-source-id: 13a4315f63154edad9e925bb5c276cad4fe78c70
Summary:
I have seen a stress run crash with unexpected state. Adding these
assertions will give more information when it happens again.
```
terminate called after throwing an instance of 'gloo::EnforceNotMet'
what(): [enforce fail at gloo/transport/tcp/pair.cc:407] false. Unexpected state: 5
```
Reviewed By: andrewwdye
Differential Revision: D4652216
fbshipit-source-id: e787f4097f5ab32367dd9fa5a336d0389b97e955
Summary: We are converting MetaNetDef from Thrift to protobuf. The protobuf uses binary encoding, and since bytes is a superset of string, we change the field to bytes so that no warning is generated when compiling caffe2.
Reviewed By: Yangqing
Differential Revision: D4635581
fbshipit-source-id: 916b799e1fb9466658e1dd198bfb5c6928f22488
* Use TH_INDEX_BASE when verifying dimension for cat
* Adding tests for cat when no dimension is specified.
- Also renamed ldimension to cat_dimension to be more specific.
Summary: fix a check for whether the net is a net_dict
Reviewed By: kennyhorror
Differential Revision: D4647493
fbshipit-source-id: e0a62fc5847c99c85857c5635b4e39d59c66d5ce
Summary:
the existing code uses vector<T> to store the given tensor and then copy to output.
If T=bool, vector<bool> stores the data as bits and then copy does not work.
we use TensorCPU to store it instead.
Also add unittest.
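The vector<bool> pitfall can be reproduced in a small standalone snippet (illustrative only, not the operator code):
```
#include <cstring>
#include <iostream>
#include <vector>

int main() {
  std::vector<float> vals = {1.f, 2.f, 3.f};
  float out_vals[3];
  std::memcpy(out_vals, vals.data(), sizeof(out_vals));  // fine: contiguous floats

  std::vector<bool> flags = {true, false, true};
  bool out_flags[3];
  // std::memcpy(out_flags, flags.data(), sizeof(out_flags));
  // ^ does not even compile: vector<bool> is a bit-packed specialization with
  //   no data() member, so a bulk copy into a tensor buffer is not possible.
  for (size_t i = 0; i < flags.size(); ++i) out_flags[i] = flags[i];

  std::cout << out_vals[0] << " " << std::boolalpha << out_flags[0] << "\n";
  return 0;
}
```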
Reviewed By: kennyhorror
Differential Revision: D4622325
fbshipit-source-id: 95c27b5d1cfbc836d2419d01cacde5a3172f4d7e
Summary:
Verify shape and type inference in op unit tests via assertReferenceChecks(). For now, catch exceptions from InferShapeAndTypes() and log a warning.
TBD: Determine whether there are existing inference/output mismatches, and if so, change test asserts to warnings until they are resolved.
Differential Revision: D4639343
fbshipit-source-id: 605e72f53198e1a100fe7ba18b72c34c9ddbb727
Summary:
The fields are public so their names should not end with an
underscore.
Reviewed By: andrewwdye
Differential Revision: D4645038
fbshipit-source-id: c12b47affbe511383a4722717a06abb61918473b
- Code was using the specified dimension, which could be negative
- Changed the cat_dimension variable to be more explicit
- Fixed code to use the cat_dimension variable
Summary:
- Do not set a default for cudnn_ws. The default set by the cuDNN ops will be used.
- Do not use cudnn_ws for MLP.
- Do not run the benchmark if the required args are not set. Previously it tried to run and errored out.
Closes https://github.com/caffe2/caffe2/pull/177
Differential Revision: D4633143
Pulled By: Yangqing
fbshipit-source-id: e89a7d01984e599d92a330d0ee4ba106feba65b8
Summary:
The NCCL code used in CUDA-aware allreduce does local reduction of N
buffers prior to putting anything on the wire. Supporting this in the
benchmark tool to measure the impact under various configurations.
Other minor tweaks in this change:
* Specify sub-second iteration time
* Templatize allreduce benchmarks (the algorithms share a constructor
prototype)
Reviewed By: andrewwdye
Differential Revision: D4639517
fbshipit-source-id: f7417d3e9f79278a3b1eca48d779f48b77e5260c
Summary: Cuda algorithms take an optional set of device streams to sequence operations. If streams are provided, the algorithms should enqueue final output buffer operations on the associated stream and return asynchronously. Destructors that allocate streams/events should synchronize before tearing down.
Reviewed By: pietern
Differential Revision: D4636447
fbshipit-source-id: 32ec2adc214c83b0b4bc0fff8993ab196459117b
Summary:
With this change, every buffer gets assigned a different
value at every index. This means reordering of segments (e.g. in the
chunked algorithm) would surface as test errors.
Reviewed By: andrewwdye
Differential Revision: D4636368
fbshipit-source-id: 464eb1515d1590e12481961d427a92e2ebb3be82
Summary: CUDA documentation detailing high-level support for CUDA in gloo algorithms, usage of streams, and synchronizing memory management.
Reviewed By: pietern
Differential Revision: D4633120
fbshipit-source-id: d88e230c8dc82fe48cda0f401b61758fa4f07f2e
Summary:
Synchronous mode means using the calling thread instead of the device
thread for completion handling. Since this saves a context switch in
the critical path, this is very beneficial for low latency algorithms.
For example: the p99 of a 4-way barrier drops from 17us to 4us.
Reviewed By: andrewwdye
Differential Revision: D4626948
fbshipit-source-id: 013b1680497589fe5ad0bca38600bce6a410200b
Summary:
All pairs created by a device would use the same completion queue.
Supporting sync mode that way is difficult, as there is no way to
filter completions for a particular pair. This change refactors this
to use a single completion queue per pair so that this is no longer an
issue. This change is a preparation for supporting synchronous mode
(where the calling thread itself will poll the ibv library for
completions instead of the device thread).
This change also includes a refactoring of the way transient memory
regions are handled so that they are properly deregistered and
deallocated when no longer needed.
Reviewed By: andrewwdye
Differential Revision: D4625146
fbshipit-source-id: 21bf5ab321534fbd5c03f12049c10fc67da68944
Summary: std::atomic was not defined for cuda.cu.
Reviewed By: andrewwdye
Differential Revision: D4624611
fbshipit-source-id: 973bba10026e065667d6a576055d00505ee02d62
Summary: Allow gloo consumers to assign a mutex to synchronize CUDA malloc/free and NCCL operations.
Reviewed By: pietern
Differential Revision: D4622135
fbshipit-source-id: 60acd7c01a677a0df5415fe38e6ef5a2e7c8606a
Summary:
Update the cuDNN RNN interface (mostly fixing the ordering of arguments). Set a seed so that the test passes consistently.
Closes https://github.com/caffe2/caffe2/pull/62
Reviewed By: Yangqing
Differential Revision: D4348966
fbshipit-source-id: f9b56be37739e5bffabec130e3407492b2aef656
Summary: The shape inference did not check for spatial mode.
Reviewed By: andrewwdye
Differential Revision: D4638218
fbshipit-source-id: f15419738587013dea39e04a3da086890938c4e2
Summary:
MSVC 2015 has known bugs with template functions, so these changes aim to fix them - no functional differences introduced.
Closes https://github.com/caffe2/caffe2/pull/179
Reviewed By: ajtulloch
Differential Revision: D4635241
Pulled By: Yangqing
fbshipit-source-id: a282a96e1e626e9440c1e3f3cb15b5b1fa710887
Summary:
At the moment LocalSession creates a new workspace if none is provided. As a
result, anything that has been executed in a local session is not
available to the external caller, i.e. everything that uses SingleRunner can
only observe side effects and not actually access intermediate blobs.
This diff modifies LocalSession to run in the current workspace instead (unless
this has some really weird effects because we rely on the privateness of the
workspace, it should work).
Differential Revision: D4634743
fbshipit-source-id: 975bed154c7ca215dc3fc0d60f05a7c092711482
Summary: vigneshr has been randomly experiencing that the process does not exit in the end. We don't know what causes this, so this will help in two ways: (1) by putting timeout_guard.EuthanizeIfNecessary(600) at the end of the operator, you ensure that the process is killed in 10 minutes, allowing for a retry; (2) this killing will cause Python stack traces to be dumped, helping debug the real issue.
Differential Revision: D4635781
fbshipit-source-id: b558418c80671c00effdd514e4ddc01e935c95df
Summary: Add SparseNN workflow for feed. I haven't fully thought about the change needed for ads, as I added a property called 'preproc_output_schema' for LayerModelHelper.
Reviewed By: xianjiec
Differential Revision: D4585796
fbshipit-source-id: 060d08f4beb928e7e7863f2e563f612c358951fb
Summary: See http://bugs.python.org/issue6721. Since everstore loaders use ProcessPoolExecutor, which is based on forks, and there was perhaps an update of the numpy library or some unrelated library, we started getting subprocesses stuck at np.random.randint(). Also changed logging to prints, since logging is known to have issues with multiprocessing. See https://www.prod.facebook.com/groups/fbpython/permalink/1438647216176641/
Differential Revision: D4633725
fbshipit-source-id: ae948a1827c71a3a2119d6a3248706728984df31
Summary:
A bit too much stuff in one diff, sorry:
1. Add inference for gradient types by using the fact that x_grad is the gradient of x and must be of the same shape. Using string matching for this is kind of awkward, but in addition I rely on the operator actually being a gradient op.
2. dzhulgakov was right, the scalar shape is () and not (1). Sorry, my earlier claim was #fakenews.
3. Added inference functions for MakeTwoClass, MomentumSGDUpdate and cross-entropy ops.
Reviewed By: dzhulgakov
Differential Revision: D4569758
fbshipit-source-id: 0db13f33819777fdddefe21d4b1ebf906fcaf98c
Summary: Just generate some random data and put it through an LSTM (Caffe2 RNN based), using its own output as the gradient value for benchmark purposes. With default parameters it fits my dev GPU memory. With the default parameters provided in this diff I got 300k entries per second processed. These entries are split into blocks of seq_length * block_size. Each entry is of size hidden_dim; the LSTM takes hidden_dim-sized input and produces output of the same size.
Reviewed By: salexspb
Differential Revision: D4605815
fbshipit-source-id: dd529302a0a93e8711784c67e4c777c8d6a8cdf4
Summary:
Add cudnn v6 support, including testing support for dilated convolution.
Add a check to ensure that the versions of cuDNN used to compile Caffe2 and run it are compatible
Closes https://github.com/caffe2/caffe2/pull/85
Reviewed By: bwasti
Differential Revision: D4387690
Pulled By: Yangqing
fbshipit-source-id: 312960134398dd4afe6ee0c01cdc160046c904e8
Separates out non-Python part of AutoGPU. This also compiles without
CUDA which is useful for generic tensor code.
Also fixes a bug where THCPAutoGPU may not always switch the device:
THCPAutoGPU guard(-1);
guard.setDevice(0);
guard.setDevice(1);
guard.setDevice(0); // would not switch back to 0
Summary:
previously the fp16 type was supported in the SparseLengthsSum operator; now it
works in all other segment operators as well.
Reviewed By: dzhulgakov
Differential Revision: D4624312
fbshipit-source-id: c9d72110e3762167270bb088405eaf9c56e88493
Summary:
(1) Since cub seems to be a better memory pool I made cnmem optional.
(2) Added MKL testing since Intel now provides an apt source, but that doesn't seem to work right now.
(3) Added cmake file for nervana gpu.
Closes https://github.com/caffe2/caffe2/pull/175
Differential Revision: D4627056
Pulled By: Yangqing
fbshipit-source-id: 9676fa32fce2a29574c0bf7e9d31660b5535cb51
Summary: Remove TODOs where vectorization with Eigen is not needed, based on D4565679 feedback.
Reviewed By: ajtulloch
Differential Revision: D4623239
fbshipit-source-id: c949ee9bc295e87a87c333d68d958f0abfa71fd4
Summary:
This diff tries to address one of the concerns that Xianjie has had - the requirement to create a layer for every operator and to pass shapes and other info around.
The basic idea of the diff:
1. Try to create a layer with a given name, but if it's not available, fall back to an operator with that name (which is expected to have no parameters).
2. For all operators that we're adding through this functional style of creation, try to use the C2 shape/type inference logic to get the output type. If we fail, just return an untyped record and expect the user to annotate it when it's really needed.
Reviewed By: xianjiec
Differential Revision: D4408771
fbshipit-source-id: aced7487571940d726424269970df0eb62670c39
Summary:
If init_params is False, the parameters should not be initialized.
This is particularly important when testing a model that provides values for these BN parameters.
Closes https://github.com/caffe2/caffe2/pull/174
Differential Revision: D4621791
Pulled By: Yangqing
fbshipit-source-id: 518443925990a12c1d5729b0971ebe19ba5d8998
Summary: It is better for the workers to share the Python-side queue, since I saw a case where workers assigned to one GPU were lagging behind the others. Also, reduced logging as requested by rpenggithub.
Differential Revision: D4620487
fbshipit-source-id: 73353f9570b07788c8cd71c9fec9308cd93a44dd
Summary: Replace for loop with Eigen operations in method rmsprop_update
Reviewed By: ajtulloch
Differential Revision: D4620691
fbshipit-source-id: 89cd570ecdf56a1255be4a0959ee711addc9696b
NCCL can deadlock if cudaFree() is called while it's launching kernels.
This exposes a mutex that can be held to prevent cudaFree() calls in the
caching allocator.
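A hypothetical sketch of the guard; the accessor name and call sites below are illustrative, not the actual THC API:
```
#include <mutex>
#include <cuda_runtime.h>

// One mutex shared by the caching allocator and the NCCL launch sites, so a
// cudaFree() can never interleave with an NCCL kernel launch.
std::mutex& cuda_free_mutex() {
  static std::mutex m;
  return m;
}

void cached_free(void* ptr) {
  std::lock_guard<std::mutex> lock(cuda_free_mutex());
  cudaFree(ptr);
}

void launch_nccl_collective(/* ncclComm_t comm, buffers, stream, ... */) {
  std::lock_guard<std::mutex> lock(cuda_free_mutex());
  // ncclAllReduce(...);  // protected from concurrent cudaFree() calls
}
```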
Summary: Inference function for the Im2ColOp: caffe2/caffe2/operators/im2col_op.cc.
Differential Revision: D4608663
fbshipit-source-id: d26ffb403c2acb7a5ead5f58f044ee3340c8311a
Summary: Replace for loop with Eigen operations in method ElementWiseDivide
Reviewed By: Yangqing
Differential Revision: D4602516
fbshipit-source-id: 6b19de8190d5e29ffe52359d0cd0c27cf03c52e2
Summary:
The memory pool implementation was written back in the days when I only had
one GPU, and as a result I overlooked the fact that:
(1) CNMEM needs to have the same current device for the allocation and
deallocation to take place correctly.
(2) cub needs the device id of the pointer passed in for proper deallocation.
As a result, since C2 right now switches contexts very frequently, I added a
global map to keep record of the pointer affiliations, and use that for
deallocation when we are at another context.
I have not tested the speed but assuming that std::unordered_map is not too bad
this should be fairly fast.
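Roughly, the bookkeeping could look like the following sketch (names and structure are illustrative, not the actual memory pool code):
```
#include <cuda_runtime.h>
#include <mutex>
#include <unordered_map>

// Global map from pointer to the device it was allocated on, so that
// deallocation can switch to the right device even if the current context
// has changed in the meantime.
static std::unordered_map<void*, int> g_ptr_device;
static std::mutex g_ptr_device_mutex;

void* PoolAlloc(size_t nbytes) {
  int device = 0;
  cudaGetDevice(&device);
  void* ptr = nullptr;
  if (cudaMalloc(&ptr, nbytes) != cudaSuccess) return nullptr;
  std::lock_guard<std::mutex> lock(g_ptr_device_mutex);
  g_ptr_device[ptr] = device;
  return ptr;
}

void PoolFree(void* ptr) {
  int device = 0;
  {
    std::lock_guard<std::mutex> lock(g_ptr_device_mutex);
    device = g_ptr_device.at(ptr);
    g_ptr_device.erase(ptr);
  }
  int previous = 0;
  cudaGetDevice(&previous);
  cudaSetDevice(device);   // free on the device the pointer belongs to
  cudaFree(ptr);
  cudaSetDevice(previous);
}
```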
Differential Revision: D4617300
fbshipit-source-id: e8bb366616cd93504e7d68b7f999011cd49caba5
Summary:
Mysterious deadlocks after epoch has finished have occured randomly but quite frequently recently for myself, vigneshr and others. Looking at a stack trace of vigneshr's job (P57129798), I noticed a couple of threads were calling BlobsQueue.blockingWrite (or something like that). That call stucks when the caffe2/c++ side queue is at capacity (we use capacity of 4 with data workers). So in cases when this call was just being made while the script was to be terminated, the thread did not close and the whole process did not close either (not completely sure why that is since thread is a daemon thread, but this might be a flow-related issue since we run inside a flow container).
This is quite easy to fix: just call CloseBlobsQueue() when terminating the process. I modified coordinator.stop() and wait_for_finish() to return a status code based on whether threads that were joined actually closed within the 1.0sec timeout. This allowed creating an unit test to test for this issue. Before my change, the unit test failed.
Reviewed By: pietern
Differential Revision: D4619638
fbshipit-source-id: d96314ca783977517274fc7aadf8db4ee5636bdf
Summary: The AllReduceChunked algorithm currently performs the local reduce/broadcast of local device buffers in host memory. This diff updates the algorithm to execute the local reduce/broadcast steps using NCCL operations before copying a single device buffer to/from host memory.
Reviewed By: pietern
Differential Revision: D4587441
fbshipit-source-id: 4de689f59a6cf898b8eecd3c3b9f57f77124c0e3
* Add more detail to CUDA documentation
Also adds better cross-linking to the pages that discuss relevant topics.
* Adds recommendation to torch.save docs
* Make the version numbers for the docs dynamic
Might need tweaks for beta, 1.0, etc.
Summary:
It looks like for most of the types there is no way we can get them (except as
the result of an operation on top of some other tensor), which was pretty
unfortunate for cases where we want to do partial type inference (I was trying
to do so in D4408771).
This diff adds more possible types for ConstantFillOp. Please let me know
if I'm missing anything. The only part that worries me a bit is a possible
GetArgument with types that support only a subset of the range (but it looks like
that can happen even now for i32 vs i64).
Reviewed By: dzhulgakov
Differential Revision: D4611482
fbshipit-source-id: 77917fd5e1d18a1b860e022ede4518143d0f3f26
Summary:
Reduce the test input size for the instance norm gradient check. The larger size is currently timing out on stress tests.
e.g. failed: Timeout: Ran out of time before finding a satisfying example for test_instance_norm_gradients. Only found 2 examples in 125.39s.
Reviewed By: Yangqing
Differential Revision: D4608828
fbshipit-source-id: ce17a3ad28752d808efcbf79f1ea4238e63fb005
Backend is SpatialDilatedMaxPooling, so change 3D input (N*C*L)
to 4D size (N*C*1*L). Then output indices will range from 0 to L.
This range will not cause UnMaxPool1D error.
Signed-off-by: Zhou Chang <achang.zhou@gmail.com>
Summary:
(Stacked with D4553941). Using the new net type increases QPS to 470K, close to Torch numbers (there are other optimizations that need to be done, particularly the log-estimator). Previously, QPS was close to 250K. This was when having reuseData=true.
Includes a small bug-fix to the new net type.
Differential Revision: D4594704
fbshipit-source-id: 21e7b0ca4173b036f45d3ba95c218792b31e7398
Summary:
For code in the layer model helper and layers, it is intentional not to have a NameScope by default.
This looks like another place that may need a default NameScope.
https://fburl.com/wdwtxp0m
Reviewed By: kennyhorror
Differential Revision: D4606971
fbshipit-source-id: b560bf59d3242e3f9443cd5aeda5c7e2e4e89079
Summary: D4348953 added support for accuracy for top_k>1, which is only supported on CPU, requiring data to be copied to CUDA. But that diff did not take into account that we have top_k=1 version of AccuracyOp for CUDA. This diff ensures we use the CUDA version for top_k=1.
Differential Revision: D4607767
fbshipit-source-id: 8becda23890343043eb79ad04e4c6196e9010f0c
Summary: as title. Add num of examples limit for group collect. Add option for enabling sum loss in BatchLRLoss
Reviewed By: xianjiec
Differential Revision: D4602311
fbshipit-source-id: 5b2a244f1f0e9f1ab0f4590e94828fd18d018d8d
Summary: curandGenerateNormal can only generate arrays whose length is a multiple of 2. The MSRAFill and GaussianFill operators use the RandGaussian utility method, which in turn uses curandGenerateNormal. This is a test which runs the operators on both devices to generate odd-sized random arrays.
Differential Revision: D4602819
fbshipit-source-id: e65f5c731e925886cfa14afff482f7053bd020a0
Summary:
This at least partly fixes a recurring problem when using everstore data input (or any other data input with multiprocessing): if the main process dies violently, the child processes are not killed. One cause for this was the TimeoutGuard(), as it called os._exit(1), which prevents any cleanup from happening. I changed it to send a SIGINT signal to the PID and, if the process is still alive after 10 seconds, to call os._exit(1). In my tests, this works well.
Did some other cleanup:
- improved logging of inputs/sec in data_workers
- removed redundant atexit() handling, as the multiprocessing pool does it itself
Differential Revision: D4602550
fbshipit-source-id: 64d4526a2a3625d163d23f078286e719d56998f4
Summary:
Add two arguments to the DotProductOp operator: `force_same_dim` (1 if we want
DotProductOp to only accept two tensors of equal dimension, 0 otherwise) and
`pad_value` (only useful when force_same_dim = 0; pads the tensor with the smaller
size to the same size as the other one).
Differential Revision: D4502619
fbshipit-source-id: 46f7da710c6f6365f76a7af6234c34c7f656be62
Summary:
Implementation of ##LSTMWithAttention##
Still TBD:
1. There are problems with backpropagation, because the gradient is not implemented for ops with broadcasting
2. I need to make initial_recurrent_state be of shape [dim] rather than [1, batch_size, dim], so one doesn't need to provide batch_size to LSTMWithAttention
Differential Revision: D4298735
fbshipit-source-id: 8903fcff4d6a66647ee6d45a6ef28803fc3091e5
Summary:
The context here is that we want fblearner predictor to handle float features (D4601334).
Since predictor processes a single example at a time, it makes sense to specify a single
float feature as a float scalar tensor.
But if the Caffe2 net has a SigridTransforms operator, it expects everything to have an
additional dimension so it can be called with multiple examples.
Being able to Reshape a scalar into a 1-d tensor will enable us to mix SigridTransforms
with other native Caffe2 operators.
Reviewed By: ender-wieczorek
Differential Revision: D4602675
fbshipit-source-id: 8b33876bf47bc341385fd7ac19cd1fd7f67a7ccf
Summary:
It could be that only the first item
in the batch was really used, in case the rest of the memory was 0. Or if the
memory there held a big positive integer, then the whole sequence was used. So whether we used the rest of the batch depended on our luck :)
Reviewed By: Yangqing
Differential Revision: D4599569
fbshipit-source-id: ae89cee796bbcbc232e4abcab71dee360b0d8bc6
Summary:
In-place is ~30% speedup, but needs a change to torch2caffe
or a graph rewrite on the client.
Differential Revision: D4577582
fbshipit-source-id: c31bf8ba97f4fa4cedf355cf2475eb7bab48b304
Summary:
The cudnn_ws arg was already there. This PR only uses that arg when the model is created.
Closes https://github.com/caffe2/caffe2/pull/164
Differential Revision: D4598443
Pulled By: Yangqing
fbshipit-source-id: c2e83f73059360ecf2fedf2c62be7cacbb4034ca
Summary: we may not need dense feature inputs in some models (e.g., double helix).
Reviewed By: dzhulgakov
Differential Revision: D4568755
fbshipit-source-id: 6850508f86fafb53f81783b2a2a38776be5455d7
Summary: Another part of making DPER compatible with half-floats. This diff adds support for fp16 to the segment reduction operators used in DPER.
Reviewed By: dzhulgakov
Differential Revision: D4587560
fbshipit-source-id: 0ae10648a7286a820bffaee802464dd9464584bc
Summary:
First part of adding half-floats support to DPER 2.0. Let's add an option use_half_floats to enable converting some weights of the model from fp32 to fp16 before saving it to predictor models parts. For now it's for SparseLookup layer's embeddings. All conversion is done after training is finished and saved models are ready to be used on remote predictors as-is (they will be stored compacted in memory). New fp16 blobs are saved to the model instead of original ones, under the same names, so we don't modify MetaNetDef at all.
Next steps:
1) support on delivery side -- operators working with these blobs should support both float and float16 input types
2) benchmark performance to make sure there is no regression
a) of serialization
b) of delivery
3) support realtime training (I'm thinking about adding new pre-publishing net which will be executed each time the realtime trainer stops to publish a new snapshot)
Depends on D4567304
Reviewed By: kennyhorror
Differential Revision: D4571710
fbshipit-source-id: 19967a17d3bd84878d66e8c0ed8c5342bf38d979
Summary:
This operator always outputs dense gradients regardless of
the input gradients. For the forward pass, it passes inputs to outputs in place.
Reviewed By: xianjiec
Differential Revision: D4582511
fbshipit-source-id: 7eb2c5d2142aa05d373f06cab1e7f89d8b747d34
Summary: Set up a server node that periodically gathers the values of all nodes' perf counters, allowing them to be published at once.
Reviewed By: dzhulgakov
Differential Revision: D4555116
fbshipit-source-id: 8e49ac8353b52b2be82aedf305762478e7fa687a
Summary:
This diff introduces a new net type, 'singlethread_async', which is based on my investigation of DPER/hogwild MLP bottlenecks.
It uses only one CPU thread per GPU, but multiple CUDA streams on each GPU. This is implemented by having each Net submit its list of operators to
a central GPU-specific executor queue and a thread that executes them asynchronously. This executor takes all tasks in the queue, executes them on separate CUDA streams, and then waits on them at the end. This solution can achieve >95% GPU utilization on 8 GPUs when a sufficient number of workers is used.
FYI: I also tried fancier solutions such as using cudaStreamCallbacks(), but they did not have as good performance.
Improved the dper bench by adding the MomentumSGDUpdate operations and adding speed test capabilities. During my testing I also noticed that the startup costs for initializing CUDA streams and contexts are high, so it is important to do a warm-up.
Reviewed By: Yangqing
Differential Revision: D4553941
fbshipit-source-id: bb00524bef653d75de026dd64097b8d9b7a0acb3
Summary:
We were running into a problem where a Job could not be pickled. It needs to be pickled in order for the master flow operator to execute it using the session.
This creates the concept of a "compiled" Job that pretty much only stores protobufs with the Jobs to be executed, avoiding any issue with pickling.
Reviewed By: dzhulgakov
Differential Revision: D4554799
fbshipit-source-id: 2ee9877ca49a796d51925e5ec917436e3d930984
Summary:
Previously we had several limitations for a reporter net:
- it needed to be a net, not an execution step
- only one was allowed per execution step, with a single interval
Now, "reporter nets" become reporter steps, and multiple of them can be specified with different timeouts.
Reviewed By: dzhulgakov
Differential Revision: D4583686
fbshipit-source-id: ad7266e16f96e7829fd24dcc1f165f39e9db573d
Summary:
This script will attempt to determine files that will be useful for building with the correct python version. Currently on macOS with various python installations CMake fails to determine the correct location of python libraries.
Closes https://github.com/caffe2/caffe2/pull/163
Reviewed By: Yangqing
Differential Revision: D4594954
Pulled By: bwasti
fbshipit-source-id: c2b750ee9608a02fad4ce2f2293f5fa54dc7011c
Summary: this fixes the bug in the Eigen implementation when calculating cross-entropy
Reviewed By: salexspb
Differential Revision: D4582078
fbshipit-source-id: 4c92047e9dbbe219fcbef618a45c584c2fbfaad5
Summary: Removed Model API because no one {seems to,should} be using it
Reviewed By: Yangqing
Differential Revision: D4575126
fbshipit-source-id: 174d39e9aa46750f1fae8295f7e1e5452559af33
Summary:
- Key-value store for counters.
- Counters are updated via macros that also export USTD probes.
- Counter values can be exported using caffe2 operators.
- Snapshot mechanism for tracking time-window counter values.
Reviewed By: dzhulgakov, pietern
Differential Revision: D4553761
fbshipit-source-id: 25a1a91a3168dcff2159c6fba7b357d3fd3aa9bf
Summary:
Work may be queued on CUDA streams for asynchronous execution. The
memory backed by pointers passed to any algorithm can therefore be
mutated after constructing an algorithm instance. By also passing in
the streams these mutations happen on, the algorithms can synchronize
with these mutations to ensure no invalid data is used.
By passing in these streams, any work done by these algorithms will
*also* be queued, which effectively removes a single synchronization
step from any algorithm run.
Differential Revision: D4589394
fbshipit-source-id: 0c8cd6ba9c9018f33d6f4c55a037083fc4164acb
Summary: I was mistakenly calling the non-chunked algorithm for the chunked test.
Reviewed By: pietern
Differential Revision: D4580160
fbshipit-source-id: 9d62a68e9e86cc6e596d90ff8854c585a0e8855c
Summary: Fix hard coded CPUContext and add CUDA support for shape function
Differential Revision: D4577053
fbshipit-source-id: b515e52c39c02aa1600ccb1c3e559c9a5a0b718c
Summary:
This diff adds the ability to train a multiclass classifier on a sampled subset of classes. This basically implements what is described in https://arxiv.org/abs/1412.2007 without the sampling probability correction. Since this implements uniform sampling, the sampling probabilities cancel out in the softmax anyway.
The trick to make this work is to have 2 different nets for prediction and training, both sharing parameters. The model is built normally until the last layer. If sampling is needed, then we do the following:
The class sampling works as following:
Reviewed By: xianjiec
Differential Revision: D4512859
fbshipit-source-id: ab537bcac81d5e5877a8795045e8682c8064da68
Summary:
First pass at a CUDA-aware allreduce chunked implementation. For now the algorithm runs on the CPU and is mostly copy/paste from allreduce_ring.h. A subsequent pass will offload to the GPU.
Serialize cuda test to avoid intermittent failures due to memory contention.
Reviewed By: pietern
Differential Revision: D4576959
fbshipit-source-id: e1f292a05b88ff24c33e549d4a52e770a21f85d2
Summary: Ideally we would want the driver to busy-poll for us. In the absence of driver support, spinning with the MSG_DONTWAIT flag seems to be helping a lot too. Of course, we pay the price of burning one core for polling. Sigh.
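In plain socket terms, the spin looks roughly like this (an illustrative sketch, not the gloo transport code):
```
#include <cerrno>
#include <sys/socket.h>
#include <sys/types.h>

// Busy-poll receive: instead of a blocking recv() (which parks the thread and
// adds wakeup latency), spin on a non-blocking recv() until data arrives,
// at the cost of keeping one core busy.
ssize_t spin_recv(int fd, void* buf, size_t len) {
  for (;;) {
    ssize_t n = ::recv(fd, buf, len, MSG_DONTWAIT);
    if (n >= 0) return n;                                   // data or orderly shutdown
    if (errno != EAGAIN && errno != EWOULDBLOCK) return -1; // real error
    // nothing available yet: keep spinning
  }
}
```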
Reviewed By: pietern
Differential Revision: D4576242
fbshipit-source-id: 85d9e1b786fbb6053864fba80f3e5ecc80fe221d
Summary:
Latency optimization is going well and I've seen the odd case of <10us
measurements. This option makes the benchmark tool display nanos
instead.
Differential Revision: D4575925
fbshipit-source-id: 98dbd3b39e31cbcdd4c146613f6630e721187e1e
Summary: Do I understand correctly? It must be of size 1 for sigrid
Reviewed By: kennyhorror
Differential Revision: D4576541
fbshipit-source-id: 92fa8dc62e36ff095e14cceeb80b03c0028f5695
Summary:
Move the open source version of build_ftrl to the open source directory.
build_ftrl can use several engines, and the SIMD engine is fb-specific,
so we keep that build_ftrl in the fb/optimizers/sgd.py file.
If the caller only uses the open source engine, it can import the
open source build_ftrl. If the caller may use the SIMD engine, it needs
to import the fb-specific build_ftrl.
Also move the tests to the python directory.
Reviewed By: salexspb
Differential Revision: D4560384
fbshipit-source-id: 84fc915d3bbe42fd19503ef132d3277088f6fab3
Summary:
Remove the use of `NextName` in the layer model helper, so that the same function returns a `model_helper` that constructs an identical `Net` when under the same NameScope.
`NextScopedBlob` should only take effect when there is a real name conflict; otherwise it returns a ScopedBlobReference.
This is critical for parameter blobs. In the long run, we need to be able to specify parameter blobs more explicitly (kennyhorror is working on this). This solution works in the short term for e.g. two-tower sparse nn models.
Reviewed By: kennyhorror
Differential Revision: D4555423
fbshipit-source-id: 2c4b99a61392e5d51aa878f7346466a8f14be187
Summary:
Pass through the h-value recurrent output unchanged at each LSTM step beyond the valid part of a sequence (computed based on seqLengths, allowing batching of sequences of different length). This enables using the final-step output of each sequence as the output when one vector is desired for the entire sequence. Gradient also passed back unchanged.
Also made some cosmetic changes to recurrent_network_test.py (seq_lengths offset corrected, should be in [1, T] rather than [0, T-1]).
Reviewed By: urikz
Differential Revision: D4540307
fbshipit-source-id: 73a9f6326069d713dcb0cdc8d17869317c6dbe96
Summary:
The CudaDevicePointer optionally takes an existing stream on
which it runs any operation associated with the pointer (for now just
memcpy's, but this will likely include kernel execution in the
future).
Differential Revision: D4574035
fbshipit-source-id: ddd7972a3874012059f1fde1b341fd6edd69102d
Summary: We don't use these ops on mobile, so this saves ~150kb.
Reviewed By: Yangqing
Differential Revision: D4569599
fbshipit-source-id: c6f9d702773c64a395e87afa4cfb5b2992dba230
Summary:
In the current implementation of SaveOp we always use names for blobs from the
current workspace. But there is a use case for replacing names in a saved model:
for example, to use half-floats in the prediction model but keep full floats for
the training model, we might want to save a blob "w_fp16" as "w".
Differential Revision: D4567304
fbshipit-source-id: 87bc84fa6a45d8bfa33edb55ac1fb1cff542dbe3
Summary: This diff adds shape inference for the SoftmaxWithLoss Operator
Differential Revision: D4565835
fbshipit-source-id: 1c2db398524c765977ec4d8a22c9b986bf9faf82
Summary: Every time data is put into the logger, it checks if a second has passed. If so, it displays how many inputs were put in the last second.
Differential Revision: D4527148
fbshipit-source-id: f197eb975ed81111449705e0719d1e56f385fd8d
Summary: Might be useful for the EXC_RESOURCE / CPU issues.
Reviewed By: salexspb
Differential Revision: D4565494
fbshipit-source-id: 74ac9edeba6334a46ee6799a93ca96eb68216439
Summary:
One can find the reason why I need a gradient for CopyOp in this post - https://fb.facebook.com/groups/1405155842844877/permalink/1639683782725414/
The gradient for CopyOp is trivial when the device is the same (cpu, or the same gpu), but gets a little harder when the copy was made across two different gpus.
I introduce a new operator, CopyOnDeviceLike, which takes an additional second input. The op copies the first input to the same device as the second one. The default implementation is exactly the same as CopyOp, but I specialize it for CUDAContext.
Please let me know if I'm doing anything wrong here! This is my first caffe2 diff related to operator definitions.
Reviewed By: Yangqing
Differential Revision: D4557258
fbshipit-source-id: 9494be589cc1e5696bbbfe25b7622aaa4c9efe4a
Summary:
- updated image pre-processing to avoid detectable differences in re-sizing for different angles
- refactored utility functions into dbreader and image_input
- fixed an issue in image_input where the crop assert was firing because it was testing the pre-resized image
Reviewed By: seansnyder
Differential Revision: D4550365
fbshipit-source-id: 6461e24a26367c8f6af5e2682beb2b3acd67842b
Summary:
In synchronous mode, it is not the device thread that is responsible
for handling I/O, but the user thread itself. Calling waitRecv on a
buffer will trigger the read function on the pair to be called. This
eliminates the context switch necessary if the device thread is
handling all I/O. For benchmarks with small numbers of elements this
reduces latency by as much as 20%.
Reviewed By: plapukhov
Differential Revision: D4549998
fbshipit-source-id: ab718ba090c06d7c7aa4065cc9f92bd96b9e4a35
Summary:
Refactors some of the vectorization and accumulation.
Parallelization is a TODO, I'm not sure how Android goes and it's just an
incremental ~10% or so.
Reviewed By: Yangqing
Differential Revision: D4568850
fbshipit-source-id: aa9db5a364bb738f492085772dc82b94885eb4d6
Summary:
This clears up a bunch of windows build errors, but there are still 12 errors mostly relating to
- template keywords
- initializer list
- pthreadpool
that are not readily available on windows. Also, cuda build is being disabled right now.
Current error can be found here: https://ci.appveyor.com/project/Yangqing/caffe2-w2ucm
Closes https://github.com/caffe2/caffe2/pull/151
Reviewed By: bwasti
Differential Revision: D4564591
Pulled By: Yangqing
fbshipit-source-id: adacad5fa2d6d52d586700947972e3674e3b6e60
Summary: As in headline. I had missed these originally.
Reviewed By: kennyhorror
Differential Revision: D4560255
fbshipit-source-id: e69458e8a2574b981e40e915d87c8e16dadee7d6
Summary:
(Caffe2) Modified RecurrentNetworkGradient operator so that training is possible with any of the output blob(s) receiving gradient during the backward pass. This is realized through a new argument for the RecurrentNetwork op, outputs_with_grads, which takes a list of the indices of the output blobs which will receive gradient. The default case (only receiving gradient from the first output blob) remains the default.
New unit test covers the case where outputs_with_grads = [1, 2] using Python LSTM wrapper.
Reviewed By: urikz
Differential Revision: D4518516
fbshipit-source-id: 5c531582b20f3cf727d1aa91239b4d5a2b8a7c1f
Summary:
The existing op transforms the input in a general way. It needs M transform mappings to transform an NxM input tensor.
But for binary predictions X (Nx2 tensor), we know that X[:, 0] = 1 - X[:, 1].
So we just need one mapping for X[:, 1]. After being transformed, we can compute X[:, 0].
This diff is to handle this.
Differential Revision: D4550441
fbshipit-source-id: 42d8c6e88d830c97628ee930b543740a32acf904
Summary: This is like `UniformIntFill` but guarantee to return unique elements in the output, excluding the optional avoiding elements.
Reviewed By: xianjiec
Differential Revision: D4511814
fbshipit-source-id: 5dc98ee580616e60e46ee74ebb3f5ddd29a09965
Summary: Updates the function revise_recurrent_network_op(), which supports cloning recurrent networks by adding a blob-name prefix to string arguments to maintain correspondence. It previously relied on many hard-coded indices referring to the positions of arguments and inputs of RecurrentNetworkOp and its corresponding gradient operator, and therefore broke when the implementation changed. This fix should make it more general and robust.
Differential Revision: D4559768
fbshipit-source-id: fb85b0b1ffb1393dc84760d6ae5dc473e8b764b0
Summary: D4438796 (https://github.com/caffe2/caffe2/pull/95) introduced locks to avoid concurrent cudaFrees and NCCL calls. Unfortunately, the locks were not put into PinnedCPUAllocator, causing deadlocks in certain cases (like using the Hive reader).
Reviewed By: Yangqing
Differential Revision: D4563752
fbshipit-source-id: 0f95051621282e742f03feb76ebc30662285fb8e
Summary:
Created a simple benchmark to test model saving speed, plus a few possible
optimizations on top of it.
Since we don't ever want to end up with a partial LogFileDB, it makes sense to
commit the transactions only after we've finished serialization.
As a result, serialization time in my dummy test drops from
480 seconds to:
Serialization time: 52.5134651661
Deserialization time: 60.5741639137
One more really scary thing that I've found:
it looks like load_op with load_all might actually load corrupted DBs (if they are truncated), so we really need to fix that (save all blobs we have in the DB, or even better, a checksum).
Reviewed By: dzhulgakov
Differential Revision: D4558216
fbshipit-source-id: 4145c07f29b9dda527a2e57842f3abd8023d71a3
Summary: to verify that a model only used a subset of the parameters of another model (e.g., the model doing training).
Differential Revision: D4557787
fbshipit-source-id: bd8ac96f5e78e05f6f56086db6e6ddcda36c1d37
Summary: Removed Def().arg() in the backward computation since they have already been included in the forward.
Differential Revision: D4563600
fbshipit-source-id: bb6ee25e7c8da99977b82963670267392893fcde
Summary: Generates a fair amount of documentation from the operators. Also provides a framework for later documentation generation and custom syntax.
Reviewed By: dzhulgakov
Differential Revision: D4168311
fbshipit-source-id: 89ae9d023ad883623cdc1879c11e10b202b68793
Used .c file changes from 7318e2de13 as a starting point. All changes to .c files (except for whitespace details) are present here.
However, the required .h files were not present in that PR.
Summary:
Implement CUDA BroadcastOneToAll algorithm for GPU addresses. Refactor cuda.h into cuda_private.h to allow inclusion of <cuda.h> in public headers without polluting the namespace.
Port broadcast tests to GPU variants.
* this revision is based on Peter's revision D4546932
Differential Revision: D4547382
fbshipit-source-id: 3d294ad8862b04fb783ba22e5c925b8d7cbc8a8d
Summary:
build_sgd, build_adagrad, and build_adam are in open source python directory
now.
Move the tests to the same directory.
Extract TestBase to test_util.py so that TestFtrl can still refer it.
Depends on D4552227
Reviewed By: salexspb
Differential Revision: D4554549
fbshipit-source-id: 35aed05b82c78530808ef623a25bb7532b2abbae
Summary: There's a bug here as well (should be X[:axis] + N instead of [M, N]), but that can wait.
Differential Revision: D4555244
fbshipit-source-id: cf07ffe925bd592b4e2159750b6ebd859cfe0e5e
Summary:
The change migrates build_adam function to the open source python directory.
Depends on D4551871
Reviewed By: salexspb
Differential Revision: D4552227
fbshipit-source-id: 2b6bef183ecfd645d0f26215a784846d8841b845
Summary:
hasattr(x, ops) should always work, regardless of whether you're inside or outside a NetBuilder context.
There's no ideal solution here. I think this is sensible enough.
Reviewed By: kennyhorror
Differential Revision: D4557228
fbshipit-source-id: 4b1c1db5c8b11e4ccbf977b3f82c63b2c3e6e7db
Summary: These operators update the state of the instance and therefore should have the instance in the output list.
Reviewed By: xianjiec
Differential Revision: D4554773
fbshipit-source-id: 556d484fcf58878308aa6b0f7cd7ea2446d3f29e
Summary:
The change migrates build_adagrad function to the open source python directory.
Depends on D4547016.
Reviewed By: salexspb
Differential Revision: D4551871
fbshipit-source-id: cb68d9b2a723b0f069c8a24cfa3062f1e676c016
Summary:
Matt uyt reported (1) a very infrequent assertion failure in the net.cc worker function. This was caused by an operator that was not part of a chain being scheduled in the job queue. This could happen since our DAG net operator graph is a graph of operators, not of chains. The dependency pruning that I introduced last week exposed this problem since it removed some "middle-to-chain" dependencies when computing the chains. (It is a bit hard to explain.)
This diff attempts to fix the problem by only allowing scheduling of chains. In addition, I added an extra check to confirm that all parents of all nodes were indeed executed before starting the next round. This adds additional safety and a breakpoint to see if there are still problems.
I also fixed a bug in the operator graph pruning that made pruning less effective.
(1) Matt's report:
https://www.prod.facebook.com/groups/1405155842844877/permalink/1639428779417581/
Reviewed By: dzhulgakov
Differential Revision: D4531424
fbshipit-source-id: 80fa7def6e8aff6910ebf0d9d5fef15ff20e0aec
Summary:
In the tutorial, I found the call to Model() was not correct. After this change, it works.
Closes https://github.com/caffe2/caffe2/pull/148
Reviewed By: bwasti
Differential Revision: D4556894
Pulled By: Yangqing
fbshipit-source-id: 949a8d0496861f19869436908ffe1ef1a0f853b1
Summary:
This is essentially https://github.com/caffe2/caffe2/pull/146/ but shipit
failed to trigger task determinator.
Reviewed By: bwasti
Differential Revision: D4557698
fbshipit-source-id: b0e6777957e76df4e23671371098c2c6fe83b55c
Summary: For top-k accuracy, if the correct prediction does not make it into the k-sized priority queue, it is not going to be in the top-k, so we can short-circuit.
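A sketch of the idea in isolation (illustrative, not the AccuracyOp code; ties are counted in favor of the correct class here):
```
#include <functional>
#include <queue>
#include <vector>

// Keep a min-heap of the k best scores seen so far. As soon as k scores
// strictly better than the correct class's score have been seen, the correct
// class can no longer be in the top-k and we can stop early.
bool InTopK(const std::vector<float>& scores, size_t correct, size_t k) {
  const float correct_score = scores[correct];
  std::priority_queue<float, std::vector<float>, std::greater<float>> heap;
  for (float s : scores) {
    if (heap.size() < k) {
      heap.push(s);
    } else if (s > heap.top()) {
      heap.pop();
      heap.push(s);
    }
    if (heap.size() == k && heap.top() > correct_score) {
      return false;  // short circuit: the correct score already fell out
    }
  }
  return true;  // ties with the k-th best score count as correct here
}
```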
Reviewed By: Yangqing
Differential Revision: D4555637
fbshipit-source-id: 7f07787f853f1c6b4024e279dcc6920d28bdde3d
Summary:
Separate benchmark build target for CUDA-aware algorithms.
This is needed to keep CUDA an optional dependency.
Differential Revision: D4546932
fbshipit-source-id: b73176ae9067233f883d51ba3ab4efbb13a6f86f
Summary:
This CUDA-aware ring allreduce is based on the regular ring allreduce.
It runs the reduction algorithm on the CPU and is therefore most
suited for smaller buffers.
Both the device-to-host memcpy's at the start of the algorithm and the
host-to-device memcpy's at the end of the algorithm are kicked off
asynchronously in an attempt to parallelize as much as possible.
Reviewed By: Yangqing
Differential Revision: D4542816
fbshipit-source-id: 101dfad276ca79703e37ff93fb1b6d467295f66b
Summary:
The CUDA benchmark suite will be a separate build target, so the
runner should be reused.
Reviewed By: Yangqing
Differential Revision: D4545092
fbshipit-source-id: 6ccf2d30f5d35c74fc59851b25416bfe6863d62c
Summary: ContextManager was thread local. This caused issues because the context registration needs to be global. What needs to be thread local is the current context.
Reviewed By: jhcross
Differential Revision: D4556050
fbshipit-source-id: 5de1c0d9fd0a778c4cb1eadef01f9a1ab488f603
Summary: gcc didn't like not returning a value
Reviewed By: Yangqing
Differential Revision: D4553052
fbshipit-source-id: 68ec2df35cf097be2d9338fcd8901a5fac6292c3
The core autograd Variable, Function, and Engine no longer depend on the
Python API. This lets us implement functions in C++. In the future, we
can also multithread the engine and release the GIL for most of the
non-Python backwards.
Summary:
Currently build_sgd is in a Facebook-specific directory. We need to move it to the python directory so that
the open source world can use it.
Reviewed By: salexspb
Differential Revision: D4547016
fbshipit-source-id: d699b7b1ab8051afdeadedb4d247ec2a04a7a3e7
Summary: There are still a lot to clean up, but this is a start change.
Reviewed By: bwasti
Differential Revision: D4543980
fbshipit-source-id: 757fc49db230b56996f02d5de9b69030ebbf3b77
Summary: Unneeded for mobile, should go from 90kb to ~30kb or so.
Differential Revision: D4545466
fbshipit-source-id: 47945493895a8f72d17de684b0429c2c7b5564ed
Summary:
We don't need all the ~dozen filler ops - should reduce from
~60kb to 20kb.
Reviewed By: Yangqing
Differential Revision: D4545452
fbshipit-source-id: 7ed1a6ba5a2c180f37c3163bfb40844160882749
Summary:
We only need Add right now, so split things up.
Can take it from ~260kb to ~20kb.
Reviewed By: salexspb
Differential Revision: D4545441
fbshipit-source-id: 96e58fb4d8b2a4f120ae7d34e86cefca146ec14e
Summary:
Fix warnings exposed by gcc-4.9.x's -Wshadow-compatible-local
I plan to enable this for all of fbcode, soon.
See t13698406 for justification.
Rename inner "err" to "err2".
This avoids the following errors:
caffe2/caffe2/contrib/torch/torch_op.h:263:47: error: declaration of 'err' shadows a previous local [-Werror=shadow-compatible-local]
caffe2/caffe2/contrib/torch/torch_op.h:263:11: error: declaration of 'err' shadows a previous local [-Werror=shadow-compatible-local]
Reviewed By: Yangqing
Differential Revision: D4544812
fbshipit-source-id: b15467ba9af7ec7f391db59f706b0442cdb664c4
Summary:
Fix warnings exposed by gcc-4.9.x's -Wshadow-compatible-local
I plan to enable this for all of fbcode, soon.
See t13698406 for justification.
Rename inner "i" to "j", twice.
This avoids the following errors:
caffe2/caffe2/operators/text_file_reader_utils_test.cc:56:14: error: declaration of 'i' shadows a previous local [-Werror=shadow-compatible-local]
caffe2/caffe2/operators/text_file_reader_utils_test.cc:47:14: error: declaration of 'i' shadows a previous local [-Werror=shadow-compatible-local]
caffe2/caffe2/operators/text_file_reader_utils_test.cc:41:12: error: shadowed declaration is here [-Werror=shadow-compatible-local]
Reviewed By: Yangqing
Differential Revision: D4544810
fbshipit-source-id: 089d73466f48a7a28b2a516117a12389c3ad54d2
Summary:
Fix warnings exposed by gcc-4.9.x's -Wshadow-compatible-local
I plan to enable this for all of fbcode, soon.
See t13698406 for justification.
Remove declaration of unused outer "stream".
This avoids the following errors:
caffe2/caffe2/binaries/core_overhead_benchmark.cc:28:27: error: declaration of 'stream' shadows a previous local [-Werror=shadow-compatible-local]
caffe2/caffe2/binaries/core_overhead_benchmark.cc:26:25: error: shadowed declaration is here [-Werror=shadow-compatible-local]
Reviewed By: Yangqing
Differential Revision: D4544811
fbshipit-source-id: c94e8a6e6d59705c86bc654f05d4de1ae4213eac
Summary:
Fix warnings exposed by gcc-4.9.x's -Wshadow-compatible-local
I plan to enable this for all of fbcode, soon.
See t13698406 for justification.
Rename outer "rank,size" to "rank0,size0" (to avoid shadowing another "rank" and "size" just below).
This avoids the following errors:
caffe2/caffe2/mpi/mpi_test.cc:124:9: error: declaration of 'rank' shadows a previous local [-Werror=shadow-compatible-local]
caffe2/caffe2/mpi/mpi_test.cc:112:7: error: shadowed declaration is here [-Werror=shadow-compatible-local]
caffe2/caffe2/mpi/mpi_test.cc:126:9: error: declaration of 'size' shadows a previous local [-Werror=shadow-compatible-local]
caffe2/caffe2/mpi/mpi_test.cc:115:7: error: shadowed declaration is here [-Werror=shadow-compatible-local]
Reviewed By: Yangqing
Differential Revision: D4544808
fbshipit-source-id: fdc53ab8763eb342302b94d82d1ac046f2af7d33
Summary:
Fix warnings exposed by gcc-4.9.x's -Wshadow-compatible-local
I plan to enable this for all of fbcode, soon.
See t13698406 for justification.
Rename outer "rank" to "rank0" (to avoid shadowing another "rank" just below).
Also rename outer "size" to "size0" for the same reason.
This avoids the following errors:
caffe2/caffe2/mpi/mpi_gpu_test.cc:132:9: error: declaration of 'rank' shadows a previous local [-Werror=shadow-compatible-local]
caffe2/caffe2/mpi/mpi_gpu_test.cc:120:7: error: shadowed declaration is here [-Werror=shadow-compatible-local]
caffe2/caffe2/mpi/mpi_gpu_test.cc:134:9: error: declaration of 'size' shadows a previous local [-Werror=shadow-compatible-local]
caffe2/caffe2/mpi/mpi_gpu_test.cc:123:7: error: shadowed declaration is here [-Werror=shadow-compatible-local]
Reviewed By: Yangqing
Differential Revision: D4544806
fbshipit-source-id: 4cfa412dd672919174d487e60aa503a32125da03
Summary:
Fix warnings exposed by gcc-4.9.x's -Wshadow-compatible-local
I plan to enable this for all of fbcode, soon.
See t13698406 for justification.
Rename inner "new_insta_comm" to "comm".
This avoids the following errors:
caffe2/caffe2/mpi/mpi_common.cc:167:16: error: declaration of 'new_intra_comm' shadows a previous local [-Werror=shadow-compatible-local]
caffe2/caffe2/mpi/mpi_common.cc:162:14: error: shadowed declaration is here [-Werror=shadow-compatible-local]
Reviewed By: pietern
Differential Revision: D4544805
fbshipit-source-id: c703c3f35c71f08b4daae8491ea2518572fc8013
Summary:
Inputs have to be arranged in such a way that the j-th example of
batch i goes right before the j-th example of batch i+1 in the text.
Reviewed By: urikz
Differential Revision: D4519553
fbshipit-source-id: 9dd80658e0c4d9ff0f97a7904cbb164f267fe39f
Summary: With a batch size of 32 and other default parameters I get 70 iterations per second vs. 40 on CPU. Batching still doesn't produce a good loss; I am going to work on this in a separate diff.
Reviewed By: urikz
Differential Revision: D4516566
fbshipit-source-id: d0611534747beb2cd935a8607a283369378e4a6c
Summary:
Outline of changes:
- add single-operator support to Caffe2-Flow integration (based on Alisson's suggestions)
- because of above support we can move graph construction to the main workflow body and pass the job to the Flow operator doing running, similarly to the distributed case
- after that it's easy to unify code even more
- there's some trickery required to make sure model exporting doesn't pollute Cluster info (as TaskGroup.to_task() creates new tasks)
Important: this diff changes train_local behavior by introducing a queue between preprocessing and the trainer (before, we did everything on the trainer thread). It doesn't seem to impact perf much (even slightly positive), so I guess it's fine. It also allows for better unification.
I'll follow up with a separate diff that moves max_examples gating to multi_reader (including train_local) and then we can enable checkpointing.
Reviewed By: xianjiec
Differential Revision: D4526079
fbshipit-source-id: 8c44044f45e7738e9b13e5b3acfbb994bc5a3d72
Summary:
- NetBuilder now honors its name
- When Nets are created in the context of a NetBuilder, they take NetBuilder's name as prefix
- When a NetBuilder is created in the context of a Task, it takes the Tasks's name.
- pipe() now tries to find a good name based on its processor's, output or input queue's name.
- RPC tries to find a name from its handler's name.
- Better names in DataStream
- net_printer prints the name of Tasks and Steps
- net_printer optionally factors out common prefixes form blob names.
Differential Revision: D4527578
fbshipit-source-id: 5d3d1237c186e9576313c5aa01cc8800a9051217
Summary:
1. The existing Gather op outputs gradients in sparse format. We add GatherDense that does the same thing
as Gather but outputs gradients in dense format. This relies on the SparseToDenseOp.
2. SparseToDenseOp converts sparse representation (indices, values) into a dense format (missing values are
filled with zeros). There is an existing SparseToDenseMaskOp. It is mainly for converting sparse features
into dense format. Modifying it to achieve our purpose is too complicated and messy. Better to create a new one.
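For intuition, the dense conversion amounts to a scatter into a zero-filled buffer; a minimal sketch, assuming duplicate indices are accumulated (the real op's handling of duplicates may differ):
```
#include <cstdint>
#include <vector>

// Scatter (index, value) pairs into a zero-initialized dense vector; entries
// that never appear among the indices stay zero.
std::vector<float> SparseToDense(const std::vector<int64_t>& indices,
                                 const std::vector<float>& values,
                                 int64_t dense_size) {
  std::vector<float> dense(static_cast<size_t>(dense_size), 0.0f);
  for (size_t i = 0; i < indices.size(); ++i) {
    dense[static_cast<size_t>(indices[i])] += values[i];  // assumed: accumulate duplicates
  }
  return dense;
}
```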
Reviewed By: dzhulgakov
Differential Revision: D4508879
fbshipit-source-id: f4a50efa1c08586d94040f93195661c41cd414da
Summary:
In the GitHub repository this directory will be mirrored similar to
folly, such that the repository has a single top level directory
called "gloo". This allows for versioning or renaming of the
project root, without having to mangle the include paths; they will
always use the "gloo" prefix.
fbshipit-source-id: 24502e4185fc7cbe19b5249f83609e2b8118e9d7
Summary: This should not be needed any more since we use pybind. It will help python3 migration.
Reviewed By: salexspb
Differential Revision: D4535490
fbshipit-source-id: a47615f73b5c35b940d21bb2d5d55060fa0850be
Summary: Per the task request, replace the original partial_sort solution with a heap.
Differential Revision: D4529118
fbshipit-source-id: 3dc01fc3a552ad020a0370f8d26cbc8be58bca6b
Summary:
Shape inference allows Caffe2 to compute shapes of blobs without running a model. Update InferShapesAndTypes() to accept an optional blob:dimensions map so that external input blobs do not need to be part of the workspace.
InferShapesAndTypes() in workspace.py conditionally calls the ...from_workspace or ...from_map bindings. Note I favored a small amount of code duplication here for the sake of readability. InferShapesAndTypes() in operator.cc has been refactored into mirrored entry points, invoking a common helper.
Other minor changes to address linter warnings.
Reviewed By: dzhulgakov
Differential Revision: D4524873
fbshipit-source-id: 56f863b759c016d7f23523f06fda3aa5bba22357
In cases where copyAsync is a large percentage of the work,
processing events in recordEvent can cause a large bottleneck.
Here, we relax the constraint that we reclaim blocks as fast as possible
(i.e. in copyAsync); instead, we only check that a block can be re-allocated
in malloc and free.
Summary:
updated training for the breaking change of loss_scale.
Noticed that for large downscale factors OpenCV's INTER_AREA did a better job avoiding aliasing, so changed to this filter.
Reviewed By: seansnyder
Differential Revision: D4528909
fbshipit-source-id: 692894812701854dd5eb8da932505f465fed3590
These methods are useful from C because they don't require constructing
THLongStorages to wrap the sizes and strides, which can lead to leaked
memory in case of an error. Instead the sizes and strides can be
represented on the stack using standard C long arrays.
Summary: One trainer passed (10,) as the max_buffer_size parameter, causing the internal queue to grow out of bounds since qsize == (10,) was never true. This adds an assertion on the type of the parameter.
Reviewed By: prigoyal
Differential Revision: D4527649
fbshipit-source-id: 492a824700b8fc69c484b80773b1f1f5aee39071
Summary:
This enables a real RTT measurement, since it's not possible
for peers to 'pre-fill' the notification buffers as is the case for
the all-to-all barrier.
Differential Revision: D4523543
fbshipit-source-id: 3f6467cdc66b1062ada92deed581e9360003d629
Summary:
Running RunNet() in Python in a loop can be a performance issue if the Python code is doing a lot of other processing, such as data input, because Python's Global Interpreter Lock (GIL) will prevent RunNet() from being called. This can easily be fixed by making RunNet() run multiple iterations in C++ land. (Another way to accomplish the same thing is to use Caffe2's "execution plans", but that requires more setup.)
+ fixed timing reporting in my OC workflow
+ improved one error log in data_workers.py
Sorry for piggybacking those small changes, but landing diffs is currently slow...
Reviewed By: rpenggithub
Differential Revision: D4523575
fbshipit-source-id: 039a647576efad5dd9afda74df478ac22b43c103
Summary:
- Do not lock LMDB.
- This avoids failure when multiple readers try to read the same LMDB.
- This can also cause a race if a process tries to write into an LMDB that is being read by another process, because this commit removes the locking mechanism.
- Note that we already use MDB_RDONLY when reading LMDB.
- It seems that LMDB does not provide any method of locking the database to avoid writes while allowing reads.
Closes https://github.com/caffe2/caffe2/pull/130
Differential Revision: D4512220
Pulled By: Yangqing
fbshipit-source-id: 45df849efa339601291aea6d0ed5ac74e097273b
Summary:
This first version just displays the forward part of the training net. I want to refactor local/distributed code to share graph initialization and then visualize all nets individually.
Graphs don't look pretty because of the large number of DotProducts; we need to refactor that.
Reviewed By: xianjiec
Differential Revision: D4514479
fbshipit-source-id: 156bb07c62118b15022c87f197b5e378a7ef3b9f
Summary: Implemented the shape inference function for AccumulateOp. The output shape and type should be the same as the input's.
Differential Revision: D4518812
fbshipit-source-id: 11fc7ec4fad1fe3049c5a35d13c371627f9e3d11
Summary:
Update data parallel model to default to using fbcollective.
Update broadcast op to correctly handle Tensor<long>.
Differential Revision: D4508029
fbshipit-source-id: 7b8d17223e25b3e1098ee3f2a08af61af140729e
Summary:
This should help in debugging test failures on continuous
integration hosts.
Part of this change is to make the address family to use configurable,
so the user can force the library to use either IPv4 or IPv6, instead
of picking whatever we see first.
Differential Revision: D4515802
fbshipit-source-id: 8834cece2ff819c8acad81fa2d76c3ed94f06158
Summary:
I recently encountered out-of-memory errors in my OC workflow. This was because the internal queue for buffering image patches was too large. Total memory use was:
image size = 227 x 227 x 3 x 4 bytes
total mem = image size x queue size (500) x num gpus x everstore-worker batch (128) > 300 GB
Reducing the batch size to 100 should fix this. It can also now be specified as a parameter.
Reviewed By: rpenggithub
Differential Revision: D4519956
fbshipit-source-id: 781697e620431ce7053534e683047bb6e7257b22
Summary:
If num_shards = 1 and distributed training is on, then ring reduce fails when it looks for the left pair to exchange information with.
I also used the opportunity to do a small fix in my data loader benchmark.
Differential Revision: D4513545
fbshipit-source-id: 7d3115b871a39b8ce7b55553394b607d16e08b74
Summary:
Making drawing a bit easier
Also adds a Flow example to check that PNG images are nicely rendered in lists.
Reviewed By: kennyhorror
Differential Revision: D4514470
fbshipit-source-id: 35189c4543c31a351c1dbfe804ce25ae14a3a98b
Summary:
Introduces 2 utilities:
- ##print_obj##: Prints the whole Job in a nice way -- each op call takes one single line and nets are inlined for much better readability. Loops and parallel steps are easy to read.
- ##analyse_obj##: Goes through a Job and checks 2 things:
- that there will be no undefined blob errors at execution.
- no blob of same name will be created by parallel execution steps
Reviewed By: dzhulgakov
Differential Revision: D4142381
fbshipit-source-id: 61bf3398c22e9947493e99145ce2bfc2646830a6
Summary:
We want to train models with user sequence data for mobile side ranking.
The operators are for preprocessing the sequence-based data. They read in a sequence with a batch and convert the examples with different methods.
I also add a new loader for connecting the operators to existing trainers.
Differential Revision: D4485411
fbshipit-source-id: 0cf17206704995f2ce079e1594607bea70b1ed0c
Summary: This makes sure dper_example is compatible with the new way of defining checkpoint epochs. See D4499320.
Reviewed By: xianjiec
Differential Revision: D4511618
fbshipit-source-id: f5188010cdefe3739f87f6049d1ea6aee765c514
Summary:
Per the task's request, added a top_k == 1 branch to specially handle the top-1 accuracy case.
In addition, I made a slight code refinement: moved the declaration of the vector Xdata_pairs out of the for loop to avoid repeated constructor costs.
Differential Revision: D4505983
fbshipit-source-id: 5671eaca4aac3900c69dfb54d664c2d617960b4b
Summary: This allows having a task-local report net before the Task is created. To be used in the global counter (diff soon).
Reviewed By: dzhulgakov
Differential Revision: D4497771
fbshipit-source-id: 24ec7c8e95466abbd83fbea79b58717d81201857
Summary:
It was possible for a set and a get to race such that the get
would return an empty string, if the file for the key was created but
not yet written to. This change updates the FileStoreHandler to first
write to a temporary file and then atomically rename(2) the file to
its final path. This removes the described race condition.
This change also replaces the poor filename generation routine with
using the 128-bit MurmurHash of a key.
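The pattern is the standard write-to-temp-then-rename trick; a minimal Python sketch of the idea (the actual handler is C++), where rename(2) atomically publishes the fully written file so a concurrent get never observes a partially written key:
```
import os
import tempfile

def set_key(store_dir, key_file, value):
    # Write to a temp file in the same directory, then atomically rename it
    # onto the final path; readers see either the old content or the new,
    # never a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=store_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(value)
            f.flush()
            os.fsync(f.fileno())
        os.rename(tmp_path, os.path.join(store_dir, key_file))
    except BaseException:
        os.unlink(tmp_path)
        raise
```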
Differential Revision: D4502154
fbshipit-source-id: f2abc78b8bad68c06ad2f18a078935826e431f7a
Summary:
As per discussion in https://www.prod.facebook.com/groups/184236721951559/permalink/354591931582703/, KaimingHe pointed out that scaling LR is not the same as scaling Loss, since LR scaling will affect the weight decay (which is implemented by modifying the gradient, which thus is not yet correctly 'averaged'). Actually prigoyal tried to convince me earlier that loss scaling is the way to go, but I was not convinced then :/.
So this diff removes the LR scaling parameter passed by data_parallel_model and instead passes a loss_scale parameter to the model creation function. Unfortunately, this will break all existing code that uses the data parallel model. But that is not only a bad thing, since it will bring awareness to this change. I will inform in the FB groups about this.
In this diff I modified all my models to work correctly.
Reviewed By: Yangqing
Differential Revision: D4507002
fbshipit-source-id: 16c7221663282f71a1b754b34de0c8ccd5c2ca90
Summary:
We have noticed that the number of chains computed is usually much larger than necessary, when there is a backward pass. For example having a network of 5 FCs with gradient operators (but no parameter updates) should yield only one chain, but instead over 20 were created. After adding parameter updates, the forward pass still should remain one chain, while the backward pass will be splintered.
Analysis showed that the problem was the dependencies from forward ops to the gradient computation. But these are redundant, since the gradient op already depends on the op via the full path over ops. Example:
fc1 ----> fc2 ----> fc3 ----> loss
 |         |         |          |
 v         v         v          v
fc1grad <- fc2grad <- fc3grad <-+
Here fc1 and fc1grad have a direct dependency, but the indirect dependency via fc2 -> fc3 -> [...] -> fc1grad already covers it.
To fix this, I added a pruning step prior to the chain computation. The chain computation is done on the pruned tree, but I do not modify the runtime chains for safety.
Pruning is based on following logic:
- if one of my direct parents is also an ancestor via another path, I can remove the direct dependency
Pruning is extremely fast, linear in the number of dependencies.
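For intuition, here is a naive sketch of the pruning rule stated above (the actual implementation is linear in the number of dependencies; this illustrative version is quadratic and is not the Caffe2 code):
```
def prune_redundant_edges(parents):
    """parents: op id -> set of direct parent op ids."""
    children = {}
    for op, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(op)

    def reachable(src, dst, skip_edge):
        # DFS from src to dst, ignoring one specific direct edge.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(nxt for nxt in children.get(node, ())
                         if (node, nxt) != skip_edge)
        return False

    pruned = {op: set(ps) for op, ps in parents.items()}
    for op, ps in parents.items():
        for p in ps:
            # If p still reaches op without the direct edge, the edge is redundant.
            if reachable(p, op, skip_edge=(p, op)):
                pruned[op].discard(p)
    return pruned
```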
Reviewed By: dzhulgakov
Differential Revision: D4500293
fbshipit-source-id: 0994ae6775c53378ea1e0074365cef041764a1b4
Summary:
This is a fairly large diff, sorry about that. It includes basic shape and type inference functionality, based on YQ's Schema scaffolding. I added some helper functions to make it easier to write simple translations.
Bigger refactoring was needed for ConvPoolBase so that we could use the shape inference already there in the schema.
I annotated enough operators to be able to infer forward-pass of shapes for basic convnet, and added test for that. I intend to bootcamp some annotations and annotate enough to handle Resnets fully. Need to think about gradients, if they could be annotated in an easier way.
Only shapes are now exposed to Python, types will follow later. Also the inference is not called yet anywhere but unit test.
Also I am not sure if everything is in the best location in the code, but shouldn't be hard to move stuff around.
Reviewed By: dzhulgakov
Differential Revision: D4436818
fbshipit-source-id: eebee5937ccc9ac09c245465302388a1fae6933c
Summary:
Required by feed ranking: https://fb.quip.com/N4IuAIgda8Pe
Each task might have multi-subtasks. Each subtask has dedicated mlp layers.
Reviewed By: xianjiec
Differential Revision: D4451609
fbshipit-source-id: 3dad48e6a7cce1bb103d93ec205ff6d2333659ea
Summary: If the PATH doesn't include cmake (such as when android studio wipes all the environment variables), this will still work.
Reviewed By: Yangqing
Differential Revision: D4504653
fbshipit-source-id: 56a8854e3daf6ee1f5b1cbeb83ca175a007dad12
Summary:
This learns Shakespeare and then generates samples one character at a time. We want this to be an example of using our LSTM and RNNs in general.
Now it takes 4ms to run the training net on the current parameters (with batch size = 1). I don't have data on how much each operator takes yet. But the overall Python loop doesn't seem to have much influence - with 1000 fake iterations in run_net it took 4s for each iteration, as expected.
Future work:
* fixing convergence for batching
* profiling on operator level
* trying it out with GPUs
* benchmarking against existing char-rnn implementations
* stacking lstms (one lstm is different from two, one needs to take care of scoping)
Reviewed By: urikz
Differential Revision: D4430612
fbshipit-source-id: b36644fed9844683f670717d57f8527c25ad285c
Summary: stop_if() was not being honored in ProcessingReader.
Reviewed By: dzhulgakov
Differential Revision: D4497784
fbshipit-source-id: 1c967c6252f832149800796e2c26aadf10b74850
Summary: This allows saving the previous value of the counter and sending it upstream without losing counts.
Reviewed By: kennyhorror
Differential Revision: D4497854
fbshipit-source-id: 28a7ad0ff1020bde26f78b1f59614b094d1e1881
Summary: The net was being added to the task body by mistake. Also, adds local_init and local_exit functionality.
Reviewed By: dzhulgakov
Differential Revision: D4497794
fbshipit-source-id: 4d9dfb48a277ccfa204f1e74886abba5d44c61f8
Summary: For customers like Ads, Feeds, MarketPlace, their training data size is super large. It is unnecessary and costly to go over all the data to compute meta information. In this diff, numSample option is added in preCompute, so users have control over how many samples they want to use when computing meta information.
Differential Revision: D4492399
fbshipit-source-id: 7199381d226ee6300a959fc5e116d39984d199fc
Summary:
The unit tests using the tcp transport should bind to
localhost instead of hostname(2).
Differential Revision: D4501851
fbshipit-source-id: 43db860c9b96d5d64801d1c6af2bf25e6759b4af
Summary: One model was passing -1s in the label blob, causing illegal memory access when computing the label-cross entropy. Improving the assertion causes it to fail properly.
Reviewed By: prigoyal
Differential Revision: D4491848
fbshipit-source-id: 5c48e43b0a8928cac70e939d69d23c94c07511b9
Summary:
Currently CUDAContext only supports one cuda stream per gpu per thread. But as per my investigation, it is much better to use one CPU thread to control all streams for one GPU. To make this possible, this ground work is necessary: this diff defines a stream id for cuda context that is used to index to streams for that gpu for that thread (the streams are handled by a thread-local class).
This diff also changes the initialization: before we created cuda streams for all gpus and for all threads, even if they would be never used. Now streams are created only when needed.
This adds a small overhead to context.cuda_stream(), but I doubt it has any significance. Instead, this diff will slightly reduce memory usage on the GPU side.
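An illustrative Python sketch of the bookkeeping (the actual change lives in the C++ CUDAContext; make_stream below is a hypothetical factory): streams are created lazily, per thread and per (gpu, stream id), instead of eagerly for every GPU and thread.
```
import threading

class ThreadLocalStreams:
    def __init__(self, make_stream):
        self._local = threading.local()   # one pool per CPU thread
        self._make_stream = make_stream   # hypothetical stream factory

    def get(self, gpu_id, stream_id=0):
        pool = getattr(self._local, "pool", None)
        if pool is None:
            pool = self._local.pool = {}
        key = (gpu_id, stream_id)
        if key not in pool:
            # Created only on first use by this thread for this gpu/stream id.
            pool[key] = self._make_stream(gpu_id)
        return pool[key]
```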
Reviewed By: Yangqing
Differential Revision: D4492380
fbshipit-source-id: 498555e58d75217d43891e1bcad6d86051d376ce
Summary:
The color_ flag to image_input_op now indicates the desired number of output channels. If the source
DB has a different number of channels, then color-to-grayscale conversion (or vice versa) is done.
Reviewed By: Yangqing
Differential Revision: D4498455
fbshipit-source-id: da8c39eccd06b9158f320a05663658e502905ae5
Summary: The initial implementation wasn't working quite right (no const fill of an empty external input)
Reviewed By: viswanathgs
Differential Revision: D4490569
fbshipit-source-id: 1b2a4f612efb3b2685edfe6c683571dd9d01aa4f
Summary: Add support for "safe" versions of enqueue and dequeue. I'm not sure if using `math::Set<bool, Context>` is the best context independent approach for setting the status.
Differential Revision: D4398633
fbshipit-source-id: 7c88c8e11acfe36fd3d94f17dbf68ce558eb6df1
Summary:
Takes a 2D tensor of floats, and converts each row into a comma delimited
string. vigneshr ran into a limitation where logging features to hive wasn't
possible without this since our APIs only allow logging strings.
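A small NumPy stand-in for the behaviour described above (the real code is a Caffe2 operator; this is just a sketch of the row-to-string mapping):
```
import numpy as np

def rows_to_strings(x):
    # x: 2D float tensor; each row becomes one comma-delimited string.
    assert x.ndim == 2, "expects a 2D tensor of floats"
    return [",".join(str(v) for v in row) for row in x]

print(rows_to_strings(np.array([[1.0, 2.5], [3.0, 4.0]])))
# -> ['1.0,2.5', '3.0,4.0']
```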
Differential Revision: D4486151
fbshipit-source-id: 2d229290819e2e7ca3dc6f93846433da8b02a41d
Summary: add an option to use a resnet network instead of alexnet. Modified the resnet.create_resnet50 function slightly to allow specifying different kernel/stride parameters so we can adapt resnet to our image size.
Differential Revision: D4472535
fbshipit-source-id: ed06acf52f6425a1e04d047548eb3c70388d74aa
Summary:
I have forgotten to remove this one. The rest of indexing
instead of string names is coming after D4446813 lands, as scratches
aren't inputs or outputs and thus can't be indexed.
Reviewed By: urikz
Differential Revision: D4465748
fbshipit-source-id: 2ccbedfb35541ef4a2231d1480eef59025bd5290
Summary: Remove the dependency on sys/time.h, and use c++11 feature chrono library, which is more portable.
Reviewed By: Yangqing
Differential Revision: D4486569
fbshipit-source-id: 86be58c6e9bc410e726a4799bc4d2be86fdd1dd4
Summary:
Grayscale images were not being handled correctly by the image input op in the CPU path. There was
a coercion of the grayscale image to color that strided through the grayscale image 3 pixels at a time.
Reviewed By: Yangqing
Differential Revision: D4486356
fbshipit-source-id: 482fbfe211ecdc107e55692a4cf0329e174c8e4a
Summary: On some inputs TestWarden was failing
Reviewed By: Yangqing
Differential Revision: D4487293
fbshipit-source-id: 3da4b310a619c2b57f033b2dd7727f71403bfd68
Summary: Looks like we don't do a good job with initial recurrent input gradients yet. Here is a partial fix, but the gradient check doesn't pass yet. The shape is correct now, though.
Reviewed By: salexspb
Differential Revision: D4475447
fbshipit-source-id: 280f1f59f19e487fd0dce0d440609c50ddce294a
Moves THPObjectPtr into a separate header, so that it can be included
independently. Currently, utils.h requires all of THP.h. Also adds RAII
structs for acquiring and releasing the GIL.
Due to bad rank mapping, broadcast and reduce were connecting the
wrong processes, which resulted in errors or tensors not being received/sent.
* Introduced a new mapping method to solve this problem.
* Added and improved tests for these cases.
Summary: See distributed.py for example of usage
Reviewed By: xianjiec
Differential Revision: D4467723
fbshipit-source-id: c74f71bebaa1751098379838d3da55945aac62bd
Summary:
Turns out that building caffe2 on raspbian is a piece of cake - cmake is awesome.
Closes https://github.com/caffe2/caffe2/pull/112
Differential Revision: D4480985
Pulled By: Yangqing
fbshipit-source-id: 5dbe5e1e71d8680dea7a5ec8a9ce7fbe6aa5270a
Summary:
This solves most include warnings as seen in Phabricator (no header files, no "packing" system headers, new default mode where more user headers are removed).
We cowardly skip files containing #if for now.
Generated by
```
rm -f /tmp/ffmr-diff/* &&
cd fbcode &&
(foundation/scripts/ls-cpp-dirs | grep -v '^\(\.\.\|external/\|.*/external\|folly/|watchman/\)' |
xargs ffmr -o /tmp/ffmr-diff codegraph/scripts/ffmr/analyze_includes_no_headers_no_packing_skipping_if.sh) &&
(cat /tmp/ffmr-diff/*.diff | patch -p2) &&
hg commit -m foo &&
cd .. &&
arc amend --yes --revision D4414676 && arc diff --nolint --nounit --excuse refactoring --prepare --big-diff -m 'something'
```
folly and watchman are in separate diffs.
Reviewed By: meyering
Differential Revision: D4414676
fbshipit-source-id: 75e2e11f4fac8a5f8071a1bafcc4ddc355fd6f4e
Here's the command I used to invoke autopep8 (in parallel!):
git ls-files | grep '\.py$' | xargs -n1 -P`nproc` autopep8 -i
Several rules are ignored in setup.cfg. The goal is to let autopep8
handle everything which it can handle safely, and to disable any rules
which are tricky or controversial to address. We may want to come back
and re-enable some of these rules later, but I'm trying to make this
patch as safe as possible.
Also configures flake8 to match pep8's behavior.
Also configures TravisCI to check the whole project for lint.
Summary:
Xray is being converted to c2 and ROIPool (needed for detection models) is
missing in c2 trunk. Ported rbgirshick's implementation from experimental with a few
changes:
Also added code for translation in caffe_translate.py
Differential Revision: D4453331
fbshipit-source-id: 7a05a88edec1bd6e806e52dc1e6c55bc75c3149f
Summary: This diff uses stack workspaces in RecurrentNetwork, which allows simplifying the implementation and getting rid of scratches.
Reviewed By: salexspb
Differential Revision: D4446813
fbshipit-source-id: 514eec7e4300bdf492a9cb192b40cf4f89acf656
Summary:
Using multiple readers for model evaluation. Since it is built on the new framework, only NativeLoader is supported.
With 5 readers, the evaluation speed is 124k. The speed for a single evaluator is 32k. There is still room for improvement since the evaluator machine is under-utilized.
(Hive is the bottleneck. Adding more loading threads helps to improve the speed to 240k. More readers can improve it further.)
Reviewed By: azzolini
Differential Revision: D4469393
fbshipit-source-id: b55af5f798faca4c150b2c0663fe5db0f154cb70
Summary: Replace ParseFromString with ParseProtobufFromLargeString to get around the limitation of the 64MB limit.
Reviewed By: Yangqing
Differential Revision: D4466226
fbshipit-source-id: b68a6efc76955db294ddb0d23bbaf03b69e4952a
Summary: Might be useful to have a command line version of this. Thoughts?
Reviewed By: Yangqing
Differential Revision: D4456221
fbshipit-source-id: 42dd464c5734c0cfbd4c2b1cb348aef9b269b4c2
Summary:
Added cmake for android script under scripts, and set up the travis contbuild target.
Closes https://github.com/caffe2/caffe2/pull/109
Reviewed By: bwasti
Differential Revision: D4468767
Pulled By: Yangqing
fbshipit-source-id: 709f3eb6be24727b0a989d0901dbf377871b122a
Summary: This fixes build that include caffe2 and change the value of CMAKE_BINARY_DIR to their own binary dir. Allows the generation of protobuf headers/files in particular.
Reviewed By: Yangqing
Differential Revision: D4466126
fbshipit-source-id: eba264094dd2bff07a7f050b95fd2d5525462b09
Summary: Makes it much nicer to spot errors, especially in iPython notebook.
Reviewed By: kennyhorror
Differential Revision: D4465726
fbshipit-source-id: c0adaf5168248a70987ff9d5dfce54a622ff2219
Summary:
We get flaky LSTM tests on a numerical gradient check. I
would like to improve the accuracy of the latter, but first I need an
example. After landing this, TestWarden would find a bad input for me.
Reviewed By: urikz
Differential Revision: D4467223
fbshipit-source-id: 68d4bf22af11190f39fa28332c6d99efbb192132
Summary: Android Studio automatically enables -Werror in debug mode and throws an error on non-string literals in the 3rd argument of android_log_print.
Reviewed By: Yangqing
Differential Revision: D4465263
fbshipit-source-id: af6dc436b7c98a29aa89bb241c452e6da5c8ad1f
Summary:
- Writing a Caffe2 computation graph to json for visualization in Flow
- Example use in the Text models workflow: it replaces the existing draw function which produces PNG file
- Visualization: https://our.intern.facebook.com/intern/fblearner/c2graphvis/13215753/
- The visualization uses FBLearnerDAG. Plan to add many visualization-related features.
Reviewed By: Mortimerp9
Differential Revision: D4415299
fbshipit-source-id: 2d641d60177566ed2837fb3750394420690f28de
Summary: Fixes segfaults that occur in Eigen and im2col/sgemm backends.
Reviewed By: Yangqing
Differential Revision: D4451772
fbshipit-source-id: 3cf21e5afb2fe300db4228933a82063db5f7091f
Summary:
1. Use opencv for data augmentation after benchmarking various image libraries in python
2. Use cuda no bias conv
3. Use cuda fastest conv (exhaustive search)
4. data_parallel_model had a few changes. Syncing them
5. propagate the errors in threads to make debugging easy
Reviewed By: rbgirshick
Differential Revision: D4341422
fbshipit-source-id: aa4471a2f49dd6d7ca13879999b3c7ceaf818c1e
Summary:
It's a similar trick to dyndeps. The idea is that it is better to just replicate global state to gang workers, as otherwise it causes a lot of confusion.
In particular it's useful if one wants to enable detailed logging (--v)
For other operators user still needs to call GlobalInit explicitly. We should consider doing it for all Flow operators, but I'll leave it for future considerations.
Reviewed By: kennyhorror
Differential Revision: D4460686
fbshipit-source-id: 5836737dd3195f9ad12589fd899a3ff63f173e05
Summary:
Fixes the problem surfaced by D4446583.
Our serialization interface is designed for chunking, but recipients in distributed training didn't expect that.
For now I just fixed the naming of the tensor and since our blobs are small it should work.
I believe it's still wrong however for big tensors as we just concatenate the serialized proto strings of chunks here: https://fburl.com/6wayxglz and here: https://fburl.com/7k4nhjja . Deserialization path though just tries to deserialize it as a single proto.
I'll make Blob::Serialize(name) version use non-chunking version in a separate diff. Just sending it to unblock for now.
Side note - oujin - why do we have two versions of operator setting the blob? :) Is one of them added by Pieter? Maybe we should unify them a bit.
Reviewed By: kennyhorror
Differential Revision: D4460974
fbshipit-source-id: 485b4de7c8af8cd9eac44c06a1246deaf0b4d502
Summary: The previous implementation was just concatenating strings, which I believe is wrong. Instead, let's turn off chunking when we don't ask for it.
Reviewed By: kennyhorror
Differential Revision: D4461311
fbshipit-source-id: 8b9a3325a40a1cd0a8ffeeb20a17bf9f57b7b0a9
Summary:
It's broken because it relies on add sparse bias.
It's not easy to add_sparse_bias after the switch to loader_param.
DPA would like to try it out :)
Differential Revision: D4447275
fbshipit-source-id: 631cb4995f35383070e44387dc86692ba64b91eb
Summary: Remove usage of recurrent_sizes, so recurrent states' sizes can depend on input (in case of attention matrix for beam decoder). I removed recurrent_sizes from forward and backward steps.
Reviewed By: salexspb
Differential Revision: D4427688
fbshipit-source-id: 580420a294d309c86ec5cb4e677058623b7228e1
Summary:
It seems that a simple string("") conversion instead of "" is enough.
Closes https://github.com/caffe2/caffe2/pull/105
Differential Revision: D4458626
Pulled By: Yangqing
fbshipit-source-id: 5072499516332ad1067779526523a3f10aade6ef
Summary: Speeds up inference in the FCIS model from 2900ms/iter for SoftmaxWithLoss layer to 230ms/iter
Differential Revision: D4456494
fbshipit-source-id: dd520d91fbe950511d198de45f34ac4cd4a676b0
Summary:
In this diff I stop passing parameters by name and also remove hardcoded output ids which were there specifically for LSTM to work. It also allows avoiding the use of recurrent_sizes in the backward pass (for forward this is done in D4427688).
Using a similar technique, it should be simple enough to eliminate blob name passing entirely. Then we can fix scoping. These can be done in a follow-up diff.
Reviewed By: urikz
Differential Revision: D4444614
fbshipit-source-id: 3580a76365502b9f2f09e3d8b7e78084ca739f00
Summary:
Let's have a test for this so we don't break existing use cases
while iterating on RecurrentOp's code.
Reviewed By: urikz
Differential Revision: D4456404
fbshipit-source-id: 79f2b88c1eed16106adf5b793b4c74441c7146c6
Summary:
It is annoying to print tensors from C++ (while it is easy
from Python when you have a net), so I just took the logic out of PrintOp
into a separate class.
Reviewed By: urikz
Differential Revision: D4452793
fbshipit-source-id: d512559fe07bc468423c9ce38da0c44eaad4fdec
Summary: I can't live without it and we don't have folly here.
Reviewed By: urikz
Differential Revision: D4444511
fbshipit-source-id: 3a85f1a13bd3032be89b3150d40a701dce192004
Summary: added functions to "de scope" the saved model files
Reviewed By: Yangqing
Differential Revision: D4444966
fbshipit-source-id: f447c15754f8e0648459148fcc7fba410dc06f68
Summary:
A new operator is added for model calibration. Given a piecewise linear function and a raw prediction as input, it generates the mapping as output.
Details can be found in the operator doc.
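As a rough sketch of the mapping (the operator's actual input format may differ; here the piecewise linear function is assumed to be given as sorted breakpoints with corresponding output values):
```
import numpy as np

def piecewise_linear_calibrate(raw_pred, bounds, values):
    # Linear interpolation between breakpoints; predictions outside the
    # bounds are clipped to the first/last output value.
    return np.interp(raw_pred, bounds, values)

bounds = np.array([0.0, 0.25, 0.5, 1.0])
values = np.array([0.0, 0.1, 0.6, 1.0])
print(piecewise_linear_calibrate(np.array([0.3, 0.9]), bounds, values))
```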
Differential Revision: D4418640
fbshipit-source-id: f8ff3ea786b0fe233a4ddcb709e5dbf0861ca484
Summary: We don't need this enforce since we already allow raw_mutable_data to return nullptr; we should be able to share meta for tensors even without data.
Reviewed By: Yangqing, kennyhorror
Differential Revision: D4439138
fbshipit-source-id: 0e81bef3054fe2f9720efd5002418eac7a2b6c08
Summary:
Relies on NHWC implementation of group conv which doesn't exist right
now
Closes https://github.com/caffe2/caffe2/pull/103
Differential Revision: D4451635
Pulled By: Yangqing
fbshipit-source-id: 31d99b37abf7563a26389f47affcc759ce6bc5e1
Summary: Some DBs don't support duplicate keys. Nvidia had problems with LMDB where we potentially can set up duplicate keys. But this won't be possible in some other cases. So instead let's just store different chunks with different keys in the DB. And then when reading back we will remove the special suffix.
Reviewed By: dzhulgakov
Differential Revision: D4446583
fbshipit-source-id: 6b345e342840c5fd476029166db131d343467d48
Summary:
Perf bug report: https://www.facebook.com/groups/1405155842844877/permalink/1617904561570003/
Diagnosis:
I've done some digging into this and here's what I've found:
(1) In this use case, the call is disallowed_op_ids = get_op_ids_in_path(ssa, blob_versions, [], inputs) where inputs = ['res4_22_sum'] is the last blob produced by the res4 stage of a ResNet101 model.
(2) get_op_ids_in_path has exponential running time in the number of blocks in the res4 stage of ResNet. This is based on empirical running times. This call should complete in 4.5 days on my devgpu.
(3) I haven't familiarized myself enough with the IR and SSA code in core.py to understand the algorithmic fix yet, but surely there's a more efficient algorithm to compute the same thing.
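For reference, the usual fix for this kind of blowup is to track visited nodes so shared sub-paths are expanded only once; a hedged sketch (not the actual core.py code, whose SSA representation differs):
```
def ops_producing(blob_to_producer, op_inputs, targets):
    """blob_to_producer: blob -> op id that produces it;
    op_inputs: op id -> list of input blobs;
    targets: blobs to trace back from."""
    visited_blobs, op_ids, stack = set(), set(), list(targets)
    while stack:
        blob = stack.pop()
        if blob in visited_blobs:
            continue               # shared sub-path: expand only once
        visited_blobs.add(blob)
        op = blob_to_producer.get(blob)
        if op is None:
            continue               # external input, nothing to trace
        op_ids.add(op)
        stack.extend(op_inputs[op])
    return op_ids
```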
Reviewed By: Yangqing
Differential Revision: D4446278
fbshipit-source-id: 8bd147f92d62b865dc355d5802a53e92d64b6e21
Summary:
Now it takes two lines to get a drop-in debugger: import it and
then decorate your function. Also got rid of the enable / disable logic as
it doesn't seem useful.
We can also try to enable this by default for our tests when running
locally as a next step.
Reviewed By: bwasti
Differential Revision: D4444299
fbshipit-source-id: 6e2006945d8ad640685b1017ca1bd63054728908
Summary:
DPer example has been creating multiple copies of the transform config in the net
definition until now, which meant that I hit the ProtoBuf limit (64MB) for
certain Task requests (especially visible because of the
ValidationPipeline that I was adding).
After this diff we're going to store SigridTransforms in one instance per
machine for training (or 1 instance per reading).
Difference in sizes of the plans for a simple SparseNN model is ~30 MB (even accounting for the fact that the second model has a validation plan as well).
TODO: Do similar logic for NNPreProc as well (it's also pretty large).
Reviewed By: dzhulgakov
Differential Revision: D4441441
fbshipit-source-id: 4452dd86a4dc49b2c7f5b7642f443aed5720b047
Summary:
This will help issues like #99
Closes https://github.com/caffe2/caffe2/pull/101
Differential Revision: D4448397
Pulled By: Yangqing
fbshipit-source-id: ede3fafc1b1314886583e8ea38948bb31e69347b
Summary:
One way of simplifying the fp16 / multi-precision operators -- remove the explicit OpName / OpNameFP16 divide, dispatch the correct calls at runtime based on the contents of the input tensor(s).
Closes https://github.com/caffe2/caffe2/pull/93
Differential Revision: D4444417
Pulled By: Yangqing
fbshipit-source-id: 296dcff1e1e24ba534caca9b82f16e6634da2287
Summary:
From a new model trained by Zhen. We never exercised this codepath before since we've never had models with this choice before.
I'm auditing all our ARM_NEON codepaths to see if there are other cases like this.
Reviewed By: Yangqing
Differential Revision: D4444694
fbshipit-source-id: e0436db4e8b655551fedb21df160b7cae7e79737
Summary:
Spatial Softmax allows specifying locations that are not counted for the loss. If none of the locations are counted, this resulted in NaNs and headaches. This diff fixes that by explicitly handling these cases.
+ assertion for label blob dimension(0)
Created a new test as well.
Differential Revision: D4442939
fbshipit-source-id: 8641bfad2a994e517ca3eda39345380a6ca1ba50
Summary:
When testing the code, a couple of issues arose:
- we need to have different name for last layer than the preprocessed model, otherwise a shape assertion is created
- preprocess_noaugmentation still needs to do a crop for images larger than 227x227, otherwise things fail.
Reviewed By: viswanathgs
Differential Revision: D4442700
fbshipit-source-id: 05f54e7f17c266280f5ba5bb57af1721fe30df12
Summary:
It helps to develop scripts locally (when working outside of Flow). One doesn't have to rerun the script in order to catch an exception in the debugger / add a print statement. (Flow does this kind of thing automatically)
Usage example:
```
from caffe2.python import workspace

if __name__ == '__main__':
    workspace.GlobalInit(['caffe2', '--caffe2_log_level=2'])
    from caffe2.python.utils import DebugMode
    DebugMode.enable()
    DebugMode.run(main)  # main: the user's own entry-point function
```
Reviewed By: Yangqing
Differential Revision: D4424096
fbshipit-source-id: 73f418c80f581820e70139df7e166981e4d8c55f
Summary:
Some tweaks, hopefully getting us to 0.98 MAP
- no cropping for test dataset (as per patrick)
- spatialBN momentum 0.1 (default is 0.9)
Also added some additional logging and reduced frequency of running of test net and logging.
Reviewed By: viswanathgs
Differential Revision: D4439790
fbshipit-source-id: 700705b811a5fc8c7139a265de96db646605ca5a
Summary:
In this diff:
[1] Change the output from generating all paths from root to labels to TreeProto.
TreeProto itself is required by inference and we can use hsm_util to get the
paths from TreeProto.
[2] Fix hsm_util index assignment.
Differential Revision: D4416731
fbshipit-source-id: 657d8b9b4df6fa30c9f92d391cf7e07b5c5db1f8
Summary:
CudnnSpatialBNOp was generating a runtime warning when testing (epsilon_ < CUDNN_BN_MIN_EPSILON) even though epsilon is set equal to CUDNN_BN_MIN_EPSILON by default.
Tweaked the comparison here to allow for a small epsilon. I implemented the softer comparison by introducing FLT_EPSILON from <float.h> - let me know if there is a
preferable set of constants to use here.
Reviewed By: Yangqing
Differential Revision: D4431766
fbshipit-source-id: 5e67690a5ed258d460d95e9582b6fdf2050b42f9
Summary: Change label indices to be in the range [0, num_classes).
Differential Revision: D4416685
fbshipit-source-id: b16ca8539fd538ad62bf1298dbad3f1553956241
Summary:
Countless hours were spent debugging why ImageInputOp failed with a cryptic exception P56967302. Turns out, the assertion happened in the PrefetchOp destructor, which was triggered when an assertion failed in the ImageInputOp constructor. Because of this, the underlying problem was shadowed. I fixed this by not asserting on finalize_ if there is no prefetch thread running, and now the error is clean:
[enforce fail at image_input_op.h:105] scale_ > 0. -1 vs 0. Must provide the scaling factor.
Reviewed By: Yangqing
Differential Revision: D4435105
fbshipit-source-id: 52f85a9fd30eea396c9faca54b6d946fa847b7ff
Summary:
Minor bug in D4426513 - bias was always added
as an input blob. Running it on xray throws "RuntimeError: [enforce fail at operator.cc:25] blob
!= nullptr. op Conv: Encountered a non-existing input blob:
caffe.SpatialConvolution_0_b"
Reviewed By: Yangqing
Differential Revision: D4429231
fbshipit-source-id: 0d3905ea6e87128ec1aa9d0f0a2f43126b1069b1
Summary:
Turns out xray models have some independent Scale layers (with bias) besides
the Conv-Scale pairs. We could still fuse it with previous layers with some
work, but for simplicity, including Add op followed by Mul for bias if needed.
We could revisit layer fusion optimizations in the future once we have
something working for xray.
Reviewed By: Yangqing
Differential Revision: D4427266
fbshipit-source-id: ef7d8677ccd7d10dbd20759eeed378d9bc4522d1
Summary: Now that we directly support group convolution, this will no longer be needed. I also took the chance to add dilated convolution and also optional bias.
Reviewed By: prigoyal
Differential Revision: D4426513
fbshipit-source-id: eb2bb0aa619f8ff5f732512570f736bc59cd57dd
Summary:
This is a handy tool for amortizing expensive operators (e.g.
distributed communication, some heavier kernel launches, etc) over a
lot of small blobs (e.g. all the biases in a network). We can just
coalesce these small blobs in-place into a single blob, act on them in
operators, etc as if they are non-coalsed (passing them as inputs to
operators, etc), and then finally for heavier operators, just work on
the coalesced blob that contains each of these units.
I named it UnsafeCoalesce since it introduces blob aliasing, which
needs care for work like memory management, graph rewriting as in
memonger, etc.
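A NumPy stand-in for the idea (not the Caffe2 op itself): the small blobs are packed into one flat buffer and each original name is rebound to a view of it, so lightweight per-blob work keeps its interface while heavy operators can act once on the coalesced buffer.
```
import numpy as np

def coalesce(blobs):
    """blobs: dict name -> 1D float32 array. Returns (flat_buffer, views)."""
    total = sum(b.size for b in blobs.values())
    flat = np.empty(total, dtype=np.float32)
    views, offset = {}, 0
    for name, b in blobs.items():
        flat[offset:offset + b.size] = b
        views[name] = flat[offset:offset + b.size]  # aliases flat's memory
        offset += b.size
    return flat, views

flat, views = coalesce({"bias1": np.zeros(4, np.float32),
                        "bias2": np.ones(3, np.float32)})
flat += 1.0             # one heavy op over the coalesced buffer ...
print(views["bias1"])   # ... is visible through every aliased view
```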
Reviewed By: Yangqing
Differential Revision: D3557149
fbshipit-source-id: 09cff4459b84270fe9e1da3b4a168fd66d01f795
This is because the current version of luaffifb fails to pass
custom structs (i.e. half) as arguments or accept them as return
values.
The accreal parameters are immediately converted to real internally.
This is done to ensure none of the internal code needs to be changed.
This change also removes transform_reals_to_half which is no longer
necessary.
Change-Id: I978151d001de5492576fb0eddfa0608cd4e99149
Summary: Failing fast instead of swallowing the bias term.
Differential Revision: D4419130
fbshipit-source-id: 98ce0af9a20adecfb027ffe8293ff69910873abc
Summary:
Simple tool similar to caffe_translator_test.py for conversion from caffe to
caffe2. The differences are:
There are a couple of issues that need to be fixed as mentioned in
https://our.intern.facebook.com/intern/tasks?t=15424761, especially related to
the 'legacy_pad' field in conv op.
Differential Revision: D4407146
fbshipit-source-id: ec641f6d7e0cf6cdf2eca21f058b4451635d4a56
Summary: Data parallel model has a sanity check that ensures that operator inputs/outputs do not cross device boundaries. This failed when the operator was a CPU-only operator (such as the new AccuracyOp version). This fixes that.
Reviewed By: prigoyal
Differential Revision: D4417841
fbshipit-source-id: 9bc4e7a2074a544ca4db69ecf24183bbd41f84ca
Summary:
First step in doing multi GPU training - modification of training code to use ImageInputOp. A few changes to accomplish that:
+ modified script that generates our lmdb to store byte image data instead of float
+ we have a float 'label' for our regression problem so added support for float labels in ImageInputOp
+ updated train_network.py to use ImageInputOp, but it is still single GPU
Reviewed By: seansnyder
Differential Revision: D4407728
fbshipit-source-id: a59a1b91b69a9d5f0486383d4fb0a993478393c9
Summary: Github import didn't work and the manual import lost some files.
Reviewed By: Yangqing
Differential Revision: D4408509
fbshipit-source-id: ec8edb8c02876410f0ef212bde6847a7ba327fe4
Summary:
It looks like markdown is not happy with lines starting with =. This diff
simply fixes 2 cases where that was the case.
Reviewed By: dzhulgakov
Differential Revision: D4409033
fbshipit-source-id: f2ba3ce5e3936a1e0d57984c12234209993550be
Summary:
It ended up much messier than originally expected. Maybe we should have just hardcoded it, but I've tried to be "generic" so far at the expense of code readability.
The main issue is that for the weight computation we need access to the original embedding matrix, and in the sparse case we need to re-look up the embeddings to do the dot product with the output grads.
Thus I'm making weight grad computation optional, controlled by a flag and it triggers invocation of a different backward op that produces both grads at the same time.
So far it's implemented only for 'Lengths' version. It'd be straightforward to implement (Un)SortedSegment versions but I haven't done that yet.
Reviewed By: kennyhorror
Differential Revision: D4388215
fbshipit-source-id: 23132ab7daa1f5eec49233f802af1fe75b469c2b
Summary: Just to make further work a bit easier.
Reviewed By: kennyhorror
Differential Revision: D4388071
fbshipit-source-id: 71b99ef1c2dc680afe4e9ef2f7a370e43116ce99
Summary:
It looks like for the types that are created directly through the type(...)
function call, we don't store strong references anywhere. As a result,
a GC call in Python might or might not clean up these classes depending on the
phase of the moon and other random things. This means that in some
cases simple layers such as a Relu might disappear.
cat_shame
Reviewed By: xianjiec
Differential Revision: D4396289
fbshipit-source-id: ba4e9b7ef54ee43349853b0acc3d3f40c74e4d73
Summary:
(Ignore the convolution-op related changes, they will be later patched separately)
This diff includes work from the last few weeks:
- some refactoring of the flow ops
- no_bias setting
- MAP computation (instead of accuracy) for OC
- adaptive learning rate for Xray concepts
- various small bug fixes
Reviewed By: viswanathgs
Differential Revision: D4329500
fbshipit-source-id: 000d4fd22ec408af5290480c788eb86546bff52e
Summary: DivOp missed a gradient for CUDA, so implemented it. Also added operator test.
Differential Revision: D4396638
fbshipit-source-id: 9949e47aa3735bb418a0db003e2b2f4896056a71
Summary:
This diff brings us roughly to par with Torch on ResNet memory usage. On batch size 32, Resnet-50 took 7497 MiB; after this, 5010 MiB. This will thus allow us to handle 64 images / GPU, or 256 images / 4 GPUs.
In addition, I added a special argument to DagNet that causes it to run only one thread for the first iteration. This is needed since there are allocations on the first iteration's backward pass due to gradient sharing, and this will cause NCCL to deadlock.
The sharing of gradient buffers requires inferring which gradients can share memory (i.e that they are not used concurrently). Previous memonger code uses topological sort, but rbgirshick showed that it does not work with tree-like models. Thus, I wrote a new optimization algorithm based on DFS. It takes about 0.25 secs / GPU on resnet-50, so is clearly fast enough.
Module data_parallel_model supports this feature natively.
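As a toy illustration of the general idea of liveness-based sharing (this is not the diff's DFS algorithm, just a greedy sketch over ops in execution order): a blob's buffer becomes reusable once the last op that reads it has run.
```
def assign_shared_buffers(op_inputs, op_outputs):
    """op_inputs/op_outputs: per-op lists of blob names, in execution order."""
    last_use = {}
    for i, ins in enumerate(op_inputs):
        for b in ins:
            last_use[b] = i

    free, assignment, next_buf = [], {}, 0
    for i, outs in enumerate(op_outputs):
        for b in outs:
            if free:
                assignment[b] = free.pop()      # reuse a dead blob's buffer
            else:
                assignment[b] = next_buf
                next_buf += 1
        for blob, last in last_use.items():
            if last == i and blob in assignment:
                free.append(assignment[blob])   # buffer is now reusable
    return assignment
```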
Reviewed By: prigoyal
Differential Revision: D4363209
fbshipit-source-id: 73b11e7610438098bb11bff0af8075ab0cf2c0f1
Summary: Currently Gather doesn't check if the provided indices are in the correct range. Adding a check makes issues easier to debug
Reviewed By: dzhulgakov
Differential Revision: D4277170
fbshipit-source-id: dc744b6a229aaf72af8336a417f0f79c97dbdc77
If USE_GLOG and USE_GFLAGS are set to off, or if the system does not
have glog and gflags installed, caffe2 will fall back to a non-glog
and non-gflags installation. This would be helpful for e.g. mobile
builds.
Summary: TSIA - this is needed when users choose to build without glog.
Reviewed By: bwasti
Differential Revision: D4380186
fbshipit-source-id: 1803d451e296f3af5258e0d67d4afdec5f5e5623
Summary: This is needed to properly compile when gflags is not present.
Reviewed By: bwasti
Differential Revision: D4379796
fbshipit-source-id: 3344fa304d85feabbdba81449f663405ed731797
Summary:
Adds a thread pool for image decode, and optional GPU-based data conversion, mean subtraction and std division
Closes https://github.com/caffe2/caffe2/pull/56
Reviewed By: Yangqing
Differential Revision: D4341326
Pulled By: bwasti
fbshipit-source-id: 6485616ea7d212c7701274a40fae912db30dff4a
Summary:
While debugging resnets on imagenet, Ross pointed out that MSRAFill is not done correctly. Fixing that:
1. use fan_out not fan_in
2. Normal distribution rather than uniform
Reviewed By: Yangqing
Differential Revision: D4372380
fbshipit-source-id: 8f03bd75f543caa60c20e841edbdbb918d1c8775
Summary:
This is needed so that we stick with C++11 instead of 14, which is not well
supported on a few platforms.
Reviewed By: bwasti
Differential Revision: D4377534
fbshipit-source-id: d65d7caaa935a8f16e3b44c838104a576c8f78e4
Summary: Same as D4312617 but this time not excluding source files with `#define`.
Reviewed By: soumith
Differential Revision: D4344811
fbshipit-source-id: 5a314960c319f029c6737c8c8ac8224ec2f20218
Summary:
This diff adds a couple of options to `htrace_to_chrome.py` so that users can specify start and end timestamps for displaying spans.
For example, the arguments `--start_time x --end_time y` indicate that spans that finish before `y` or start after `x` will not be included in the final chrome tracing json file.
This also adds timestamp information to the spans which can serve as hints to the command line argument values.
Differential Revision: D4372220
fbshipit-source-id: a2b0af3be6861448874d804b30426df1b67a676e
Summary: provide an easy way to benchmark different dper models.
Differential Revision: D4367258
fbshipit-source-id: 4821645c58ad183becf0c82daae991375d5c6ef4
Summary:
This is a quick bugfix on `htrace_to_chrome.py`, which produces outputs with wrong file names if command line arguments are given in a specific way.
fbcode $ python caffe2/caffe2/contrib/prof/htrace_to_chrome.py --display operator /tmp/htrace_alexnet_span_log_20161224_055901
Writing chrome json file to --display.json
Now import --display.json in chrome://tracing
Differential Revision: D4369445
fbshipit-source-id: 628f4dbd88fb86814a0d92cd4c8407ba12a401d0
Summary:
this normalizes the sparse gradient, so that the "effective learning rate" of each sparse parameter will NOT be affected by the number of examples in a batch that "use" this sparse parameter.
Experiments show it helps convergence (about 0.1% better train NE): https://fburl.com/1230747813683956. It's not conclusive yet, and we still need to do more experiments. But this diff adds it as an option, and does not change the default behavior, so we can get this in first.
Differential Revision: D4367283
fbshipit-source-id: 49ea80dfa9ea776ff4160e220cf6c86593521607
Summary: This diff adds a gflag for specifying the path for htrace span log files. This flag is used by the net types `HTraceDAGNet` and `HTraceAsyncDAGNet`.
Differential Revision: D4366849
fbshipit-source-id: 56038d3d64a3fd5ab363feda86a19a6f2496971c
Summary:
Rewrite D3993337 based on new stack.
Compared to the old one, we need more readers to achieve the same speed. But so far the speed is the same and the new bottleneck is the write bandwidth of the trainer. Model quality is the same as the base.
Reviewed By: azzolini
Differential Revision: D4310803
fbshipit-source-id: 6d04ae8040c1ee7caa9aea5287f054e73fbe325a
Summary: As title. We want to have request_only net which runs on user_only sparse features. Submitting to get early feedback.
Reviewed By: dzhulgakov
Differential Revision: D4282783
fbshipit-source-id: 71241bf5444550075884c788c2da4783659bc1e0
Summary: Recently a PR landed that removed the assert on trying to feed float64 to FeedBlob for GPUs and changed it to a warning. Thus the test checking that the assertion was raised started to fail. Removing it.
Reviewed By: Yangqing
Differential Revision: D4363780
fbshipit-source-id: d9e222c309302243138d4ff3c223c711a4d2052d
Summary:
I was testing the perf difference between naive group conv and cudnn group conv. I am doing no_bias conv and added support for that in the naive implementation.
Although it's deprecated, I thought it would be nice to have working things in our code.
Differential Revision: D4363168
fbshipit-source-id: 29719013d79b449fd359884709c7a1195be51ae3
Summary: This diff adds an option to the htrace_to_chrome.py format conversion script so that users can decide to display less traces by hiding kernel/operator/worker spans. For example, passing the arguments `--display worker` will make the script process spans up to worker spans and not go further (deeper).
Differential Revision: D4360404
fbshipit-source-id: aa5af7e499b94aeb3de06823bdeeedfbc3b1c02b
Summary: As per discussion in D4355529
Reviewed By: prigoyal
Differential Revision: D4362162
fbshipit-source-id: 795fcf1507235a7dc3c7a10b0453037936d057aa
Summary:
Essentially, when the number of pairs is around 1000, only the positive samples in the list get a massive boost from all the negative examples. This diff normalizes the gradient and the loss with the number of pairs.
This diff also adds protection against NaN and more logging to help debug.
Reviewed By: kdub0
Differential Revision: D4359782
fbshipit-source-id: 7240344ddb1f2f670d1eec1b03e7f6e413f3dfcc
Summary:
It used to be that only the cudnn engine supported it, and now it should be
fully supported by any conv engine.
To ignore bias, simply use a convolution op that has two inputs instead of
3. The gradient operator will automatically figure out that it does not
compute the bias gradient.
Reviewed By: prigoyal
Differential Revision: D4354183
fbshipit-source-id: cf71b6289a254d15a6a663a85df63fbbaec3702b
Summary:
Ievgen ran into this bug with his dper work - we didn't preserve metadata on the lengths field.
Also, we didn't take keep_blobs into account for List's main field. Now fixed.
Also reformatted the file to be nice.
Differential Revision: D4357859
fbshipit-source-id: 1c26c533a10d38afab13b46ccbcb541f5fa9074a
Summary: As discussed, this improves performance a lot and is not a memory hog anymore. Anyway anyone can also turn it off.
Differential Revision: D4338798
fbshipit-source-id: bf0fdb594427ebe90e1e94b2effdc63196096b3f
Summary: att. Part of the effort to unify loader configuration.
Differential Revision: D4342147
fbshipit-source-id: bb021112f61d4838b0ccc7a5a8bcaf272cb35cd8
Summary:
This is a first step in improving our RNN story. It provides a wrapper around the current RecurrentNetworkOp implementation which infers most of the redundant parameters and makes the API much simpler.
Also in order to support general step nets I added an extra argument to the RecurrentNetworkOp.
Future work:
1. Inferring step net output and internal blobs (scratches) sizes and type
2. Avoid accessing blobs by names in c++ part
3. Remove requirement for inputs / output 1:1 correspondence in the step net
4. Make the python API support networks with operators like Sum being on the border of the Cell net (currently there is an issue with such networks where gradient blobs which are on the side are not explicitly created).
Differential Revision: D4268503
fbshipit-source-id: f8a66491c2b55daa730caeed7e9f2b3921541b49
Summary:
This is an ongoing work - currently the forward pass is implemented, but backward
is yet to be done. We might want a CPU counterpart as well.
I will wait for D4341288 to land and then make bias optional.
Reviewed By: prigoyal
Differential Revision: D4342210
fbshipit-source-id: 51bb0e98d917970bdc040d076b535beb8e994d9a
Summary:
This diff adds HTraceAsyncDAGNet, which is basically the async_dag version of HTraceDAGNet. Similar to HTraceDAGNet, we can use HTraceAsyncDAGNet by setting the net type to `htrace_async_dag`.
For now, we only track iteration spans and do not go deeper (operators, gpu kernels, etc.) because due to the implementation of AsyncDAGNet, applying HTrace is much more intrusive compared to HTraceDAGNet. Creating spans for operators for HTraceAsyncDAGNet is a future task.
This diff also adds a minor change in the TARGETS file so that `htrace_dag`, `htrace_async_dag`, and `prof_dag` are all accessible via one rule.
Differential Revision: D4351587
fbshipit-source-id: 1a4075a9a5efdfafb828a81b663cc731858f7307
Summary:
Fix warnings exposed by gcc-4.9.x's -Wshadow-compatible-local
(and/or the stricter -Wshadow-local) options. Note that these
are both less onerous than -Wshadow.
I plan to enable one of them for all of fbcode, soon.
Rename inner "convert" to "convert2".
Reviewed By: Yangqing
Differential Revision: D4347297
fbshipit-source-id: 7494aedbaeeb2e5356db0612f5f32077f7ffd30b
Summary: This diff adds an option to use rank loss instead of cross entropy loss during training. This assumes that the data is loaded in batches which corresponds to sessions, which is something that was implemented for RNN training
Differential Revision: D4261923
fbshipit-source-id: e92a60cc9f53acc1585ac35d1fdb430c2ebbfa33
Summary:
With __name__ == "__main__" defined, MPI4Py was no longer being set up as intended, leading to test failures on syntax errors (_has_mpi, COMM, RANK and SIZE were no longer defined in a global scope). This is fixed via explicit use of global variables and factoring out the MPI setup into a new method.
Closes https://github.com/caffe2/caffe2/pull/59
Reviewed By: Yangqing
Differential Revision: D4348956
Pulled By: bwasti
fbshipit-source-id: ee741a0fff1df00eade1b6d5e1c281afcb38da6a
Summary:
Only tests for SparseFunHash for now
Closes https://github.com/caffe2/caffe2/pull/60
Reviewed By: Yangqing
Differential Revision: D4348961
Pulled By: bwasti
fbshipit-source-id: cd05d73ccc711b42a7d33e7a6b65a9d1a9bfa7e6
Summary:
Yangqing This seems to work for me, not sure if it's implemented in the right way for you to accept :)
Allows user to specify "no_bias" as an option for convolution layers (only cuDNN at this point), so that the bias associated with that operator is not allocated or computed. This is useful in particular for conv + BatchNorm combinations (such as ResNets), as the bias term can be handled by both conv and Batch Norm, wasting memory and computation.
Closes https://github.com/caffe2/caffe2/pull/50
Reviewed By: Yangqing
Differential Revision: D4341288
Pulled By: bwasti
fbshipit-source-id: e6138d0024c83ed876dff2f83ffbebe7de502fd8
Summary: As part of a PR from GitHub, "logging.basicConfig()" was added to workspace, causing havoc with existing logger configurations. It should not be here. Thanks rbgirshick for reporting.
Reviewed By: kdub0
Differential Revision: D4346077
fbshipit-source-id: 084ddcbfe6354bdaf5c97a42086c0bd36ec4629c
Summary: Found some comment typos while working on T14849353.
Reviewed By: Yangqing
Differential Revision: D4334469
fbshipit-source-id: f880e2a3e9a4e1152b315c6d3c8b68ad298d6334
Summary: Builds caffe2 and dependencies for macOS. Not included in the MSQRD engine or elsewhere yet.
Reviewed By: Yangqing
Differential Revision: D4334013
fbshipit-source-id: 31cacf07e2b07f379e1894e51dde5103c56b8815
Summary:
I don't know why I introduced this embarrassing bug that swapped the order of
ldb and beta in the gemm interface. This fixes that.
Differential Revision: D4014493
fbshipit-source-id: 1aec950b6e9d57e947654d4044e50930f2db1344
Summary: The Torch docs about Resnets, and soumith's comment, mention significant memory savings with in-place ReLU. prigoyal already had this in her code, but I did not. This saves a lot of memory: 9851 MiB -> 7497 MiB.
Reviewed By: prigoyal
Differential Revision: D4346100
fbshipit-source-id: e9c5d5e93787f47487fade668b65b9619bfc9741
Summary:
We create a Sum operator to sum up the gradients. Currently we use strings for its input/output blobs.
So the code will fail if AddAllGradients() runs within a NameScope.
To avoid this, just use BlobReference instead of string for the blobs.
Reviewed By: xianjiec
Differential Revision: D4343701
fbshipit-source-id: 2d008916e192d75c6e20f97921331ac4c7b73363
Summary:
This is a first diff to remove the "easiest" unused includes in fbcode.
* For safety, we only touch .cpp files without #if and #define,
* We do not try to remove redundant systems headers (aka. "packing").
The diff was generated as follows:
```
foundation/scripts/ls-cpp-dirs | grep -v '^\(\.\.\|external/\|.*/external\)' | xargs ffmr -o /tmp/ffmr-diff-1 codegraph/scripts/ffmr/analyze_includes_no_headers_no_packing_skipping_ifdefs.sh
cat /tmp/ffmr-diff-1/*.diff | patch -p2
hg commit -m something
arc diff --prepare --nolint --nounit --less-context --excuse refactoring
```
Note: `grep -v` is just an optimization. The actual configuration is in these two files:
diffusion/FBS/browse/master/fbcode/codegraph/analysis/config.py
diffusion/FBS/browse/master/fbcode/codegraph/scripts/ffmr/analyze_includes_no_headers_no_packing_skipping_ifdefs.sh
See the task for more context, and the recent "safety" improvements on the tool.
depends on D4317825 for very few cases where `nolint` had to be manually added.
Reviewed By: igorsugak
Differential Revision: D4312617
fbshipit-source-id: ecc1f0addfd0651fa4770fcc43cd1314661a311a
Summary: Avoid printing message repeatedly each time the conv_transpose_op (with cudnn) is called
Reviewed By: Yangqing
Differential Revision: D4337242
fbshipit-source-id: 27b048bad8c54604d91174acd4928a1496f2f5c7
Summary:
The exception in FeedBlob causes many tests to fail.
Instead of an exception, we log a warning message and move on.
Feeding a float64 blob should not cause any issue.
Closes https://github.com/caffe2/caffe2/pull/57
Reviewed By: bwasti
Differential Revision: D4343135
Pulled By: Yangqing
fbshipit-source-id: cd1144b94c9883fcbd8bdcd78f9f93a67debc0a6
Summary:
An operator that reads labels, computes their counts, and generates a Huffman tree
hierarchy. It generates all paths from the root node to leaf labels as a serialized
HierarchyProto to be used as an input to the HSoftmax operator.
The tree is constructed in a bottom-up greedy way, keeping indices to parent
nodes in order to generate the code and the path from root to leaf in
a bottom-up traversal.
Note:
HSoftmax handles computing a generic hierarchy, which means for the binary case
we can save one matrix x vector operation per node by representing every node as
a logistic function, and also reduce the paths proto size by producing only
one integer list to represent the path / indices and a bytes list for the code
per label.
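A hedged Python sketch of the bottom-up greedy construction described above (the real operator emits a serialized HierarchyProto; this just builds the tree with a heap and returns per-label root-to-leaf codes):
```
import heapq
import itertools
from collections import Counter

def build_huffman_paths(labels):
    counts = Counter(labels)
    tie = itertools.count()   # tie-breaker so equal counts never compare nodes
    # A node is (label, left, right); leaves have left == right == None.
    heap = [(c, next(tie), (label, None, None)) for label, c in counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        c1, _, n1 = heapq.heappop(heap)   # two least frequent nodes ...
        c2, _, n2 = heapq.heappop(heap)
        heapq.heappush(heap, (c1 + c2, next(tie), (None, n1, n2)))  # ... merged

    paths = {}
    def walk(node, code):
        label, left, right = node
        if left is None:
            paths[label] = code          # leaf: record root-to-leaf code
        else:
            walk(left, code + [0])
            walk(right, code + [1])

    walk(heap[0][2], [])
    return paths

print(build_huffman_paths([0, 0, 0, 1, 1, 2]))
```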
Differential Revision: D4303294
fbshipit-source-id: c7f0d3c204536234c26bb2a4228cb3a1892db395
Summary: this was introduced due to rm and riv params in SpatialBN layer and the likes. We should be saving these params as well but it is not required to broadcast these params to all gpus after every epoch.
Differential Revision: D4338749
fbshipit-source-id: d3bbc92cf0cd7d220a51d76aea8bffcfd6e520b7
Summary: For some reason I had been disabling the exhaustive search heuristic for cudnn for xray/resnet trainers. On BigBasin, this gives 10% perf boost. On BigSur maybe 5%.
Reviewed By: prigoyal
Differential Revision: D4338654
fbshipit-source-id: 3974dd612f5d4f4dc8b2febccb59664d3f276c3e
Summary: I accidentally landed the control_input disable for NCCL in D4327024. This empirically increases the likelihood of deadlocks, although it gives a nice perf boost. But better to disable it until NVIDIA fixes their stuff.
Reviewed By: Yangqing
Differential Revision: D4338537
fbshipit-source-id: d43efb45965a88bcfe38e5f1dc16c04463e2e038
Summary:
A couple of more misc changes:
- allow starting the coordinator multiple times -- this makes data parallel programming easier
- make the fetcher id a global sequence; before, each gpu had the same ids for workers
- my flow jobs got stuck when joining the fetcher threads. I think there is actually a memory fencing problem with the is_active boolean. But I am too tired to add proper condition variables there. Instead just add timeout to join(). It is needed anyway since some i/o thread could get blocked.
Differential Revision: D4333381
fbshipit-source-id: 88226c8a9c9a5e05d771360a502a2ba21a6b9d76
Summary:
This adds Caffe2 support for MKL operators directly with MKLMemory. Included a
Relu layer that shows how to use it.
Reviewed By: salexspb
Differential Revision: D4322144
fbshipit-source-id: 8b3392c4fd024ab1a7ba7135c349ebd3e1976799
Summary: This diff moves all tracing code under fb/htrace and fb/prof to contrib/prof.
Differential Revision: D4333032
fbshipit-source-id: 1d1ae14c3d376a89f9199561cada53b2ca62e81a
Summary:
As requested by Yangqing, added Inception model (copied from convnet_benchmarks) and a dummy data feed option to the xray trainer, that we use for scalability benchmarking.
+ a couple of minichanges to the data input framework
Reviewed By: Yangqing
Differential Revision: D4327024
fbshipit-source-id: 86911468456fc13a32d5f437a43347380ec66a68
Summary:
This is just a stub for now. I need to add a report metric as well before I can produce a complete flow.
Possible extensions:
Implement list-wise loss, allow for more than one session in a batch and create a framework for arbitrary loss functions to be applied
The data loader will be the same as for RNN
Reviewed By: xianjiec
Differential Revision: D4245176
fbshipit-source-id: 546683b6551654a37c410dc1606e556a7bf83a2a
Summary:
We often use the same net for training and testing, but we must distinguish their data. Yesterday's diff forgot to include that distinction (it was in the xray sampler before), and this diff adds it. Basically one provides a name for the input source for data_workers, and all the queues and scratch spaces are suffixed with that name to separate them.
Also set the caffe2 queue's size to 4, which was empirically found to be sufficient. It was erroneously defined to be a function of batch size, which does not make sense since each *element* in the queue is a batch, and led to out-of-memory issues on the xray trainer.
Differential Revision: D4329449
fbshipit-source-id: c994da1c8b0935b8eda2402c118d49b76caa7da8
Summary:
adding imagenet dataset as well
data augmentation and model has been added, just need to add db read
Differential Revision: D4289150
fbshipit-source-id: b531d3f09e3d0efac5cda5bb75d8146e1bb693e4
Summary:
float64 test breaks things on the cuda side. I am deleting it for now and if
we add it back, let's make sure we run the test on a GPU machine first :)
Reviewed By: azzolini
Differential Revision: D4324427
fbshipit-source-id: 0246fe9dd28a286422ca94c90f5b0fc33a162e74
Summary:
Xray sampler (originally by ajtulloch) and prigoyal's resnet trainer use variants of the threaded data input where worker threads put stuff into a python queue that is drained by an enqueuer thread that dumps those batches to a Caffe2 queue, that is then drained by the net's DequeueBlobs operator.
There is a lot of boilerplate, which is also quite complicated.
This diff is an attempt to generalize that common machinery under a new module, "data_workers" (name could be improved). Basically you pass it a function that is able to return chunks of data (usually data + labels).
I also created a module 'everstore_data_input' which generalizes everstore-origin data input with a preprocessing function (image augmentation, for example). See how I refactored sampler.py for the usage.
Next we could create fetcher function for Laser data.
Differential Revision: D4297667
fbshipit-source-id: 8d8a863b177784ae13940730a27dc76cd1dd3dac
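A hedged sketch of the fetcher-function contract described in the summary above. The function and argument names here are hypothetical and not taken from the actual data_workers API; the real module wires such a function to worker threads, a Python queue, an enqueuer thread and a Caffe2 queue.

```python
import numpy as np

def dummy_fetcher(worker_id, batch_size):
    # Return one chunk of data plus labels as numpy arrays, as the module expects.
    data = np.random.rand(batch_size, 3, 224, 224).astype(np.float32)
    labels = np.random.randint(0, 1000, size=batch_size).astype(np.int32)
    return [data, labels]
```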
Summary:
This renames the "Snapshot" op name to "Checkpoint" as we discussed earlier.
The old Snapshot name is still available, but we should move to the new name and
eventually deprecate the old one.
The Python SnapshotManager should also be changed, cc azzolini
Reviewed By: dzhulgakov
Differential Revision: D4272021
fbshipit-source-id: 4b8e029354416530dfbf0d538bfc91a0f61e0296
Summary:
TSIA
We also return a reference for Input and a pointer for Output, just to be consistent
with the rest of the framework.
Reviewed By: bwasti
Differential Revision: D4318148
fbshipit-source-id: 857fd72bf929dac04a890f8f787a6fad84bd4287
Summary:
I have noticed that constructing the Xray model takes quite a while. To measure this, I wrote a benchmark script that creates a resnet-50 model on 8 gpus. This takes about 95 secs -- which is kind of annoying when you want to quickly debug stuff.
Profiling (using Python's cProfile), I was able to see that most of the time is spent in net.BlobIsDefined(), which does a linear search over external inputs and operator outputs. Thus it gets slower and slower with large nets. This can be fully optimized by keeping a separate lookup table of operator inputs and outputs (and external inputs and outputs). It is a bit annoying to keep this separate data structure, but I set up the unit tests to ensure things are done correctly over Clones.
After the optimization, the net construction drops from 95 secs to 8.2 secs!
Reviewed By: azzolini
Differential Revision: D4288307
fbshipit-source-id: 0bb82c8bde9d86a2702b298f4aa706cba509346e
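A minimal sketch of the lookup-table idea described in the summary above (not the actual Net class; names are illustrative): instead of scanning all operator inputs/outputs on every membership check, maintain a set that is updated whenever an op is added.

```python
class NetSketch:
    def __init__(self):
        self._ops = []
        self._defined_blobs = set()   # hypothetical cache of op outputs / external inputs

    def add_op(self, op_inputs, op_outputs):
        self._ops.append((op_inputs, op_outputs))
        self._defined_blobs.update(op_outputs)

    def blob_is_defined(self, blob):
        # O(1) membership test instead of a linear scan over all ops
        return blob in self._defined_blobs
```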
Summary: Allows collecting samples over multiple batches. The method uses a circular array, so there is no guarantee about the order of the samples. The goal is to get a view of the data across multiple batches.
Reviewed By: salexspb
Differential Revision: D4216181
fbshipit-source-id: bb9e1fa84ac7e04006dcddb53c9347a42ec83dc8
Summary: Added gradients for the Copy operators. They are simply the reverse operation. Also added a unit test to test things actually work and added the operator schema and registration to model_helper's known operators.
Differential Revision: D4306516
fbshipit-source-id: dd0633fa7f2ed01991990e56e63669794df037d9
Summary:
Fix RecurrentNetworkGradient with batch size > 1.
The main issue was that we always set the gradient output to (1, 1, recurrent_size), which mismatches the input (1, batch_size, recurrent_size).
Further gradient ops do Squeeze and split assuming that the output gradient blob is the same size as the input, so they fail.
The fix is simply resizing the output like the input (1, batch_size, recurrent_size); I had to move the resize to RunOnDevice since batch_size is computed from Input(0), which is not available until we actually run the op.
Differential Revision: D4301487
fbshipit-source-id: e5c7426d6e770d985ce72a3737381a2b4af333ba
Summary:
We want to implement a request-only net, and to do this we decided to split the work into two parts. The first part will propagate the required metadata and the second part will cut the nets properly.
This diff is to propagate request_only metadata across the layers.
A few notes about implementation:
- Each layer contains a field request_only which can be set based on the input_record. If all the scalars from the input_record are marked request_only we mark a layer as request_only;
- Sparse-To-Dense layer sets request_only metadata;
- SigridTransformation and SparseLookup layers propagate request_only status;
- As for now we join request_only and other sparse features together in input_record, but ideally we may want to separate this, because request_only should be served separately;
Reviewed By: xianjiec
Differential Revision: D4259505
fbshipit-source-id: db8a30ef92cba84f1a843981b9dde3a8b9633608
Summary: The doc for sequence ops says "pad_width" instead of "padding_width". This diff fixes it.
Differential Revision: D4277186
fbshipit-source-id: 63af6cce2fe0af0d395f78c6a6a1f41518039cf8
Summary:
It gives a significant perf boost to do the parameter update inside MomentumSGD, instead of with a separate WeightedSum op.
To ensure backwards compatibility, I made it a separate op.
Also added a unit test.
Reviewed By: prigoyal
Differential Revision: D4262446
fbshipit-source-id: 38e7ee6d7677b398658ac7fe9b7a59b569e033f4
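For reference, a hedged numpy sketch of what a fused momentum-SGD update step does; details such as Nesterov handling are omitted and this is not the operator's exact code.

```python
import numpy as np

def momentum_sgd_update(grad, moment, param, lr, momentum=0.9):
    # Fold the momentum accumulation and the parameter update into one step,
    # instead of a separate WeightedSum pass over the parameters.
    adjusted = lr * grad + momentum * moment
    return adjusted, param - adjusted   # new moment, updated parameter
```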
Summary: some operations don't handle the case where the output tensor is empty, and cause segfaults or unexpected behavior (uninitialized output tensor). This diff ensures that BatchMatMul, filler operations, PackSegments/UnpackSegments and ReadNextBatch don't fail and properly initialize their output with the correct type. Those seem like fairly straightforward changes, let me know if you'd rather break it up into separate diffs.
Reviewed By: Yangqing
Differential Revision: D4277149
fbshipit-source-id: c5a30b67bb3b451b117d6aa83827d40b71240c2b
Summary: I couldn't find a way to fill a tensor with a shape provided at runtime, so I added an input_as_shape option to the filler ops. When input_as_shape is true, the input can be used to directly provide the shape of the output (this is different from the default behavior, where the output is reshaped like the input). For example if the input contains [2, 3], the output will have shape [2, 3]. Let me know if you see a simpler way :)
Reviewed By: Yangqing
Differential Revision: D4276872
fbshipit-source-id: 095e995d8bf302152765bd51c405185ef9952212
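A hedged example of the behavior described in the summary above, assuming the filler ops expose an `input_as_shape` argument as stated there (details of the argument name and accepted dtypes may differ):

```python
import numpy as np
from caffe2.python import core, workspace

# The input blob holds the desired output shape rather than a tensor to mimic.
workspace.FeedBlob("shape", np.array([2, 3], dtype=np.int64))
op = core.CreateOperator("ConstantFill", ["shape"], ["out"],
                         input_as_shape=1, value=1.0)
workspace.RunOperatorOnce(op)
print(workspace.FetchBlob("out").shape)   # expected: (2, 3)
```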
Summary:
I've been noticing when running caffe2 experiments that calling Exp with many values whose result is close to 0 causes MKL's underflow error handler to be called repeatedly, causing significant overhead even though the result is correct (e.g. exp(x) = 0). I suggest setting the error mode to VML_ERRMODE_IGNORE for those functions, unless there are good reasons not to.
with the current function (see mkl_vml_kernel_sError and vsexp_cout_rare):
{F65147147}
with VML_ERRMODE_IGNORE:
{F65147148}
Let me know if you see a better workaround
Reviewed By: Yangqing
Differential Revision: D4277240
fbshipit-source-id: d44168da32caee4a3f88227ffb70cdc3d5314722
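A small illustration (numpy, not MKL) of why the underflow reported above is benign: for very negative inputs, single-precision exp simply flushes to 0, which is the mathematically expected answer, so the summary argues the error handler's overhead buys nothing.

```python
import numpy as np

x = np.float32(-200.0)
print(np.exp(x))   # 0.0 -- harmless underflow to zero
```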
Summary:
prigoyal sharply noticed a bug in the Resnet models: we have not been checkpointing, nor synchronizing between gpus, the moving average and variance computed by the SpatialBN ops. Particularly the first problem is serious, since models starting from a checkpoint would have started from a null state for SpatialBN. Not synchronizing with the data parallel model is less tragic, since each GPU should see very similar data.
Thus I propose keeping track of "computed params", i.e. params that are computed from data but not optimized. I don't know if there are other examples, but SpatialBN's moving avg and var definitely are one.
- I modified the checkpointing for the xray model to store those blobs + also ensure the synchronization of those blobs
- I modified data parallel model to broadcast those params from gpu0. I first tried averaging, but hit some NCCL deadlocks ... :(
Differential Revision: D4281265
fbshipit-source-id: 933311afeec4b7e9344a13cf2d38aa939c50ac31
Summary: with the current code, Concat accepts inputs of different types and concatenates them as raw data. This causes bugs that can be hard to find: for example, when concatenating a tensor of int with a tensor of long, the long integers get split in two, and the output tensor contains garbage. This adds the necessary checks to make sure the input types are all the same.
Reviewed By: Yangqing
Differential Revision: D4277109
fbshipit-source-id: c1568f74bb66f0d9146a54441c0ee664d5516b77
Summary: I ran into a bug when working with very big tensors in caffe2 (> 2GB). When extending beyond a certain size, the size computation was using int32 instead of int64 and would overflow. This fixes the issue.
Differential Revision: D4276487
fbshipit-source-id: 1704a69c4363c7a5b2f7db748d7d570a9593f2b1
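An illustration of the class of bug described above (numbers are made up): computing a byte count for a > 2 GB tensor in 32-bit arithmetic wraps around, while 64-bit arithmetic is fine.

```python
import numpy as np

num_elements = np.array([600_000_000], dtype=np.int32)   # e.g. 600M floats ~ 2.4 GB
nbytes_32 = num_elements * np.int32(4)                    # wraps around in int32
nbytes_64 = num_elements.astype(np.int64) * 4             # correct 64-bit result
print(int(nbytes_32[0]), int(nbytes_64[0]))               # -1894967296 vs. 2400000000
```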
Summary: Position weighted embedding is a bit slow due to the hacky implementation of Mul with broadcast. This diff speeds up the Mul with RowMul.
Reviewed By: xianjiec
Differential Revision: D4271193
fbshipit-source-id: e5c35e18920aeef3de3a7304a8f5727d0c980613
Summary:
Since hashing is different.
This should be ready to commit now. Running ads nn canaries.
Differential Revision: D4264009
fbshipit-source-id: 3aa16b0c47c61f9a442b0375524c5f1580af5892
Summary: Make the xray net_type config a command line argument
Differential Revision: D4262076
fbshipit-source-id: e2ecb9cd5bee5d6aaebe0ea8d2d4d9b378058cba
Summary: This allows us to serialize things between MKLMemory and a TensorProto.
Reviewed By: dzhulgakov
Differential Revision: D4218044
fbshipit-source-id: 934181493b482cb259c17ff4b17008eac52fd885
Summary:
This example writes an LMDB database of image data and labels (random). Then it reads them using Caffe2's TensorProtosDBInput and validates that the checksums match. This example shows how to coerce image data into TensorProtos and be happy.
Before, there was no clear example of how to create databases for Caffe2.
Differential Revision: D4263614
fbshipit-source-id: 21e08066899095b4efcc2d23dbc3ede81e75914a
Summary: Switching to Pieter-MPI changed the way we set up the network between operators. For synchronizing parameters after a checkpoint load, we run a checkpoint_net that contained operators for creating the common world and broadcast operators. Unfortunately this fails when the checkpoint sync is done a second time, because we would have created a duplicate common world. The solution is to separate the common world op and broadcast op into an init net and the actual broadcasting net, and run the init net only once. This problem did not arise in the Flow version since I did only one checkpoint load per operator (process).
Differential Revision: D4251754
fbshipit-source-id: ba030579e651e529e29bbf2d27920075078d8ff9
Summary:
Disclaimer: this is really hacky
Continues a fix from D4218902. The root problem is that DPER builds the net incrementally and input_record doesn't support that properly. For now I just manipulate the input record directly. Alisson wants to fix it properly later by allowing set_input_record to accept a superset of the current record.
But it should unblock our experimentation.
I'm curious how it's going to look in dper_example world.
Reviewed By: azzolini
Differential Revision: D4255285
fbshipit-source-id: ff65b6f943d705a9b3399035597e2e8ded2e1ff3
Summary:
This adds support for automatic aggregation of sparse gradients. We simply concatenate indices and values (no attempt to deduplicate, since this is already done before feeding into the optimizer). This should support various cases (indices and/or values can be generated by one or more gradient ops, or gradient outputs can be directly passed from inputs).
I tried to minimize the code footprint, but I introduced SparseGradGenMeta because GradGenMeta didn't lend itself very well to being used with sparse gradients.
Reviewed By: dzhulgakov
Differential Revision: D4219788
fbshipit-source-id: 1d074664cffd82a8764e4b1473ada6bc46e6c51a
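A small numpy sketch of the concatenation strategy described above: two sparse gradient contributions for the same parameter are combined by concatenating their indices and values, leaving deduplication to the optimizer.

```python
import numpy as np

idx_a, val_a = np.array([0, 4]), np.array([[0.1, 0.2], [0.3, 0.4]])
idx_b, val_b = np.array([4, 7]), np.array([[0.5, 0.6], [0.7, 0.8]])
indices = np.concatenate([idx_a, idx_b])          # [0, 4, 4, 7] -- duplicates are fine
values = np.concatenate([val_a, val_b], axis=0)   # matching rows of gradient values
```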
Summary: adding more methods to the layer representation. The corresponding implementation in DPER is: https://fburl.com/563869364
Differential Revision: D4256583
fbshipit-source-id: 91326b7bb9e960a5bc70b5a13812fce90054eceb
Summary:
When refactoring the data parallel model, the division of the LR by the number of devices was dropped, and thus we ended up effectively multiplying gradients by the number of devices. We therefore need to scale the LR by 1/numgpus.
Created a test to confirm that data_parallel_model produces exactly the same results for different numbers of gpus, given the same total batch size.
Reviewed By: prigoyal
Differential Revision: D4248907
fbshipit-source-id: af21ede113e6ac25f12c556de298cb18974548be
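A one-line sketch of the scaling rule described above: gradients are effectively summed over devices, so dividing the learning rate by the device count keeps the update equivalent to the single-GPU case at the same total batch size.

```python
base_lr = 0.1
num_gpus = 8
effective_lr = base_lr / num_gpus   # 0.0125
```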
Summary: Basic ops to set/get/check/wait against a StoreHandler.
Differential Revision: D4248059
fbshipit-source-id: cc53061fcc13823d4b9eed6b7c1c346b9e8ec991
Summary:
Add store handler implementation backed by a Redis server.
This allows for easy rendezvous when participating machines have no
access to a shared filesystem.
Differential Revision: D4241715
fbshipit-source-id: 4ce881df3a96af24f7efbb02d1050b3b2b9bc3c0
Summary:
DPER has very strange python ops that play with the Workspace - they are somewhat similar to LoadOp/SaveOp, so I guess the semantics is fine.
Thus it makes sense to allow python operators to receive a workspace pointer similarly to regular operators.
I didn't figure out a better way to implement the optional argument than just checking the number of args the function receives on the python side.
Reviewed By: ajtulloch
Differential Revision: D4242943
fbshipit-source-id: d97d4227815b741c8f884cfe254b06d2b56b5a41
Summary:
One more small batch of CHECKs left in the C2 codebase. Most of the leftovers
should be in tests/GPU-only code.
Reviewed By: Yangqing
Differential Revision: D4243782
fbshipit-source-id: a4a03c116ea8ba16facd2efc135746d5921f19d5
Summary: This diff adds a header file for net_gpu.cc so that the AsyncDAGNet class can be used to create other derived classes.
Reviewed By: ajtulloch
Differential Revision: D4230046
fbshipit-source-id: 379c3ff7ebb7aeeb4294f39e6f5d1ecad48b92f0
Summary:
This makes sure that we have a useful CUDA error message in asan mode. Also
made an fb-specific task pass by explicitly marking it not asan-able.
Reviewed By: dzhulgakov
Differential Revision: D4243471
fbshipit-source-id: 2ce303b97b3b4728c05575a8e7e21eb5960ecbc7
Summary:
Faster implementation of UniqueOp using google::dense_hash_map, as suggested by dzhulgakov. I haven't benchmarked it precisely but early measurements with my workflow show a significant speed bump (this operation went from using 20% of overall CPU time down to 7%).
I gated the implementation using the "engine" feature, to avoid adding sparsehash as a dependency to caffe2.
Reviewed By: dzhulgakov
Differential Revision: D4219768
fbshipit-source-id: 2f142981e772105b42fffa24afb199ef816f8e0c
Summary: I want to collect tensors over multiple batches, so this operation could be helpful to allocate enough memory from the beginning.
Reviewed By: dzhulgakov
Differential Revision: D4216198
fbshipit-source-id: e6b67cc7d80d71455487878da9b6b7a225035085
Summary: Used in the NNPreProc layers. It fails online training when there is an empty batch.
Reviewed By: dzhulgakov
Differential Revision: D4235498
fbshipit-source-id: bde00a011831762e44a3f9bf2190d4b241a06ccc
Summary: FlattenToVec was missing a gradient. It can use the same gradient implementation as FlattenOp, i.e. ResizeLike.
Reviewed By: kdub0
Differential Revision: D4241207
fbshipit-source-id: 6b1a60681fdce3c6f3139d0cd43b17798de2cbc9
Summary: This is mainly for the OSS side checking.
Reviewed By: dzhulgakov
Differential Revision: D4238349
fbshipit-source-id: 061da3f721341c4a1249e1cc6c8c842fc505860f
Summary:
With a parameter server, sparse features are updated on the parameter server, and
local updates for sparse features are disabled. But that logic was removed in
D4144922. This diff adds it back in a slightly different way.
Previously, in trainer_example, I did that in a hacky way by just avoiding adding
the sparse weight to model.params. It still generates a grad, but does not add
optimization operators. At the same time, it is always registered directly in
the sparse_mapping, so the parameter server is aware of this parameter.
But with the new change for ParameterInfo, I cannot do it that way anymore,
because the param registry and params are bound together in ParameterInfo.
For dper, there is an option in the dper model helper to disable all of the sparse
parameter optimizers.
To combine these two, I directly changed ModelHelperBase in this diff. It is not
quite ideal; it would be better to do it in Layer. But to fix the old one, this
seems to be the more reasonable place to cover both cases.
With this diff, there is no spike anymore, so this is probably the root cause
of the convergence issue we saw in D4144922. It explains why the model can
recover: adagrad decays the local learning rate, so local updates cause less change.
Reviewed By: dzhulgakov
Differential Revision: D4229684
fbshipit-source-id: da1241d43d7c52cbf13560f9bb83e09897d8d56f
Summary:
This diff introduces a simplified Imagenet trainer that uses data_parallel_model to parallelize training over GPUs and nodes in a synchronous manner. Flow's gang scheduling is used to launch the nodes, and data_parallel_model handles the synchronization among the gang members.
This example also uses the operator-per-epoch model where each epoch produces a checkpoint consumed by the followup epoch.
Reviewed By: salexspb
Differential Revision: D4223384
fbshipit-source-id: 8c2c73f4f6b2fdadb98511075ebbd8426c91eadb
Summary:
This consists of a series of diffs for implementing Multi-task learning.
This diff is to
1. save model;
2. support MT learning in evaluator
3. add unittest.
model after merging (saved model): https://our.intern.facebook.com/intern/graphviz/?paste=56793140
Reviewed By: xianjiec
Differential Revision: D4123316
fbshipit-source-id: 225bf8616962ec08f4f1ef85729c1e94ba7c373a
Summary: Debugging nets can be tiresome, so it is good if we can do some sanity checks. This adds a sanity check that all non-NCCL and non-Copy operators do not reference blobs that have a different device scope than the operator. This check is only added to the data_parallel_model, so it should be safe. This check would have caught a subtle bug in prigoyal's training pipeline.
Reviewed By: dzhulgakov
Differential Revision: D4230444
fbshipit-source-id: 3d4a843162134a7a504053d95ff97a552e6b8a6d
Summary:
Previously DPER was quite broken - we couldn't change loaders on the fly because the serialized model had blob names hard-coded, e.g. "nn_loader/dense". In fact, the tests worked only by accident, as both the trainer and evaluator used the same loader type.
This diff does the following:
1) when writing out the model, remap input blobs to be 'inputs/<field_name>'
2) when loading the eval model, remap them back to the current loader
This diff uses Net.input_schema() for convenience; in particular the schema format is implicitly serialized in the input blob names. From our discussion with Andrey, this type of hardcoding is actually acceptable since the schema of HiveReader on the python side is inferred via the same string-parsing procedure.
It also modifies model saving a bit so that we don't pollute the global namespace with the shape_provider net.
Overall the code in mlp.py is pretty terrible. But I'd leave refactoring to xianjiec as part of the Layers migration.
Reviewed By: xianjiec
Differential Revision: D4218902
fbshipit-source-id: 6cd19f0343ec1be6ddaa3581512e61879957749e
Summary:
- It's a first prototype that includes a simple unary test.
- will probably need to iterate on it to include more arches for which we see promising offline results
Differential Revision: D4208336
fbshipit-source-id: 5b2d2a5a0274a9dcad0fb169e43e78aa9d9a704d
Summary:
If we go to prod, some of the sparse features might be empty, or for some reason the
batch might be empty. It's a good idea to be sure that we can run empty
batches.
Reviewed By: dzhulgakov
Differential Revision: D4197297
fbshipit-source-id: 1a154ebf625d1a39fd15354a154cf100f525ae9a
Summary:
The old heuristic functioned badly on octa-core phones (e.g., the S6). Limiting the number of threads to 4 in the 8-core case seemed to give optimum performance. For 4 cores, 3 threads still seems to yield the best performance, as does 2 threads for 2 cores in the iOS phones, though those cores are very different from the typical ARM cores in Android phones.
I figure at the limit, we should limit ourselves to half the cores available, especially since in a big.LITTLE configuration, only half the cores are likely to be big.
I need to get my hands on a deca-core phone or tablet to try out this heuristic, but I certainly figure that this will function better than what we had before (which would be 9 threads on a 10-core device).
Reviewed By: ajtulloch
Differential Revision: D4220341
fbshipit-source-id: 06fa7677789fcdbec03d98bb85a565f1d22099e1
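A hypothetical reconstruction of the heuristic as described above (not the actual C++ implementation): 2 threads for 2 cores, 3 for 4 cores, and at most half the cores beyond that.

```python
def suggested_thread_count(num_cores):
    if num_cores <= 2:
        return 2
    if num_cores <= 4:
        return 3
    return num_cores // 2   # 8 cores -> 4 threads, 10 cores -> 5 threads
```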
Summary:
Needed by oss.
This is done by running the following line:
find . -name "*_test.py" -exec sed -i '$ a \\nif __name__ == "__main__":\n import unittest\n unittest.main()' {} \;
Reviewed By: ajtulloch
Differential Revision: D4223848
fbshipit-source-id: ef4696e9701d45962134841165c53e76a2e19233
Summary:
It looks like there's some locking going on here, and so if
the Cursor outlives the DB (or vice-versa), we'll either deadlock or
unlock an unlocked mutex.
Reviewed By: dzhulgakov
Differential Revision: D4224727
fbshipit-source-id: 886401a9f2824f3168fb0b2fd4df6046369e5590
Summary:
A recurrent developer issue is that people pass numpy arrays with FeedBlob but forget that a python float is actually a double. Cuda ops in caffe2 don't allow doubles.
Thus, I think we should reject incorrect types already at FeedBlob() when the device option is CUDA.
Added a test.
Is this too strong?
Reviewed By: ajtulloch
Differential Revision: D4208153
fbshipit-source-id: 364b057a2a37b5d4b95de4e59faebdab724bb0ed
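An example of the pitfall described above, assuming a CUDA-enabled build: np.random.rand returns float64, which CUDA ops reject, so the array should be cast to float32 before being fed to a CUDA blob.

```python
import numpy as np
from caffe2.python import core, workspace
from caffe2.proto import caffe2_pb2

gpu_opt = core.DeviceOption(caffe2_pb2.CUDA, 0)
data = np.random.rand(4, 3)                  # dtype float64 -- would be rejected
workspace.FeedBlob("data", data.astype(np.float32), device_option=gpu_opt)
```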
Summary: Just noticed that I had duplicate code in the example imagenet trainer. Removed the function.
Differential Revision: D4223070
fbshipit-source-id: 443a9401bf7e425f7a3a13a44c9d0f7e21e72303
Summary:
Remove MPI and use fb.distributed rendezvous and Pieter's new Ops.
One can now pass a 'rendezvous' struct to data_parallel_model to initiate distributed SyncSGD. The provided rendezvous implementation uses the kv-store handler of fb.distributed to disseminate information about other hosts. We can easily add other rendezvous, such as file-based, but that is the topic of another diff.
Removing MPI also allowed simplifying the Xray startup scripts, which are included in this diff.
When accepted, I will work on simple example code so others can use this stuff as well. Also, the Flow implementation will be the topic of next week.
Differential Revision: D4180012
fbshipit-source-id: 9e74f1fb43eaf7d4bb3e5ac6718d76bef2dfd731
Summary:
The FileStoreHandler subclasses the abstract StoreHandler
class.
Operators expecting to work with a StoreHandler can now use the
filesystem as their backing store.
Reviewed By: Yangqing
Differential Revision: D4217711
fbshipit-source-id: fce60c99c4c505201dfee33ca0a4e8a35db00338
Summary: since the LogScoreEstimator prints the # of examples after considering negative downsampling.
Reviewed By: kdub0
Differential Revision: D4218040
fbshipit-source-id: 30f54353042dcd85c945c2c911ba0b6d9c0b1540
Summary:
Fix warnings exposed by gcc-4.9.x's -Wshadow-compatible-local
(and/or the stricter -Wshadow-local) options. Note that these
are both less onerous than -Wshadow.
I plan to enable one of them for all of fbcode, soon.
Rename inner "idx" to "k".
Differential Revision: D4216556
fbshipit-source-id: 5ee48751efd07838db24f56390730718ea031772
Summary:
It was not enough to just register ReshapeOp for CUDA, since it does memory copies to/from tensors. This happened in two places: when assigning the shape from a shape blob and when outputting a shape tensor.
Also changed the reshape op test to use CUDA when available (this test was written before hypothesis tests, so I had to do this manually).
Differential Revision: D4217342
fbshipit-source-id: 61761bac015f3731cf480ccef2563e9c80e0f4aa
Summary:
I got a weird error about NoneType not being iterable which made me think
it was some error in the C2 core, whereas it was an error in my code.
Reviewed By: Yangqing
Differential Revision: D4192799
fbshipit-source-id: 0122f13e205c1c6a0766545f0ad6296228d3a3d9
Summary:
This fixes a race condition in text_file_reader.py.
For example in `fbcode/caffe2/caffe2/fb/text/stats.py`, in `compute_meta`, we build an execution step `read` such as:
```
.
└── step_read
├── net_reader
│ ├── op_TextFileReaderRead
│ └── op_IsEmpty
└── net_consume:n
└── op_Tokenize
```
Note that in `workspace.cc`, we check should_stop between each _step_ and each _net_, not between _ops_
Let's say we have 2 workers, here is a faulty interleaving of threads:
- 1 executes TextFileReaderRead
- 2 executes TextFileReaderRead
- 1 executes IsEmpty and sets should_stop to False
- 2 executes IsEmpty and sets should_stop to True
- 1 checks should_stop before running net_consume:n
- 1 stops
- 2 checks should_stop before running net_consume:n
- 2 stops
That's an issue, because 1 did read data from the file but did not run the processing step (consume:n) for this data.
Reviewed By: dzhulgakov
Differential Revision: D4203729
fbshipit-source-id: eabd94ea995527ec52fa137a8b63c277f7e4dd96
Summary:
This is #2 of a series of changes. It did the following:
(1) a few refactor of the MKL memory interface
(2) an initial MKLContext to deal with MKL specific computations
(3) Provide MKLMemory access in Python with the blob feeder/fetcher registration.
Reviewed By: dzhulgakov
Differential Revision: D4210123
fbshipit-source-id: adea1f1ffbd0b9ffdd55092676468c16bec08992
Summary: Each sparse feature is an ID list, and usually the position of an id in the list is meaningful: the earlier the id appears in the list, the more important it is. In this diff, we multiply each embedding by a weight, where the weight corresponds to the position. With this change, the same ID appearing at different positions will have a different norm/length/importance after aggregation. The firstX transformation in sigrid is a special case of this model where the weights before n are 1, and 0 after n, where n is the argument of firstX.
Reviewed By: xianjiec
Differential Revision: D4181251
fbshipit-source-id: 2a6f8b7240af445b6bd2052fd24c2d99f39ee7ff
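A numpy sketch of the position weighting described above (weights are illustrative, not learned values): each embedding in an ID list is scaled by a weight indexed by its position before pooling, so the same ID contributes differently depending on where it appears.

```python
import numpy as np

embeddings = np.random.rand(5, 8)            # one 8-dim embedding per id in the list
position_weights = np.array([1.0, 0.8, 0.6, 0.4, 0.2])
pooled = (position_weights[:, None] * embeddings).sum(axis=0)
```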
Summary:
Another recurrent problem is that some blob is in CPU scope while the operator expects CUDA scope (or the other way round).
The exception is only partially helpful, as it tells the operator but not the offending blob name. This diff adds the blob name
to the exception message, helping with debugging.
Reviewed By: prigoyal
Differential Revision: D4208584
fbshipit-source-id: 5aeac5c3efeed8d6c995bea166ed534855007945
Summary: This is so they don't generate spurious warning messages in the logs
Reviewed By: dzhulgakov
Differential Revision: D4205610
fbshipit-source-id: f764b51565430f4057898ab929372bc7943e0495
(1) nccl submodule, cnmem submodule
(2) mpi ops fallback test
(3) a bit more blob interface
(4) fixed tests
(5) caffe2.python.io -> caffe2.python.dataio to avoid name conflicts
(6) In the build system autogen __init__.py instead of having manual
rules just to copy over an empty __init__.py.
So I tried to make things compilable in python3 but a lot of the actual
functionalities are yet to be verified. Since I am not using py3 for a
short while and protobuf 2.6.1 does not work with py3 (among a bunch of
others), I'll put this as a future todo item.
Eigen for the whole numerical computation (for example, on a platform
where there is no optimized BLAS libraries present, or Eigen is already
the fastest numerical library existing).
The paths I have tested are Eigen and atlas. Have not tested MKL yet.
(1) Registry now uses std::function for more flexible use cases.
(2) dropout adds an "is_test" keyword.
(3) Making all gradient registered via C++. Python still provides gradient wrapper.
TODO item is to make the autograd SSA in C++ if possible. Problem is if we want to dynamically
register python gradients we will be sort of screwed because in c++ things are registered
via static variables.
(1) Loss: do not coerce a gradient output. Although it may be numerically more efficient to do so, it makes the definition of a loss kind of funny if one does not really want to run backward pass.
(2) Autodifferentiation: allow more explicit in-place check, in-place is now opt-in, and implemented a simple SSA/IR gradient generation scheme. Also added some core gradient tests.
Misc bugfixes as well.
(1) cudnn for conv
(2) cublas: after going through the work I feel it's better to use HOST pointer mode, so changed it.
(3) storage order: despite the fact that googlenet and multibox use NHWC, it seems better to still use
NCHW as the default to be consistent with caffe and cudnn; moved to NCHW as default.
(1) various bugfixes.
(2) Tensor is now a class independent from its data type. This allows us
to write easier type-independent operators.
(3) code convention changes a bit: dtype -> T, Tensor<*Context> -> Tensor* alias.
(4) ParallelNet -> DAGNet to be more consistent with what it does.
(5) Caffe's own flags library instead of gflags.
(6) Caffe's own logging library instead of glog, but glog can be chosen with
compile-time definition -DCAFFE2_USE_GOOGLE_GLOG. As a result, glog macros
like CHECK, DCHECK now have prefix CAFFE_, and LOG(*) now becomes
CAFFE_LOG_*.
(7) an optional protobuf inclusion, which can be chosen with USE_SYSTEM_PROTOBUF
in build_env.py.
(2) blob serialization comments
(3) cudnn: putting it under a separate device name
so we can explicitly choose cudnn instead of
having CUDA device prioritizing it.
(4) note that mint is not available with ipython
due to zeromq conflict
(5) db_throughput utility
(6) added gprofiler
(1) added blob serialization.
(2) registry can now use key types other than string.
(3) changed load_save_op so they interface with a db.
(4) change sgd iter op: it does increments so we can resume an iter.
(5) mnist linear classifier tests snapshot functionality.
(6) added protodb which is a small wrapper over TensorProtos.
# Let the test pass if baseline number doesn't exist
mean = sys.maxsize
sigma = 0.001
print("population mean: ", mean)
print("population sigma: ", sigma)
sample_stats_data = json.loads(args.sample_stats)
sample_mean = sample_stats_data['mean']
sample_sigma = sample_stats_data['sigma']
print("sample mean: ", sample_mean)
print("sample sigma: ", sample_sigma)
z_value = (sample_mean - mean) / sigma
print("z-value: ", z_value)
if z_value >= 3:
    raise Exception('''\n
z-value >= 3, there is high chance of perf regression.\n
To reproduce this regression, run `cd .jenkins/pytorch/perf_test/ && bash ''' + test_name + '''.sh` on your local machine and compare the runtime before/after your code change.
''')
else:
    print("z-value < 3, no perf regression detected.")
if args.update:
    print("We will use these numbers as new baseline.")
if ! python perf-tests/modules/test_cpu_torch.py ${ARGS}; then
    echo "To reproduce this regression, run \`cd .jenkins/pytorch/perf_test/ && bash "${FUNCNAME[0]}".sh\` on your local machine and compare the runtime before/after your code change."
if ! python perf-tests/modules/test_cpu_torch_tensor.py ${ARGS}; then
    echo "To reproduce this regression, run \`cd .jenkins/pytorch/perf_test/ && bash "${FUNCNAME[0]}".sh\` on your local machine and compare the runtime before/after your code change."
echo "NOTE: To run \`import torch\`, please make sure to activate the conda environment by running \`call C:\\Jenkins\\Miniconda3\\Scripts\\activate.bat C:\\Jenkins\\Miniconda3\` in Command Prompt before running Git Bash."
) else (
7z a %IMAGE_COMMIT_TAG%.7z C:\\Jenkins\\Miniconda3\\Lib\\site-packages\\torch && python ci_scripts\\upload_image.py %IMAGE_COMMIT_TAG%.7z
author={Paszke, Adam and Gross, Sam and Chintala, Soumith and Chanan, Gregory and Yang, Edward and DeVito, Zachary and Lin, Zeming and Desmaison, Alban and Antiga, Luca and Lerer, Adam},
PyTorch is a python package that provides two high-level features:
- Tensor computation (like numpy) with strong GPU acceleration
- Deep Neural Networks built on a tape-based autograd system
PyTorch is a Python package that provides two high-level features:
- Tensor computation (like NumPy) with strong GPU acceleration
- Deep neural networks built on a tape-based autograd system
You can reuse your favorite python packages such as numpy, scipy and Cython to extend PyTorch when needed.
You can reuse your favorite Python packages such as NumPy, SciPy and Cython to extend PyTorch when needed.
We are in an early-release Beta. Expect some adventures and rough edges.
We are in an early-release beta. Expect some adventures and rough edges.
- [More About PyTorch](#more-about-pytorch)
- [More about PyTorch](#more-about-pytorch)
- [Installation](#installation)
- [Binaries](#binaries)
- [From source](#from-source)
- [Docker image](#docker-image)
- [From Source](#from-source)
- [Docker Image](#docker-image)
- [Building the Documentation](#building-the-documentation)
- [Previous Versions](#previous-versions)
- [Getting Started](#getting-started)
- [Communication](#communication)
- [Releases and Contributing](#releases-and-contributing)
- [The Team](#the-team)
| System | Python | Status |
| System | 2.7 | 3.5 |
| --- | --- | --- |
| Linux CPU | 2.7.8, 2.7, 3.5, nightly | [](https://travis-ci.org/pytorch/pytorch) |
| Linux GPU | 2.7 | [](https://build.pytorch.org/job/pytorch-master-py2) |
| Linux GPU | 3.5 | [](https://build.pytorch.org/job/pytorch-master-py3) |
| Linux CPU | [](https://ci.pytorch.org/jenkins/job/pytorch-master/) | [](https://ci.pytorch.org/jenkins/job/pytorch-master/) |
| Linux GPU | [](https://ci.pytorch.org/jenkins/job/pytorch-master/) | [](https://ci.pytorch.org/jenkins/job/pytorch-master/) |
| Windows GPU | <center>—</center> | [](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-trigger/)
See also the [ci.pytorch.org HUD](https://ezyang.github.io/pytorch-ci-hud/build/pytorch-master).
## More about PyTorch
At a granular level, PyTorch is a library that consists of the following components:
| \_ | \_ |
| ------------------------ | --- |
| torch | a Tensor library like NumPy, with strong GPU support |
| torch.autograd | a tapebased automatic differentiation library that supports all differentiable Tensor operations in torch |
| torch.nn | a neural networks library deeply integrated with autograd designed for maximum flexibility |
| torch.optim | an optimization package to be used with torch.nn with standard optimization methods such as SGD, RMSProp, LBFGS, Adam etc. |
| torch.multiprocessing | python multiprocessing, but with magical memory sharing of torch Tensors across processes. Useful for data loading and hogwild training. |
| torch.utils | DataLoader, Trainer and other utility functions for convenience |
| torch.legacy(.nn/.optim) | legacy code that has been ported over from torch for backward compatibility reasons |
| Component | Description |
| ---- | --- |
| **torch** | a Tensor library like NumPy, with strong GPU support |
| **torch.autograd** | a tape-based automatic differentiation library that supports all differentiable Tensor operations in torch |
| **torch.nn** | a neural networks library deeply integrated with autograd designed for maximum flexibility |
| **torch.multiprocessing** | Python multiprocessing, but with magical memory sharing of torch Tensors across processes. Useful for data loading and Hogwild training |
| **torch.utils** | DataLoader, Trainer and other utility functions for convenience |
| **torch.legacy(.nn/.optim)** | legacy code that has been ported over from torch for backward compatibility reasons |
Usually one uses PyTorch either as:
- A replacement for numpy to use the power of GPUs.
- a replacement for NumPy to use the power of GPUs.
- a deep learning research platform that provides maximum flexibility and speed
Elaborating further:
### A GPU-ready Tensor library
### A GPU-Ready Tensor Library
If you use numpy, then you have used Tensors (a.k.a ndarray).
If you use NumPy, then you have used Tensors (a.k.a ndarray).
PyTorch is not a Python binding into a monolothic C++ framework.
PyTorch is not a Python binding into a monolithic C++ framework.
It is built to be deeply integrated into Python.
You can use it naturally like you would use numpy / scipy / scikit-learn etc.
You can use it naturally like you would use NumPy / SciPy / scikit-learn etc.
You can write your new neural network layers in Python itself, using your favorite libraries
and use packages such as Cython and Numba.
Our goal is to not reinvent the wheel where appropriate.
### Imperative experiences
### Imperative Experiences
PyTorch is designed to be intuitive, linear in thought and easy to use.
When you execute a line of code, it gets executed. There isn't an asynchronous view of the world.
When you drop into a debugger, or receive error messages and stack traces, understanding them is straight-forward.
The stack-trace points to exactly where your code was defined.
When you drop into a debugger, or receive error messages and stack traces, understanding them is straightforward.
The stacktrace points to exactly where your code was defined.
We hope you never spend hours debugging your code because of bad stack traces or asynchronous and opaque execution engines.
### Fast and Lean
PyTorch has minimal framework overhead. We integrate acceleration libraries
such as Intel MKL and NVIDIA (CuDNN, NCCL) to maximize speed.
At the core, it's CPU and GPU Tensor and Neural Network backends
(TH, THC, THNN, THCUNN) are written as independent libraries with a C99 API.
They are mature and have been tested for years.
PyTorch has minimal framework overhead. We integrate acceleration libraries
such as Intel MKL and NVIDIA (cuDNN, NCCL) to maximize speed.
At the core, its CPU and GPU Tensor and neural network backends
(TH, THC, THNN, THCUNN) are mature and have been tested for years.
Hence, PyTorch is quite fast -- whether you run small or large neural networks.
Hence, PyTorch is quite fast – whether you run small or large neural networks.
The memory usage in PyTorch is extremely efficient compared to Torch or some of the alternatives.
We've written custom memory allocators for the GPU to make sure that
your deep learning models are maximally memory efficient.
This enables you to train bigger deep learning models than before.
### Extensions without pain
### Extensions without Pain
Writing new neural network modules, or interfacing with PyTorch's Tensor API was designed to be straight-forward
Writing new neural network modules, or interfacing with PyTorch's Tensor API was designed to be straightforward
and with minimal abstractions.
You can write new neural network layers in Python using the torch API
[or your favorite numpy based libraries such as SciPy](https://github.com/pytorch/tutorials/blob/master/Creating%20extensions%20using%20numpy%20and%20scipy.ipynb).
[or your favorite NumPy-based libraries such as SciPy](http://pytorch.org/tutorials/advanced/numpy_extensions_tutorial.html).
If you want to write your layers in C/C++, we provide an extension API based on
[cffi](http://cffi.readthedocs.io/en/latest/) that is efficient and with minimal boilerplate.
There is no wrapper code that needs to be written. [You can see an example here](https://github.com/pytorch/extension-ffi).
If you want to write your layers in C/C++, we provide a convenient extension API that is efficient and with minimal boilerplate.
There is no wrapper code that needs to be written. You can see [a tutorial here](http://pytorch.org/tutorials/advanced/cpp_extension.html) and [an example here](https://github.com/pytorch/extension-cpp).
## Installation
### Binaries
- Anaconda
```bash
conda install pytorch torchvision -c soumith
```
Commands to install from binaries via Conda or pip wheels are on our website:
### From source
[http://pytorch.org](http://pytorch.org)
### From Source
If you are installing from source, we highly recommend installing an [Anaconda](https://www.continuum.io/downloads) environment.
You will get a high-quality BLAS library (MKL) and you get a controlled compiler version regardless of your Linux distro.
Once you have [anaconda](https://www.continuum.io/downloads) installed, here are the instructions.
Once you have [Anaconda](https://www.continuum.io/downloads) installed, here are the instructions.
If you want to compile with CUDA support, install
- [NVIDIA CUDA](https://developer.nvidia.com/cuda-downloads) 7.5 or above
- [NVIDIA cuDNN](https://developer.nvidia.com/cudnn) v6.x or above
If you want to disable CUDA support, export environment variable `NO_CUDA=1`.
Other potentially useful environment variables may be found in `setup.py`.
If you want to build on Windows, Visual Studio 2017 14.11 toolset and NVTX are also needed.
Especially, for CUDA 8 build on Windows, there will be an additional requirement for VS 2015 Update 3 and a patch for it.
The details of the patch can be found out [here](https://support.microsoft.com/en-gb/help/4020481/fix-link-exe-crashes-with-a-fatal-lnk1000-error-when-you-use-wholearch).
Dockerfiles are supplied to build images with cuda support and cudnn v5 and cudnn v6 RC. Build them as usual
Dockerfile is supplied to build images with cuda support and cudnn v7. You can pass -e PYTHON_VERSION=x.y flag to specificy which python to be used by Miniconda, or leave it unset to use the default. Build as usual
* Slack: general chat, online discussions, collaboration etc. https://pytorch.slack.com/ . Our slack channel is invite-only to promote a healthy balance between power-users and beginners. If you need a slack invite, ping us at slack@pytorch.org
* newsletter: no-noise, one-way email newsletter with important announcements about pytorch. You can sign-up here: http://eepurl.com/cbG0rv
## Releases and Contributing
PyTorch has a 90 day release cycle (major releases).
It's current state is Beta (v0.1.6), we expect no obvious bugs. Please let us know if you encounter a bug by [filing an issue](https://github.com/pytorch/pytorch/issues).
PyTorch has a 90 day release cycle (major releases).
Its current state is Beta, we expect no obvious bugs. Please let us know if you encounter a bug by [filing an issue](https://github.com/pytorch/pytorch/issues).
We appreciate all contributions. If you are planning to contribute back bug-fixes, please do so without any further discussion.
If you plan to contribute new features, utility functions or extensions to the core, please first open an issue and discuss the feature with us.
Sending a PR without discussion might end up resulting in a rejected PR, because we might be taking the core in a different direction than you might be aware of.
**For the next release cycle, these are the 3 big features we are planning to add:**
1. [Distributed PyTorch](https://github.com/pytorch/pytorch/issues/241) (a draft implementation is present in this [branch](https://github.com/apaszke/pytorch-dist) )
2. Backward of Backward - Backpropagating through the optimization process itself. Some past and recent papers such as
[Double Backprop](http://yann.lecun.com/exdb/publis/pdf/drucker-lecun-91.pdf) and [Unrolled GANs](https://arxiv.org/abs/1611.02163) need this.
3. Lazy Execution Engine for autograd - This will enable us to optionally introduce caching and JIT compilers to optimize autograd code.
## The Team
PyTorch is a community driven project with several skillful engineers and researchers contributing to it.
PyTorch is currently maintained by [Adam Paszke](https://apaszke.github.io/), [Sam Gross](https://github.com/colesbury) and [Soumith Chintala](http://soumith.ch) with major contributions coming from 10s of talented individuals in various forms and means. A non-exhaustive but growing list needs to mention: Sergey Zagoruyko, Adam Lerer, Francisco Massa, Andreas Kopf, James Bradbury, Zeming Lin, Yuandong Tian, Guillaume Lample, Marat Dukhan, Natalia Gimelshein.
PyTorch is currently maintained by [Adam Paszke](https://apaszke.github.io/), [Sam Gross](https://github.com/colesbury), [Soumith Chintala](http://soumith.ch) and [Gregory Chanan](https://github.com/gchanan) with major contributions coming from 10s of talented individuals in various forms and means.
A non-exhaustive but growing list needs to mention: Trevor Killeen, Sasank Chilamkurthy, Sergey Zagoruyko, Adam Lerer, Francisco Massa, Alykhan Tejani, Luca Antiga, Alban Desmaison, Andreas Kopf, James Bradbury, Zeming Lin, Yuandong Tian, Guillaume Lample, Marat Dukhan, Natalia Gimelshein, Christian Sarofeen, Martin Raison, Edward Yang, Zachary Devito.
Note: this project is unrelated to [hughperkins/pytorch](https://github.com/hughperkins/pytorch) with the same name. Hugh is a valuable contributor in the Torch community and has helped with many things Torch and PyTorch.
ATen is a simple tensor library that exposes the Tensor operations in Torch
and PyTorch directly in C++11. The wrapper respects the semantics of operators
in PyTorch, except minor details due to differences between C++ and Python in
the way default arguments are handled. See the [documentation for tensors](http://pytorch.org/docs/tensors.html) in PyTorch for what these operations do.
ATen's API is auto-generated from the same declarations PyTorch uses so the
two APIs will track each other over time.
Tensor types are resolved dynamically, such that the API is generic and
does not include templates. That is, there is one `Tensor` type. It can hold a
CPU or CUDA Tensor, and the tensor may have Doubles, Float, Ints, etc. This design
makes it easy to write generic code without templating everything.
See https://pytorch.org/cppdocs for the provided API. Excerpt:
When using Tensor-wide operations, the relative cost of dynamic dispatch is very small.
However, there are cases, especially in your own kernels, where efficient element-wise access is needed,
and the cost of dynamic dispatch inside the element-wise loop is very high.
ATen provides _accessors_ that are created with a single dynamic check that a Tensor is of the expected type and number of
dimensions. Accessors then expose an API for accessing the Tensor elements efficiently:
```c++
Tensor foo = CPU(kFloat).rand({12,12});
// assert foo is 2-dimensional and holds floats.
auto foo_a = foo.accessor<float,2>();
float trace = 0;
for(int i = 0; i < foo_a.size(0); i++) {
// use the accessor foo_a to get tensor data.
trace += foo_a[i][i];
}
```
Accessors are temporary views of a Tensor. They are only valid for the lifetime of the tensor that they
view and hence should only be used locally in a function, like iterators.
### Using externally created data
If you already have your tensor data allocated in memory (CPU or CUDA),
you can view that memory as a Tensor in ATen:
```c++
float data[] = { 1, 2, 3,
4, 5, 6};
auto f = CPU(kFloat).tensorFromBlob(data, {2,3});
cout << f << endl;
```
These tensors cannot be resized because ATen does not own the memory, but otherwise
behave as normal tensors.
### Scalars and zero-dimensional tensors
In addition to the `Tensor` objects, ATen also includes `Scalar`s that represent a single number.
Like a Tensor, Scalars are dynamically typed and can hold any one of ATen's number types.
Scalars can be implicitly constructed from C++ number types. Scalars are needed because some functions like `addmm` take numbers along with Tensors and expect these
numbers to be the same dynamic type as the tensor. They are also used in the API to indicate places where
a function will _always_ return a Scalar value, like `sum`.
```c++
Tensor addmm(Scalar beta, const Tensor & self,
Scalar alpha, const Tensor & mat1,
const Tensor & mat2);
Scalar sum(const Tensor & self);
//usage
Tensor a = ...
Tensor b = ...
Tensor c = ...
Tensor r = addmm(1.0, a, .5, b, c);
```
In addition to Scalars, ATen also allows Tensor objects to be zero-dimensional. These Tensors hold
a single value and they can be references to a single element in a larger Tensor. They can be used anywhere a Tensor is expected. They are normally created by operators like `select` which reduce the dimensions of
a Tensor.
```c++
Tensor two = CPU(kFloat).rand({10,20});
two[1][2] = 4;
//~~~~~~~ zero-dimensional Tensor
```
It is possible to convert between Scalar and zero-dim Tensors:
```c++
Tensor zero_dim = CPU(kFloat).scalarTensor(4);
Scalar from_tensor = Scalar(zero_dim); //only valid when zero_dim.dim() == 0;
```
### Avoiding unnecessary CUDA synchronization in your kernels when using Scalars
Moving a single number from the GPU to the CPU introduces a synchronization point
that can add latency to your program. In certain cases the result of a GPU operator like `sum` which
returns a Scalar may be plugged into another GPU operator as an argument. If Scalars were always copied
to the CPU, this would result in 2 copies. To avoid these synchronizations, Scalar objects can be
optionally backed by a zero-dim Tensor, and are only copied to the CPU when requested.
```c++
auto a = CUDA(kFloat).rand({3,4});
Scalar on_gpu = Scalar(a[1][1]); //backed by zero-dim Tensor
assert(on_gpu.isBackedByTensor());
double value = on_gpu.toDouble(); // copied to CPU, if it was backed by GPU Tensor.
Scalar svalue = on_gpu.local(); // force the Scalar to become local to CPU.
// get the scalar as a zero-dim tensor. If it was already backed
// by a zero-dim Tensor then this op has no synchronization.
// if the Scalar was local on CPU, it performs the copy
```
// we have a degree of freedom here to select the dimension size; follow NumPy semantics
// and just bail.
AT_CHECK(newsize != 0, "cannot reshape tensor of 0 elements into shape ", shape);
res[*infer_dim] = numel / newsize;
}
return res;
}
std::ostringstream ss;
ss << "shape '" << shape << "' is invalid for input of size " << numel;
throw std::runtime_error(ss.str());
}
}