159 Commits

Author SHA1 Message Date
2fe9e3a207 Remove catch from caffe2/.gitmodules
Summary: Step 3 to remove catch submodule from PyTorch

Reviewed By: ezyang

Differential Revision: D12959020

fbshipit-source-id: 49347de8b027433d422b653dd854ad76349d0e25
2018-11-07 11:10:09 -08:00
54d63c5752 added fbgemm as submodule (#13354) 2018-11-01 15:35:02 -04:00
1720757220 added submodules for int8 ops (#13106) 2018-10-25 09:11:11 -07:00
444cc0ee0a Back out "[pytorch][PR] added gemmlowp module" (#13090)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13090

Original commit changeset: 7f8a649c739c

Reviewed By: Maratyszcza

Differential Revision: D10846367

fbshipit-source-id: a5a5aad29b51287dc1cb80c707eb5a0008ec78f5
2018-10-24 19:41:15 -07:00
9573ecefe3 Back out "[pytorch][PR] Add sse2neon tp" (#13091)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13091

Original commit changeset: 8b4f9f361cc1

Reviewed By: Maratyszcza

Differential Revision: D10846301

fbshipit-source-id: 2798f1fca5c1a2362979977ef5eb724dd37c4e6d
2018-10-24 17:17:34 -07:00
b55dc8d971 Add sse2neon tp (#12948)
Summary:
Adding sse2neon in third_party as a dependency
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12948

Differential Revision: D10801574

Pulled By: harouwu

fbshipit-source-id: 8b4f9f361cc1722f631830f7675b9d209a9f22ef
2018-10-24 14:56:24 -07:00
c64a65c977 added gemmlowp module (#12947)
Summary:
Adding gemmlowp as a dependency in the third_party folder
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12947

Differential Revision: D10794559

Pulled By: harouwu

fbshipit-source-id: 7f8a649c739ccb6c307327080711379b1db8c3e0
2018-10-24 13:53:58 -07:00
348867c10b Remove cereal submodule (#12666)
Summary:
Cereal is dead!

soumith orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12666

Reviewed By: soumith

Differential Revision: D10422061

Pulled By: goldsborough

fbshipit-source-id: ca1ac66d05e699df9de00fc340a399571b7ecb9f
2018-10-17 11:52:47 -07:00
a1bbe80e21 Remove NervanaGPU operators from Caffe2 (#12564)
Summary:
Fix #12540
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12564

Reviewed By: orionr

Differential Revision: D10379775

Pulled By: soumith

fbshipit-source-id: a925b116f2687e56bf54465fc02ca2eb1e7c8eb0
2018-10-15 11:04:46 -07:00
c5d7494ca1 Use open-source NCCL2 in PyTorch (#12359)
Summary:
- Removed the old nccl file
- Made open-source NCCL a submodule
- Added CMake support to build NCCL itself

NCCL2 is now in the default build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12359

Reviewed By: orionr, yns88

Differential Revision: D10219665

Pulled By: teng-li

fbshipit-source-id: 134ff47057512ba617b48bf390c1c816fff3f881
2018-10-08 15:39:07 -07:00
895994a7c3 Back out "[pytorch][PR] [Build] Use open-source NCCL2 in PyTorch"

fbshipit-source-id: a13075339d3a7b970e81be0b1a32a7c4c3a6c68d
2018-10-04 14:12:04 -07:00
ae7a7fb398 Use open-source NCCL2 in PyTorch (#12312)
Summary:
- Removed the old nccl file
- Made open-source NCCL a submodule
- Added CMake support to build NCCL itself

NCCL2 is now in the default build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12312

Differential Revision: D10190845

Pulled By: teng-li

fbshipit-source-id: 08d42253b774149a66919d194f88b34628c39bae
2018-10-04 11:42:17 -07:00
c172ffb632 Remove the nanopb submodule
Summary:
After making changes internally, really remove the nanopb submodule.

Finalizes https://github.com/pytorch/pytorch/pull/10772

Reviewed By: yns88

Differential Revision: D9504582

fbshipit-source-id: 4517607e5c8054a255c3984b8265f48fede2935b
2018-08-24 16:24:57 -07:00
05c473b85c Temporarily remove TBB (#8255) 2018-06-18 19:31:57 -04:00
769397eb77 [Caffe2] [feature request] Add gradient operators for IDEEP (#7234)
* Add gradient operators for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Add gradient test cases for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Upgrade third_party/ideep

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Refine SumOp for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Share input buffer in fallback op if possible

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fallback ConvTranspose op for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix bug introduced by the patch of sharing input buffer

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Share output buffer in fallback operators

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Remove IDEEP to resolve repo issue

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Refresh IDEEP repo

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Remove redundant lines in IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fallback operators for IDEEP
(Flatten, ResizeLike, Transpose, and Reshape)

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>
2018-05-09 08:52:24 -07:00
f9393ffc90 Remove unneeded entry for NCCL in .gitmodules (#7216)
NCCL currently is not a git submodule. The NCCL source code is
bundled in 'third_party/nccl'.

Closes #7150
2018-05-03 00:07:58 -07:00
619a56bf21 Emergency new fork for ideep (upstream lost commits). (#7191)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-02 14:50:47 -04:00
88a705555a Add SLEEF for float and double (#6725) 2018-05-02 18:40:44 +00:00
b2cdd08252 Introducing onnx-tensorrt to third_party (#7119) 2018-04-30 21:09:51 -07:00
af71fb882f Merge autogradpp into PyTorch (#7074)
* Dump autogradpp into PyTorch

* Fixed up CMake for autogradpp/C++ API

* Made cereal a submodule

* Change search location of autogradpps mnist directory

* Add test_api to CI

* Download MNIST from the internet instead of storing in repo

* Fix warnings
2018-04-30 12:53:46 -07:00
caa6a8ce30 Switch to the official git mirror for Eigen. (#7090) 2018-04-30 14:09:18 -04:00
dec5e99e99 [aten] Move submodules to third_party (#6866)
* [aten] Move submodules to third_party

* [aten] Update aten_mirror.sh script for third_party

* [aten] Move ATen submodules def to root and rename

* [aten] Update cpuinfo cmake build

* [aten] Fix cpuinfo cmake build

* Update third_party/cpuinfo to d03d5d296063063c66877fb559cf34469734e3e1

* [aten] Fix JIT test reference to catch
2018-04-24 23:33:46 -04:00
26ddefbda1 [feature request] [Caffe2] Enable MKLDNN support for inference (#6699)
* Add operators based-on IDEEP interfaces

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Enable IDEEP as a caffe2 device

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Add test cases for IDEEP ops

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Add IDEEP as a caffe2 submodule

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Skip test cases if no IDEEP support

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Correct cmake options for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Add dependencies on ideep libraries

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix issues in IDEEP conv ops, etc.

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Move ideep from caffe2/ideep to caffe2/contrib/ideep

Signed-off-by: Gu Jinghui <jinghui.gu@intel.com>

* Update IDEEP to fix cmake issue

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix cmake issue caused by USE_MKL option

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Correct comments in MKL cmake file

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>
2018-04-22 21:58:14 -07:00
29e81e01aa Expunge ATen submodule; use the in-tree copy. (#6235)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-04-03 15:47:07 -04:00
90afedb6e2 Merge caffe2 with pytorch. 2018-03-30 10:29:50 -07:00
eb18a2f26c Reorganize third-party libraries into top-level third_party directory (#6025)
- gloo, pybind11, nanopb and nccl now live in third_party.
- ATen builds in aten/build rather than torch/lib/build/aten
- A bit of faffing about in the scripts was necessary, because they used to assume that everything lived in the same directory. Now you are expected to cd into the correct directory before calling one of the build functions. The actual builder script lives in tools
- Lint now just unconditionally ignores third_party, rather than enumerating folders explicitly
2018-03-27 22:09:20 -04:00
6f80023c29 Port ATen and JIT C++ tests to Catch2 (#5788)
This PR addresses #5648. In particular, following the discussion at #5648:

- it adds Catch as a submodule (https://github.com/catchorg/Catch2) in torch/aten/utils
- it ports all ATen tests to Catch
- it ports torch/csrc/jit/test_jit.cpp to Catch (libtorch only, Python build is unaffected)
2018-03-19 16:09:43 -04:00
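For reference, a minimal sketch of the test style the Catch2 port above moves to (a hedged illustration, not one of the actual ported ATen/JIT tests; the include path and the test name/tags are assumptions):

```cpp
// Catch2 single-header style: CATCH_CONFIG_MAIN lets Catch supply main().
#define CATCH_CONFIG_MAIN
#include <catch.hpp>

TEST_CASE("sums behave as expected", "[example]") {
  int a = 2, b = 3;
  REQUIRE(a + b == 5);           // basic assertion macro
  SECTION("addition is commutative") {
    REQUIRE(a + b == b + a);     // each SECTION re-runs the enclosing setup
  }
}
```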
5fa3aac610 ATen ReduceOps (#5776)
#5481 was reverted due to a strange test bug. This PR attempts to fix that.

This diff adds vectorization to ATen. It uses Intel intrinsics to build a general vec256 class that represents types of 256-bit width. These can then be treated like regular variables. Using those, it implements torch.sum() for the contiguous case. It uses Intel TBB for multithreading, which allows work stealing, and chunks the reduction operations based on an experimentally chosen value (_THRESHOLD). It uses cpuinfo to pick the right code depending on the host's capabilities.

The kernels are implemented under native/cpu. Each .cpp file is compiled three times: with -avx, with -avx2, and with no additional flags. A macro is used to append AVX, AVX2, or NONE to the function name. The header then needs to define the functions three times, once for each capability. This could be improved by changing the CMake file a bit or by generating source code with a Python script.

For the non-contiguous case this defaults to the current implementation within TH. For CUDA it defaults entirely to the implementation within THC.

There probably needs to be a bit of debate around the design decisions here: the additional dependencies, the parallelization strategy, clarity, etc. The numerical results also diverge from numpy with larger tensors, which is expected since we're summing, for example, 8 numbers and then adding the result to the running sum, instead of adding each number one by one. But there might be something to be said about accumulating into a double for floats, the degree of divergence, the behavior with respect to CUDA, etc.

I wrote a [small Python script]( https://github.com/cpuhrsch/benchmark/blob/sumall/benchmarks/sum_bench.py) to compare the results with numpy numerically as well as on timing. I ran this script to create timings both on master and this branch.

Here is the command for 1 core
`OMP_NUM_THREADS=1 taskset -c 0 python sum_bench.py --enable_numpy 200`

Here is the command for all cores
`python sum_bench.py --enable_numpy 200`

Here are the results of each:

[Master, 1 core](https://paste.fedoraproject.org/paste/Nho9JzHpPVK9av8a6mByjQ)

[This branch, 1 core](https://paste.fedoraproject.org/paste/6xLHkYvcVJx9z~5MoHxN4w)

[Master, all cores](https://paste.fedoraproject.org/paste/5l3V1d5zGqvJcMXIUteMRw)

[This branch, all cores](https://paste.fedoraproject.org/paste/J4RuDU-0Drz0aZwtphQwEA)

To test the command is
`python sum_bench.py --test 200`

[This branch, test results](https://paste.fedoraproject.org/paste/kTEoUC~oWgXA6XWMAfNfNw)

For this test we look at the average absolute value of the differences. This does not take into account the relative magnitude of the numbers. The numbers are sampled from a standard normal distribution. 

In terms of performance this diff should bring PyTorch on par with Numpy and usually exceed it by 1.5 to 2x.
2018-03-15 12:09:28 -04:00
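To make the vec256 idea above concrete, here is a hedged sketch of an 8-wide AVX accumulation (not ATen's actual code; Vec256f, sum_contiguous, and hsum are illustrative names, and it assumes compilation with -mavx):

```cpp
#include <immintrin.h>
#include <cstddef>

// Thin wrapper around a 256-bit register so it can be used like a regular value.
struct Vec256f {
  __m256 v;
  static Vec256f loadu(const float* p) { return {_mm256_loadu_ps(p)}; }
  Vec256f operator+(const Vec256f& o) const { return {_mm256_add_ps(v, o.v)}; }
  float hsum() const {                       // horizontal sum of the 8 lanes
    alignas(32) float lanes[8];
    _mm256_store_ps(lanes, v);
    float s = 0.f;
    for (float x : lanes) s += x;
    return s;
  }
};

// Sum a contiguous buffer 8 floats at a time; the tail is handled scalar.
// Summing 8-wide and then folding into a running total is exactly the
// accumulation-order change that makes results diverge slightly from numpy.
float sum_contiguous(const float* data, size_t n) {
  Vec256f acc{_mm256_setzero_ps()};
  size_t i = 0;
  for (; i + 8 <= n; i += 8) acc = acc + Vec256f::loadu(data + i);
  float s = acc.hsum();
  for (; i < n; ++i) s += data[i];
  return s;
}
```

The real implementation additionally picks between AVX, AVX2, and plain variants at runtime via cpuinfo and splits the range across TBB tasks; this sketch only shows the per-chunk accumulation.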
cadeb0cb17 Revert "ATen ReduceOps (#5481)" (#5765)
* Revert "ATen ReduceOps (#5481)"

This reverts commit 310c3735b9eb97f30cee743b773e5bb054989edc.

* Revert "Check that new cpuinfo and tbb submodules exist (#5714)"

This reverts commit 1a23c9901dbfee295bf5b3dad36e4d3ee7e86366.
2018-03-13 23:50:16 -04:00
310c3735b9 ATen ReduceOps (#5481)
This diff adds vectorization to ATen. It uses Intel intrinsics to build a general vec256 class that represents types of 256-bit width. These can then be treated like regular variables. Using those, it implements torch.sum() for the contiguous case. It uses Intel TBB for multithreading, which allows work stealing, and chunks the reduction operations based on an experimentally chosen value (_THRESHOLD). It uses cpuinfo to pick the right code depending on the host's capabilities.

The kernels are implemented under native/cpu. Each .cpp file is compiled three times: with -avx, with -avx2, and with no additional flags. A macro is used to append AVX, AVX2, or NONE to the function name. The header then needs to define the functions three times, once for each capability. This could be improved by changing the CMake file a bit or by generating source code with a Python script.

For the non-contiguous case this defaults to the current implementation within TH. For CUDA it defaults entirely to the implementation within THC.

There probably needs to be a bit of debate around the design decisions here: the additional dependencies, the parallelization strategy, clarity, etc. The numerical results also diverge from numpy with larger tensors, which is expected since we're summing, for example, 8 numbers and then adding the result to the running sum, instead of adding each number one by one. But there might be something to be said about accumulating into a double for floats, the degree of divergence, the behavior with respect to CUDA, etc.

I wrote a [small Python script]( https://github.com/cpuhrsch/benchmark/blob/sumall/benchmarks/sum_bench.py) to compare the results with numpy numerically as well as on timing. I ran this script to create timings both on master and this branch.

Here is the command for 1 core
`OMP_NUM_THREADS=1 taskset -c 0 python sum_bench.py --enable_numpy 200`

Here is the command for all cores
`python sum_bench.py --enable_numpy 200`

Here are the results of each:

[Master, 1 core](https://paste.fedoraproject.org/paste/Nho9JzHpPVK9av8a6mByjQ)

[This branch, 1 core](https://paste.fedoraproject.org/paste/6xLHkYvcVJx9z~5MoHxN4w)

[Master, all cores](https://paste.fedoraproject.org/paste/5l3V1d5zGqvJcMXIUteMRw)

[This branch, all cores](https://paste.fedoraproject.org/paste/J4RuDU-0Drz0aZwtphQwEA)

To test the command is
`python sum_bench.py --test 200`

[This branch, test results](https://paste.fedoraproject.org/paste/kTEoUC~oWgXA6XWMAfNfNw)

For this test we look at the average absolute value of the differences. This does not take into account the relative magnitude of the numbers. The numbers are sampled from a standard normal distribution. 

In terms of performance this diff should bring PyTorch on par with Numpy and usually exceed it by 1.5 to 2x.
2018-03-12 15:19:12 -04:00
c0866e45c7 Caffe2 ARM ComputeLibrary integration (#2015)
Caffe2 ARM Compute Library Integration
2018-02-23 18:09:05 -08:00
2344decc91 Add onnx as a submodule (#1998) 2018-02-21 21:10:50 -08:00
08113f922b Vendor Python dependencies of NNPACK
Summary:
Include six, enum34, and PeachPy as Caffe2 submodules, and use the versions from submodules instead of downloading them during configuration time
Closes https://github.com/caffe2/caffe2/pull/1917

Reviewed By: orionr

Differential Revision: D6938735

Pulled By: Maratyszcza

fbshipit-source-id: 841a6c47a1cd003a19f48f6c256aa4d9eb2cc6e4
2018-02-08 15:48:56 -08:00
3108ce63ba Back out "[caffe2][PR] Vendor Python dependencies of NNPACK"
Summary:
Original commit changeset: d0c1c7681605

Reverting because this commit broke the OSS build

Reviewed By: bddppq

Differential Revision: D6935666

fbshipit-source-id: 955cfeb6d5a4ed265b2e099094cfb5bfe960ff95
2018-02-08 01:34:22 -08:00
9093eb1ba0 Vendor Python dependencies of NNPACK
Summary:
Include six, enum34, and PeachPy as Caffe2 submodules, and use the versions from submodules instead of downloading them during configuration time
Closes https://github.com/caffe2/caffe2/pull/1901

Differential Revision: D6930731

Pulled By: Maratyszcza

fbshipit-source-id: d0c1c7681605d957de6f51bd24fbb25afc0f282f
2018-02-07 17:48:06 -08:00
7ee286c80a Vendor NNPACK dependencies with Caffe2 2018-01-31 21:05:07 -08:00
5daf4ca1c9 Remove android-cmake submodule 2018-01-31 17:27:06 -08:00
c5bcd5560c Adding zstd to build
Summary:
This is in order for us to share the compression ops with OSS.
Closes https://github.com/caffe2/caffe2/pull/1463

Reviewed By: hlu1

Differential Revision: D6319101

Pulled By: Yangqing

fbshipit-source-id: 16c94e71fc3efe256054a648170aaf7702e5bcfe
2017-11-13 22:18:44 -08:00
d6ff84de5c Add an aten_op to contrib.
Summary:
This operator allows the use of Torch's underlying TH libraries (TH, THC, THNN, and THCUNN)
through the ATen tensor library. Use of the operator is described in the README.
The operator itself is generated from ATen's Declarations.yaml file which describes its public API.
Closes https://github.com/caffe2/caffe2/pull/1235

Reviewed By: dzhulgakov

Differential Revision: D5876944

Pulled By: zdevito

fbshipit-source-id: b558e8563a5e82a0e6278705a4a359bd7df4e70a
2017-09-25 10:53:51 -07:00
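For context on what the generated operator wraps, here is a hedged sketch of calling the ATen tensor library directly from C++ (present-day libtorch names, shown for illustration only; the Caffe2 aten_op dispatches into calls like these based on the entries in Declarations.yaml):

```cpp
#include <ATen/ATen.h>
#include <iostream>

int main() {
  // Build a tensor and call ATen functions; the aten_op exposes this same
  // functional surface to Caffe2 nets through generated bindings.
  at::Tensor x = at::ones({2, 3});
  at::Tensor y = at::relu(x - 0.5);
  std::cout << y << std::endl;
  return 0;
}
```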
c2169c717f Remove references to cnmem
Summary: TSIA

Reviewed By: Yangqing

Differential Revision: D5815624

fbshipit-source-id: 1a6c0e471eac778aeac80001eac947178fc105ed
2017-09-12 14:37:12 -07:00
ac8d3372b0 Add nanopb submodule.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
6d0364f13d Add pybind11 as a submodule. 2017-09-05 17:48:55 -04:00
7310ebb66f Add gloo submodule.
We make gloo a submodule because it contains submodules itself, and
Git cannot handle subtrees with nested submodules.

Fixes https://github.com/pytorch/pytorch/issues/2426
2017-08-30 11:54:04 -04:00
5e0d434b4b Add build support for opengl and latest nnpack.
Summary:
(1) Changed android-cmake to use Yangqing/android-cmake, which supports NEON fp16.
(2) Added cmake scripts to build opengl.
(3) Updated nnpack to master, and changed the corresponding build files.
Closes https://github.com/caffe2/caffe2/pull/1061

Differential Revision: D5591387

Pulled By: Yangqing

fbshipit-source-id: 1d3f28511d33c09df6ecef5041448ac9a3246601
2017-08-09 00:31:53 -07:00
e591ddb70b Add nnpack specific dependencies under third_party 2017-03-24 12:37:56 -07:00
d5880b128e CMake support for Gloo dependency
Summary:
This also requires a change to cmake/External/nccl.cmake to use the
static NCCL binary instead of the shared object. When the Caffe2/Gloo
build uses the bundled NCCL version it should be packaged up in the
resulting libraries and not cause another runtime dependency on a
library that has to be installed separately.
Closes https://github.com/caffe2/caffe2/pull/218

Differential Revision: D4769926

Pulled By: pietern

fbshipit-source-id: 5c85559992c200d874f4218724823815ffb5adb5
2017-03-24 08:32:24 -07:00
4dd297d261 Add nnpack 2017-02-01 21:45:18 -08:00
4c614f2e67 Add ios-cmake 2017-01-27 00:08:57 -08:00
01e860505b Cmake for android
Summary:
Added a cmake-for-Android script under scripts, and set up the Travis contbuild target.
Closes https://github.com/caffe2/caffe2/pull/109

Reviewed By: bwasti

Differential Revision: D4468767

Pulled By: Yangqing

fbshipit-source-id: 709f3eb6be24727b0a989d0901dbf377871b122a
2017-01-26 18:14:30 -08:00
a9e2693fa8 add back third_party/protobuf, but it won't be used in normal builds.
Pinned protobuf to v3.1.0

Removed the USE_SYSTEM_PROTOBUF option in cmake. It is no longer used.
2017-01-04 17:27:18 -08:00