Commit Graph

66 Commits

Author SHA1 Message Date
a3933b87c6 Back out "Revert D14613517: [pytorch][PR] Updating onnxtrt submodule to master branch" (#18514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18514

Original commit changeset: d6267ddfc339

Reviewed By: bddppq

Differential Revision: D14634476

fbshipit-source-id: 2633b0b4c512d71001e5c20cd79c0c0d7856f942
2019-03-26 23:44:33 -07:00
66e8c74814 Revert D14613517: [pytorch][PR] Updating onnxtrt submodule to master branch
Differential Revision:
D14613517

Original commit changeset: dd20d718db55

fbshipit-source-id: d6267ddfc339d04f182e2de1750a601c8d6bf8c6
2019-03-26 17:37:55 -07:00
bbe110f4e1 Updating onnxtrt submodule to master branch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18441

Differential Revision: D14613517

Pulled By: bddppq

fbshipit-source-id: dd20d718db55942df9cce7acd1151d6902bc57ff
2019-03-26 14:25:55 -07:00
0fe6e8c870 Remove ComputeLibrary submodule
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/18052

Reviewed By: ezyang

Differential Revision: D14477355

fbshipit-source-id: c56b802f6d69701596c327cf9af6782f30e335fa
2019-03-16 09:06:42 -07:00
e6cf3c886d add foxi submodule (#17184) 2019-02-20 16:25:05 -05:00
aefc83f46d fixing some rebuild issues (#14969)
Summary:
This fixes rebuild issues with the ninja part of the build. With this patch, all ninja files will now report `nothing to do` if nothing has changed, assuming `BUILD_CAFFE2_OPS=0`.

1. This only does the Python file processing for caffe2 when BUILD_CAFFE2_OPS=1. This part of the build file is written in such a way that it always has to rerun, and it can take substantial time moving files around even in a no-op build. In the future this part should be rewritten to use a faster method of copying the files, or to treat copying the files as part of the build rules and only run when the files are out of date (see the sketch after this list).

2. This points `sleef` to a patched version that fixes a dead build output which was causing everything to relink all the time. See https://github.com/shibatch/sleef/pull/231#partial-pull-merging for the upstream change.
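
To illustrate the idea in item 1, here is a minimal Python sketch (the helper is hypothetical, not the actual build code) of treating the file copies as build rules that fire only when outputs are stale:

```python
import os
import shutil

def copy_if_stale(src: str, dst: str) -> bool:
    """Copy src to dst only if dst is missing or older than src; return True if copied."""
    if os.path.exists(dst) and os.path.getmtime(dst) >= os.path.getmtime(src):
        return False  # destination is up to date; nothing to do
    os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
    shutil.copy2(src, dst)  # copy2 preserves mtimes, keeping the next check cheap
    return True
```
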
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14969

Reviewed By: soumith

Differential Revision: D13395998

Pulled By: zdevito

fbshipit-source-id: ca85b7be9e99c5c578103c144ef0f2c3b927e724
2018-12-09 16:32:19 -08:00
5e06fa0baf ONNX changes to use int32_t (instead of enum) to store data type
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14926

Reviewed By: houseroad

Differential Revision: D13390642

Pulled By: bddppq

fbshipit-source-id: c2314b24d9384f188fda2b9a5cc16465ad39581e
2018-12-08 01:06:08 -08:00
2fe9e3a207 Remove catch from caffe2/.gitmodules
Summary: Step 3 to remove catch submodule from PyTorch

Reviewed By: ezyang

Differential Revision: D12959020

fbshipit-source-id: 49347de8b027433d422b653dd854ad76349d0e25
2018-11-07 11:10:09 -08:00
54d63c5752 added fbgemm as submodule (#13354) 2018-11-01 15:35:02 -04:00
1720757220 added submodules for int8 ops (#13106) 2018-10-25 09:11:11 -07:00
444cc0ee0a Back out "[pytorch][PR] added gemmlowp module" (#13090)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13090

Original commit changeset: 7f8a649c739c

Reviewed By: Maratyszcza

Differential Revision: D10846367

fbshipit-source-id: a5a5aad29b51287dc1cb80c707eb5a0008ec78f5
2018-10-24 19:41:15 -07:00
9573ecefe3 Back out "[pytorch][PR] Add sse2neon tp" (#13091)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13091

Original commit changeset: 8b4f9f361cc1

Reviewed By: Maratyszcza

Differential Revision: D10846301

fbshipit-source-id: 2798f1fca5c1a2362979977ef5eb724dd37c4e6d
2018-10-24 17:17:34 -07:00
b55dc8d971 Add sse2neon tp (#12948)
Summary:
Adding sse2neon to third_party as a dependency
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12948

Differential Revision: D10801574

Pulled By: harouwu

fbshipit-source-id: 8b4f9f361cc1722f631830f7675b9d209a9f22ef
2018-10-24 14:56:24 -07:00
c64a65c977 added gemmlowp module (#12947)
Summary:
Adding the gemmlowp dependency to the third_party folder
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12947

Differential Revision: D10794559

Pulled By: harouwu

fbshipit-source-id: 7f8a649c739ccb6c307327080711379b1db8c3e0
2018-10-24 13:53:58 -07:00
348867c10b Remove cereal submodule (#12666)
Summary:
Cereal is dead!

soumith orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12666

Reviewed By: soumith

Differential Revision: D10422061

Pulled By: goldsborough

fbshipit-source-id: ca1ac66d05e699df9de00fc340a399571b7ecb9f
2018-10-17 11:52:47 -07:00
a1bbe80e21 Remove NervanaGPU operators from Caffe2 (#12564)
Summary:
Fix #12540
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12564

Reviewed By: orionr

Differential Revision: D10379775

Pulled By: soumith

fbshipit-source-id: a925b116f2687e56bf54465fc02ca2eb1e7c8eb0
2018-10-15 11:04:46 -07:00
c5d7494ca1 Use open-source NCCL2 in PyTorch (#12359)
Summary:
- Removed the old NCCL file
- Made open-source NCCL a submodule
- Added CMake rules to build NCCL itself

NCCL2 is now in the default build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12359

Reviewed By: orionr, yns88

Differential Revision: D10219665

Pulled By: teng-li

fbshipit-source-id: 134ff47057512ba617b48bf390c1c816fff3f881
2018-10-08 15:39:07 -07:00
895994a7c3 Back out "[pytorch][PR] [Build] Use open-source NCCL2 in PyTorch"

fbshipit-source-id: a13075339d3a7b970e81be0b1a32a7c4c3a6c68d
2018-10-04 14:12:04 -07:00
ae7a7fb398 Use open-source NCCL2 in PyTorch (#12312)
Summary:
- Removed the old NCCL file
- Made open-source NCCL a submodule
- Added CMake rules to build NCCL itself

NCCL2 is now in the default build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12312

Differential Revision: D10190845

Pulled By: teng-li

fbshipit-source-id: 08d42253b774149a66919d194f88b34628c39bae
2018-10-04 11:42:17 -07:00
c172ffb632 Remove the nanopb submodule
Summary:
After making changes internally, really remove the nanopb submodule.

Finalizes https://github.com/pytorch/pytorch/pull/10772

Reviewed By: yns88

Differential Revision: D9504582

fbshipit-source-id: 4517607e5c8054a255c3984b8265f48fede2935b
2018-08-24 16:24:57 -07:00
05c473b85c Temporarily remove TBB (#8255) 2018-06-18 19:31:57 -04:00
769397eb77 [Caffe2] [feature request] Add gradient operators for IDEEP (#7234)
* Add gradient operators for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Add gradient test cases for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Upgrade third_party/ideep

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Refine SumOp for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Share input buffer in fallback op if possible

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fallback ConvTranspose op for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix bug introduced by the patch of sharing input buffer

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Share output buffer in fallback operators

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Remove IDEEP to resolve repo issue

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Reflash IDEEP repo

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Remove redundant lines in IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fallback operators for IDEEP
(Flatten, ResizeLike, Transpose, and Reshape)

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>
2018-05-09 08:52:24 -07:00
f9393ffc90 Remove unneeded entry for NCCL in .gitmodules (#7216)
NCCL currently is not a git submodule. The NCCL source code is
bundled in 'third_party/nccl'.

Closes #7150
2018-05-03 00:07:58 -07:00
619a56bf21 Emergency new fork for ideep (upstream lost commits). (#7191)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-02 14:50:47 -04:00
88a705555a Add SLEEF for float and double (#6725) 2018-05-02 18:40:44 +00:00
b2cdd08252 Introducing onnx-tensorrt to third_party (#7119) 2018-04-30 21:09:51 -07:00
af71fb882f Merge autogradpp into PyTorch (#7074)
* Dump autogradpp into PyTorch

* Fixed up CMake for autogradpp/C++ API

* Made cereal a submodule

* Change search location of autogradpps mnist directory

* Add test_api to CI

* Download MNIST from the internet instead of storing in repo

* Fix warnings
2018-04-30 12:53:46 -07:00
caa6a8ce30 Switch to the official git mirror for Eigen. (#7090) 2018-04-30 14:09:18 -04:00
dec5e99e99 [aten] Move submodules to third_party (#6866)
* [aten] Move submodules to third_party

* [aten] Update aten_mirror.sh script for third_party

* [aten] Move ATen submodules def to root and rename

* [aten] Update cpuinfo cmake build

* [aten] Fix cpuinfo cmake build

* Update third_party/cpuinfo to d03d5d296063063c66877fb559cf34469734e3e1

* [aten] Fix JIT test reference to catch
2018-04-24 23:33:46 -04:00
26ddefbda1 [feature request] [Caffe2] Enable MKLDNN support for inference (#6699)
* Add operators based-on IDEEP interfaces

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Enable IDEEP as a caffe2 device

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Add test cases for IDEEP ops

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Add IDEEP as a caffe2 submodule

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Skip test cases if no IDEEP support

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Correct cmake options for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Add dependences on ideep libraries

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix issues in IDEEP conv ops, etc.

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Move ideep from caffe2/ideep to caffe2/contrib/ideep

Signed-off-by: Gu Jinghui <jinghui.gu@intel.com>

* Update IDEEP to fix cmake issue

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix cmake issue caused by USE_MKL option

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Correct comments in MKL cmake file

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>
2018-04-22 21:58:14 -07:00
29e81e01aa Expunge ATen submodule; use the in-tree copy. (#6235)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-04-03 15:47:07 -04:00
90afedb6e2 Merge caffe2 with pytorch. 2018-03-30 10:29:50 -07:00
eb18a2f26c Reorganize third-party libraries into top-level third_party directory (#6025)
- gloo, pybind11, nanopb and nccl now live in third_party.
- ATen builds in aten/build rather than torch/lib/build/aten
- A bit of faffing about in the scripts was necessary, because they used to assume that everything lived in the same directory. Now you are expected to cd into the correct directory before calling one of the build functions. The actual builder script lives in tools.
- Lint now just unconditionally ignores third_party, rather than enumerating folders explicitly
2018-03-27 22:09:20 -04:00
6f80023c29 Port ATen and JIT C++ tests to Catch2 (#5788)
This PR addresses #5648. In particular, following the discussion at #5648:

- it adds Catch as a submodule (https://github.com/catchorg/Catch2) in torch/aten/utils
- it ports all ATen tests to Catch
- it ports torch/csrc/jit/test_jit.cpp to Catch (libtorch only, Python build is unaffected)
2018-03-19 16:09:43 -04:00
5fa3aac610 ATen ReduceOps (#5776)
#5481 was reverted due to a strange test bug. This PR attempts to fix that.

This diff adds vectorization to ATen. It uses Intel intrinsics to build a general vec256 class that represents types of 256-bit width. These can then be treated like regular variables. Using those, it implements torch.sum() for the contiguous case. It uses Intel TBB for multithreading, which allows work stealing, and chunks the reduction operations based on an experimentally chosen value (_THRESHOLD). It uses cpuinfo to pick the right code path depending on the host's capabilities.

The kernels are implemented under native/cpu. Each .cpp file is compiled three times: with -avx, with -avx2, and with no additional flags. A macro is used to append AVX, AVX2, or NONE to the function name, so the header then needs to define each function three times, once for each capability. This could be improved by changing the cmake file a bit, or possibly by generating source code using a Python script, etc.

For the non-contiguous case this defaults to the current implementation within TH. For CUDA it entirely defaults to the implementation within THC.

There probably needs to be a bit of a debate around the design decisions here: the additional dependencies, the parallelization strategy, clarity, etc. The numerical results also diverge from numpy with larger tensors, which is expected, since we sum, for example, 8 numbers at a time and then add the result to the running sum, instead of adding each number one by one. But there might be something to be said about accumulating into a double for floats, about the degree of divergence, about the behavior with respect to CUDA, etc.
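
To make that divergence concrete, here is a small illustrative sketch (not from the PR; plain numpy, no ATen) comparing one-by-one float32 accumulation against 8-wide lane accumulation:

```python
import numpy as np

# Float addition is not associative, so combining partial sums in a
# different order gives a (slightly) different result.
x = np.random.randn(2**16).astype(np.float32)

# One-by-one accumulation into a float32 running sum.
seq = np.float32(0.0)
for v in x:
    seq = np.float32(seq + v)

# Chunked accumulation: keep 8 float32 partial sums (one per "lane",
# mimicking a 256-bit vector of floats), then fold the lanes together.
lanes = np.zeros(8, dtype=np.float32)
for chunk in x.reshape(-1, 8):
    lanes += chunk
chunked = np.float32(lanes.sum())

ref = x.astype(np.float64).sum()  # high-precision reference
print(abs(seq - ref), abs(chunked - ref))  # both differ from ref, and from each other
```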

I wrote a [small Python script]( https://github.com/cpuhrsch/benchmark/blob/sumall/benchmarks/sum_bench.py) to compare the results with numpy numerically as well as on timing. I ran this script to create timings both on master and this branch.

Here is the command for 1 core
`OMP_NUM_THREADS=1 taskset -c 0 python sum_bench.py --enable_numpy 200`

Here is the command for all cores
`python sum_bench.py --enable_numpy 200`

Here are the results of each:

[Master, 1 core](https://paste.fedoraproject.org/paste/Nho9JzHpPVK9av8a6mByjQ)

[This branch, 1 core](https://paste.fedoraproject.org/paste/6xLHkYvcVJx9z~5MoHxN4w)

[Master, all cores](https://paste.fedoraproject.org/paste/5l3V1d5zGqvJcMXIUteMRw)

[This branch, all cores](https://paste.fedoraproject.org/paste/J4RuDU-0Drz0aZwtphQwEA)

To test the command is
`python sum_bench.py --test 200`

[This branch, test results](https://paste.fedoraproject.org/paste/kTEoUC~oWgXA6XWMAfNfNw)

For this test we look at the average absolute value of the differences. This does not take into account the relative magnitude of the numbers. The numbers are sampled from a standard normal distribution. 

In terms of performance, this diff should bring PyTorch on par with NumPy and usually exceed it by 1.5 to 2x.
2018-03-15 12:09:28 -04:00
cadeb0cb17 Revert "ATen ReduceOps (#5481)" (#5765)
* Revert "ATen ReduceOps (#5481)"

This reverts commit 310c3735b9eb97f30cee743b773e5bb054989edc.

* Revert "Check that new cpuinfo and tbb submodules exist (#5714)"

This reverts commit 1a23c9901dbfee295bf5b3dad36e4d3ee7e86366.
2018-03-13 23:50:16 -04:00
310c3735b9 ATen ReduceOps (#5481)
This diff adds vectorization to ATen. It uses Intel intrinsics to build a general vec256 class that represents types of 256-bit width. These can then be treated like regular variables. Using those, it implements torch.sum() for the contiguous case. It uses Intel TBB for multithreading, which allows work stealing, and chunks the reduction operations based on an experimentally chosen value (_THRESHOLD). It uses cpuinfo to pick the right code path depending on the host's capabilities.

The kernels are implemented under native/cpu. Each .cpp file is compiled three times: with -avx, with -avx2, and with no additional flags. A macro is used to append AVX, AVX2, or NONE to the function name, so the header then needs to define each function three times, once for each capability. This could be improved by changing the cmake file a bit, or possibly by generating source code using a Python script, etc.

For the non-contiguous case this defaults to the current implementation within TH. For CUDA it entirely defaults to the implementation within THC.

There probably needs to be a bit of a debate around the design decisions here: the additional dependencies, the parallelization strategy, clarity, etc. The numerical results also diverge from numpy with larger tensors, which is expected, since we sum, for example, 8 numbers at a time and then add the result to the running sum, instead of adding each number one by one. But there might be something to be said about accumulating into a double for floats, about the degree of divergence, about the behavior with respect to CUDA, etc.

I wrote a [small Python script]( https://github.com/cpuhrsch/benchmark/blob/sumall/benchmarks/sum_bench.py) to compare the results with numpy numerically as well as on timing. I ran this script to create timings both on master and this branch.

Here is the command for 1 core
`OMP_NUM_THREADS=1 taskset -c 0 python sum_bench.py --enable_numpy 200`

Here is the command for all cores
`python sum_bench.py --enable_numpy 200`

Here are the results of each:

[Master, 1 core](https://paste.fedoraproject.org/paste/Nho9JzHpPVK9av8a6mByjQ)

[This branch, 1 core](https://paste.fedoraproject.org/paste/6xLHkYvcVJx9z~5MoHxN4w)

[Master, all cores](https://paste.fedoraproject.org/paste/5l3V1d5zGqvJcMXIUteMRw)

[This branch, all cores](https://paste.fedoraproject.org/paste/J4RuDU-0Drz0aZwtphQwEA)

To test the command is
`python sum_bench.py --test 200`

[This branch, test results](https://paste.fedoraproject.org/paste/kTEoUC~oWgXA6XWMAfNfNw)

For this test we look at the average absolute value of the differences. This does not take into account the relative magnitude of the numbers. The numbers are sampled from a standard normal distribution. 

In terms of performance, this diff should bring PyTorch on par with NumPy and usually exceed it by 1.5 to 2x.
2018-03-12 15:19:12 -04:00
c0866e45c7 Caffe2 ARM ComputeLibrary integration (#2015)
Caffe2 ARM Compute Library Integration
2018-02-23 18:09:05 -08:00
2344decc91 Add onnx as a submodule (#1998) 2018-02-21 21:10:50 -08:00
08113f922b Vendor Python dependencies of NNPACK
Summary:
Include six, enum34, and PeachPy as Caffe2 submodules, and use the versions from submodules instead of downloading them during configuration time
Closes https://github.com/caffe2/caffe2/pull/1917
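
A hypothetical sketch of the vendoring idea described above: prefer the submodule checkouts over downloaded copies by putting them first on sys.path before importing the build-time dependencies (the directory names here are assumptions, not the actual layout):

```python
import os
import sys

# Point sys.path at the vendored submodule checkouts under third_party.
THIRD_PARTY = os.path.join(os.path.dirname(os.path.abspath(__file__)), "third_party")
for pkg in ("python-six", "python-enum", "python-peachpy"):
    sys.path.insert(0, os.path.join(THIRD_PARTY, pkg))

import six      # now resolved from the vendored checkout, not a download
import peachpy  # likewise for PeachPy
```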

Reviewed By: orionr

Differential Revision: D6938735

Pulled By: Maratyszcza

fbshipit-source-id: 841a6c47a1cd003a19f48f6c256aa4d9eb2cc6e4
2018-02-08 15:48:56 -08:00
3108ce63ba Back out "[caffe2][PR] Vendor Python dependencies of NNPACK"
Summary:
Original commit changeset: d0c1c7681605

Reverting due to broken OSS build due to this commit

Reviewed By: bddppq

Differential Revision: D6935666

fbshipit-source-id: 955cfeb6d5a4ed265b2e099094cfb5bfe960ff95
2018-02-08 01:34:22 -08:00
9093eb1ba0 Vendor Python dependencies of NNPACK
Summary:
Include six, enum34, and PeachPy as Caffe2 submodules, and use the versions from submodules instead of downloading them during configuration time
Closes https://github.com/caffe2/caffe2/pull/1901

Differential Revision: D6930731

Pulled By: Maratyszcza

fbshipit-source-id: d0c1c7681605d957de6f51bd24fbb25afc0f282f
2018-02-07 17:48:06 -08:00
7ee286c80a Vendor NNPACK dependencies with Caffe2 2018-01-31 21:05:07 -08:00
5daf4ca1c9 Remove android-cmake submodule 2018-01-31 17:27:06 -08:00
c5bcd5560c Adding zstd to build
Summary:
This is in order for us to share compression ops to oss.
Closes https://github.com/caffe2/caffe2/pull/1463

Reviewed By: hlu1

Differential Revision: D6319101

Pulled By: Yangqing

fbshipit-source-id: 16c94e71fc3efe256054a648170aaf7702e5bcfe
2017-11-13 22:18:44 -08:00
d6ff84de5c Add an aten_op to contrib.
Summary:
This operator allows the use of Torch's underlying TH libraries (TH, THC, THNN, and THCUNN)
through the ATen tensor library. Use of the operator is described in the README.
The operator itself is generated from ATen's Declarations.yaml file which describes its public API.
Closes https://github.com/caffe2/caffe2/pull/1235
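
A minimal sketch of driving an ATen function from Caffe2's Python API, in the spirit of the README mentioned above (the operator name "ATen" and the `operator` argument follow that description; treat the details as assumptions):

```python
import numpy as np
from caffe2.python import core, workspace

workspace.FeedBlob("X", np.random.randn(2, 3).astype(np.float32))
workspace.FeedBlob("Y", np.random.randn(2, 3).astype(np.float32))

# The generated operator takes the name of the ATen function to call
# as an argument and dispatches to the underlying TH/THC implementation.
op = core.CreateOperator("ATen", ["X", "Y"], ["Z"], operator="add")
workspace.RunOperatorOnce(op)
print(workspace.FetchBlob("Z"))
```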

Reviewed By: dzhulgakov

Differential Revision: D5876944

Pulled By: zdevito

fbshipit-source-id: b558e8563a5e82a0e6278705a4a359bd7df4e70a
2017-09-25 10:53:51 -07:00
c2169c717f Remove references to cnmem
Summary: TSIA

Reviewed By: Yangqing

Differential Revision: D5815624

fbshipit-source-id: 1a6c0e471eac778aeac80001eac947178fc105ed
2017-09-12 14:37:12 -07:00
ac8d3372b0 Add nanopb submodule.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-09-05 17:48:55 -04:00
6d0364f13d Add pybind11 as a submodule. 2017-09-05 17:48:55 -04:00
7310ebb66f Add gloo submodule.
We make gloo a submodule because it contains submodules itself, and
Git cannot handle subtrees with nested submodules.

Fixes https://github.com/pytorch/pytorch/issues/2426
2017-08-30 11:54:04 -04:00