Commit Graph

469 Commits

Author SHA1 Message Date
5bfd8f583c Moving copy of Caffe2 protos back to build_pytorch_libs.sh (#11726)
Summary:
This way it shows up in all current and future setup.py commands; otherwise we'd have to override every one to have them all call copy_protos. This is needed because the nightly packages still do not include caffe2_pb2, since setup.py bdist does not go through setup.py install or setup.py develop
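For context, the per-command overriding that this change avoids would look roughly like the sketch below; only `copy_protos` is named in this commit, and the subclasses and stub are illustrative.

```
from setuptools import setup
from setuptools.command.develop import develop
from setuptools.command.install import install

def copy_protos():
    """Stub for the proto-copy helper named in this commit."""

class install_with_protos(install):
    def run(self):
        copy_protos()  # every setup.py command would need this hook
        install.run(self)

class develop_with_protos(develop):
    def run(self):
        copy_protos()
        develop.run(self)

setup(name='example',
      cmdclass={'install': install_with_protos,
                'develop': develop_with_protos})
```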
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11726

Reviewed By: orionr

Differential Revision: D9844075

Pulled By: pjh5

fbshipit-source-id: 57b469e48010aacd0c08c214ba8a7e5d757feefa
2018-09-17 08:58:05 -07:00
acb6f18bab fix generate_code.py caching (#11644)
Summary:
Currently, because of some setup.py logic, `ninja` caching of the `generate_code.py` build step was broken. This resulted in `generate_code.py` running on every build, regardless of whether its inputs changed.

This updated logic fixes the input caching.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11644

Reviewed By: orionr

Differential Revision: D9814348

Pulled By: soumith

fbshipit-source-id: 2012960908d0f600488d410094095cfd72adc34f
2018-09-13 12:39:48 -07:00
6dcdbd3a1d Make C10d support CPU only build (#11513)
Summary:
This makes torch.distributed work for CPU-only builds.

Also added one more CI test case to cover the MPI CPU build.
All CI tests should cover this change.
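A minimal sketch of what a CPU-only build now supports (the backend choice mirrors the new MPI CI case):

```
import torch
import torch.distributed as dist

dist.init_process_group(backend='mpi')  # no CUDA required
t = torch.ones(4)
dist.all_reduce(t)                      # runs on CPU tensors
```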
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11513

Differential Revision: D9784546

Pulled By: teng-li

fbshipit-source-id: 0976a6b0fd199670926f0273e17ad7d2805e42e7
2018-09-11 22:10:34 -07:00
289a8c9b7d Allow train/eval, and non-Tensor arguments to python functions (#11505)
Summary:
This whitelists train/eval functions in script modules, and tests that nested nn.Modules still work.

This also changes the code for calling python functions from script to allow non-tensor inputs/outputs.
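A sketch of both behaviors; the module body is illustrative, and the helper call assumes the non-tensor support described above:

```
import torch

class Gate(torch.jit.ScriptModule):
    def __init__(self):
        super(Gate, self).__init__()
        self.dropout = torch.nn.Dropout(0.5)  # nested nn.Module

    @torch.jit.script_method
    def forward(self, x):
        return self.dropout(x)

def scale(x, factor):
    # plain Python function taking a non-tensor (int) argument
    return x * factor

@torch.jit.script
def use_scale(x):
    return scale(x, 2)  # non-tensor input across the script boundary

m = Gate()
m.eval()   # train/eval are now whitelisted on script modules
m.train()
```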
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11505

Differential Revision: D9765466

Pulled By: zdevito

fbshipit-source-id: 1177bff931324422b69e18fa0bbaa82e3c98ec69
2018-09-11 15:05:09 -07:00
d32b41003a Copy protos on install same as develop (#11517)
Summary:
This is a potential fix for https://github.com/pytorch/pytorch/issues/11453 and https://github.com/pytorch/pytorch/issues/11074, worked through with pjh5. Turns out we had some proto-copy code in the .sh file that was removed. Better to have it in setup.py, though, same as for develop.

cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11517

Differential Revision: D9771911

Pulled By: orionr

fbshipit-source-id: 76975d8f71f38d951eaaed0b50dd3ec36dd177a9
2018-09-11 10:09:56 -07:00
4e8d9a4a58 Introducing python setup.py rebuild develop (#11487)
Summary:
This speeds up incremental builds by doing the following changes:

- Uses `rsync` instead of `cp` (when `rsync` is found), which is a bit smarter about doing a "maybe copy"
- Introduces a `rebuild` mode which does not rerun `cmake` in `build_pytorch_libs.sh`.
   *Note: `rebuild` should only be used if you don't add/remove files to the build, as `cmake` is not rerun.*

Current no-op rebuild speedup:
- 1m 15s -> 20s

There are some lingering bugs. No-op rebuilds rerun `cmake` for the first two rebuilds (likely because the cmake logic depends on the install folder, hence kicking off a rebuild).

So what you see is:

```
python setup.py rebuild develop    # first time - ~5 mins
python setup.py rebuild develop    # second time - ~3 mins
python setup.py rebuild develop    # third time - ~2 mins
python setup.py rebuild develop    # fourth time - ~20 seconds
python setup.py rebuild develop    # fifth time - ~20 seconds
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11487

Differential Revision: D9769087

Pulled By: soumith

fbshipit-source-id: 20fbecde33af6426149c13767e8734fb3be783c5
2018-09-11 08:56:25 -07:00
a175282776 Flags for LMDB, LevelDB, and Caffe2 ops (#11462)
Summary:
Add flags for LMDB and LevelDB, default `OFF`. These can be enabled with

```
USE_LMDB=1 USE_LEVELDB=1 python setup.py build_deps
```

Also add a flag to build Caffe2 ops, which is default `ON`. Disable with

```
NO_CAFFE2_OPS=1 python setup.py build_deps
```

cc Yangqing soumith pjh5 mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11462

Reviewed By: soumith

Differential Revision: D9758156

Pulled By: orionr

fbshipit-source-id: 95fd206d72fdf44df54fc5d0aeab598bff900c63
2018-09-10 17:27:50 -07:00
a0d4106c07 Integrate custom op tests with CI (#10611)
Summary:
This PR is stacked on https://github.com/pytorch/pytorch/pull/10610, and only adds changes in one file `.jenkins/pytorch/test.sh`, where we now build the custom op tests and run them.

I'd also like to take this PR to discuss whether the [`TorchConfig.cmake`](https://github.com/pytorch/pytorch/blob/master/cmake/TorchConfig.cmake.in) I made is robust enough (we will also see in the CI). orionr Yangqing dzhulgakov, what do you think?

Also ezyang for CI changes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10611

Differential Revision: D9597627

Pulled By: goldsborough

fbshipit-source-id: f5af8164c076894f448cef7e5b356a6b3159f8b3
2018-09-10 15:40:21 -07:00
802d21c8f4 Remove FULL_CAFFE2 flag (#11321)
Summary:
Continuing pjh5's work to remove the FULL_CAFFE2 flag completely.

With these changes you'll be able to also do something like

```
NO_TEST=1 python setup.py build_deps
```
and this will skip building tests in caffe2, aten, and c10d. By default the tests are built.

cc mingzhe09088 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11321

Reviewed By: mingzhe09088

Differential Revision: D9694950

Pulled By: orionr

fbshipit-source-id: ff5c4937a23d1a263378a196a5eda0cba98af0a8
2018-09-07 15:09:44 -07:00
01930a3145 Move sync_params to C++ (#9805)
Summary:
The next function I'm moving to C++ is `sync_params`. It is stacked on top of https://github.com/pytorch/pytorch/pull/9729, so some changes will go away when it lands and I rebase.

I also split code into a `.h` and `.cpp` file for better code organization.

pietern apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9805

Differential Revision: D9688604

Pulled By: goldsborough

fbshipit-source-id: 4467104d3f9e2354425503b9e4edbd59603e20a8
2018-09-07 12:56:40 -07:00
9de2085806 Use custom hcc/HIP, purge hcSPARSE (#11198)
Summary:
* purge hcSPARSE now that rocSPARSE is available
* integrate a custom hcc and HIP
* hcc brings two important compiler fixes (fixes hundreds of unit tests)
* HIP brings a smart dispatcher that allows us to avoid a lot of static_casts (we haven't yet removed the automatic static_casts but this catches some occurrences the script did not catch)
* mark 5 unit tests skipping that have regressed w/ the new hcc (we don't know yet what is at fault)
* optimize bitonic sort - the comparator is always an empty struct - therefore passing it by value saves at least 3 bytes. It also removes an ambiguity around passing references to `__global__` functions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11198

Differential Revision: D9652340

Pulled By: ezyang

fbshipit-source-id: f5af1d891189da820e3d13b7bed91a7a43154690
2018-09-06 19:38:07 -07:00
dda8402447 Cleanup dependency of distributed flags (#11221)
Summary:
Now that we're building everything together, this makes all distributed flags conditional on USE_DISTRIBUTED being set.

cc pietern cpuhrsch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11221

Reviewed By: Yangqing

Differential Revision: D9664267

Pulled By: orionr

fbshipit-source-id: a296cda5746ad150028c97160f8beacba955ff73
2018-09-06 08:56:00 -07:00
c0efe6f027 Forward declarations of needed curand functions (#10911)
Summary:
Needed for FULL_CAFFE2=1 with statically linked CUDA libraries. Waiting on advice from Nvidia
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10911

Reviewed By: pjh5

Differential Revision: D9636256

Pulled By: orionr

fbshipit-source-id: fcad7945910b6c8fb5f52e81cc87dad5fcfb3c65
2018-09-05 16:56:26 -07:00
68c2e014cb Handling for py2/py3 division differences (#11016)
Summary:
- In Python 2, use of `/` (regardless of int/float/Tensor) causes a compiler error if
  `from __future__ import division` is not imported in the file.
- The / operator is universally set to do "true" division for integers
- Added a `prim::FloorDiv` operator because it is used in loop unrolling.

The error for users who use '/' in Python 2 without importing from __future__
occurs when building the JIT AST.
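A sketch of the resulting behavior (the scripted body is illustrative):

```
# In Python 2, this future import is required in any file whose
# functions are scripted with `/`; without it the JIT raises a
# compiler error while building the AST.
from __future__ import division

import torch

@torch.jit.script
def halve(x):
    return x / 2  # "true" division, identical under Python 2 and 3
```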

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11016

Differential Revision: D9613527

Pulled By: zou3519

fbshipit-source-id: 0cebf44d5b8c92e203167733692ad33c4ec9dac6
2018-09-05 14:57:38 -07:00
020501b7b0 Getting rid of USE_C10D for build (#11237)
Summary:
Will use USE_DISTRIBUTED for both c10d and THD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11237

Differential Revision: D9647825

Pulled By: teng-li

fbshipit-source-id: 06e0ec9b5e2f8f38780fc88718f8499463e9e969
2018-09-04 17:27:53 -07:00
33c7cc13ca improve docker packages, fix bugs, enable tests, enable FFT (#10893)
Summary:
* improve docker packages (install OpenBLAS to have at-compile-time LAPACK functionality w/ optimizations for both Intel and AMD CPUs)
* integrate rocFFT (i.e., enable Fourier functionality)
* fix bugs in ROCm caused by wrong warp size
* enable more test sets, skip the tests that don't work on ROCm yet
* don't disable asserts any longer in hipification
* small improvements
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10893

Differential Revision: D9615053

Pulled By: ezyang

fbshipit-source-id: 864b4d27bf089421f7dfd8065e5017f9ea2f7b3b
2018-09-02 08:54:42 -07:00
3791bd12c8 PT1 Release Milestone No.2 MPI Group Support with all tests passed (#11128)
Summary:
Added MPI group support, which makes all previous MPI group test cases pass.

Also, relaxed the required MPI thread-level support by serializing different PGs' MPI ops; this is required.

The build is fixed too
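A usage sketch of what now passes (run under mpirun; shapes illustrative):

```
import torch
import torch.distributed as dist

dist.init_process_group('mpi')
group = dist.new_group(ranks=[0, 1])  # subgroup over ranks 0 and 1
t = torch.ones(1)
if dist.get_rank() in (0, 1):
    dist.all_reduce(t, group=group)   # collective scoped to the subgroup
```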
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11128

Differential Revision: D9602188

Pulled By: teng-li

fbshipit-source-id: 1d618925ae5fb7b47259b23051cc181535aa7497
2018-08-31 12:39:56 -07:00
cd9416317d Minor copy-edit on setup.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10933

Reviewed By: cpuhrsch

Differential Revision: D9526650

fbshipit-source-id: 8ad1c989bee7009b3f95a2641189f55cf6c1979f
2018-08-29 13:41:04 -07:00
3c9775fff8 Remove nanopb since we've switched to protobuf (#10772)
Summary:
We no longer use nanopb in PyTorch (or Caffe2), so we're removing it. All protobuf manipulation should go through standard protobuf, which is statically linked inside libcaffe2.so by default.

cc zdevito pjh5 ezyang Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10772

Reviewed By: pjh5

Differential Revision: D9465894

Pulled By: orionr

fbshipit-source-id: 8cdf9f1d3953b7a48478d381814d7107df447201
2018-08-24 10:54:38 -07:00
8c13971f57 Remove protobuf require and use requirements.txt (#10771)
Summary:
In prep for making FULL_CAFFE2 default, users shouldn't be required to have protobuf installed.

cc pjh5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10771

Reviewed By: pjh5

Differential Revision: D9474458

Pulled By: orionr

fbshipit-source-id: 3e28f5ce64d125a0a0418ce083f9ec73aec62492
2018-08-24 10:39:40 -07:00
a4c59a9dab MIOpen integration, more tests enabled, bug fixes (#10612)
Summary:
* first integration of MIOpen for batch norm and conv on ROCm
* workaround a ROCm compiler bug exposed by elementwise_kernel through explicit capture of variables in the densest packing
* workaround a ROCm compiler bug exposed by having `extern "C" __host__` as a definition and just `__host__` in the implementation through the hipify script
* use fabs() in accordance with C++11 for double absolute, not ::abs() which is integer-only on ROCm
* enable test_sparse set on CI, skip tests that don't work currently on ROCm
* enable more tests in test_optim after the elementwise_bug got fixed
* enable more tests in test_dataloader
* improvements to hipification and ROCm build

With this, resnet18 on CIFAR data trains without hangs or crashes in our tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10612

Reviewed By: bddppq

Differential Revision: D9423872

Pulled By: ezyang

fbshipit-source-id: 22c0c985217d65c593f35762b3eb16969ad96bdd
2018-08-23 15:24:47 -07:00
227635142f Delete THD master_worker (#10731)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10731

Differential Revision: D9423675

Pulled By: ezyang

fbshipit-source-id: 37221e11d84cc3672b944af598ea229a1d4c38cc
2018-08-22 08:54:36 -07:00
c101a57a74 Build mechanism for custom operators (#10226)
Summary:
This is the last step in the custom operator implementation: providing a way to build from C++ and Python. For this I:

1. Created a `FindTorch.cmake` taken largely from ebetica with a CMake function to easily create simple custom op libraries
2. Created a `torch/op.h` header for easy inclusion of necessary headers,
3. Created a test directory `pytorch/test/custom_operator` which includes the basic setup for a custom op.
    1. It defines an op in `op.{h,cpp}`
    2. Registers it with the JIT using `RegisterOperators`
    3. Builds it into a shared library via a `CMakeLists.txt`
    4. Binds it into Python using a `setup.py`. This step makes use of our C++ extension setup that we already have. No work, yay! (A sketch of this step appears below.)

The pure C++ and the Python builds are separate and not coupled in any way.
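For step 3.4, a hedged sketch of such a `setup.py` (package and source names illustrative):

```
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(
    name='custom_op',
    ext_modules=[CppExtension('custom_op', ['op.cpp'])],
    cmdclass={'build_ext': BuildExtension},
)
```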

zdevito soumith dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10226

Differential Revision: D9296839

Pulled By: goldsborough

fbshipit-source-id: 32f74cafb6e3d86cada8dfca8136d0dfb1f197a0
2018-08-16 18:56:17 -07:00
130881f0e3 Delete build_caffe2.sh, replace with build_libtorch.py (#10508)
Summary:
Delete build_caffe2.sh and replace it with build_libtorch.py, as suggested by peter (and copy-pasted from his draft PR). This ensures that all consumers of the torch CMake file go through as unified a path as possible.

In order to change the surrounding infrastructure as little as possible, I made some tweaks to enable build_pytorch_libs.sh to generate the test binaries relative to the current directory, rather than hardcoding to pytorch/build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10508

Differential Revision: D9354398

Pulled By: anderspapitto

fbshipit-source-id: 05b03df087935f88fca7ccefc676af477ad2d1e9
2018-08-16 08:10:04 -07:00
021b4888db Remove setup_requires and tests_require from setup.py for FULL_CAFFE2 (#10530)
Summary:
In my environment, it looks like setup.py hangs when running

```
FULL_CAFFE2=1 python setup.py build_deps
```

Removing this fixes things, but we might also want to look at `tests_require`, which came over from `setup_caffe2.py`.

cc pjh5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10530

Differential Revision: D9349597

Pulled By: orionr

fbshipit-source-id: 589145eca507dfaf16386884ee2fbe60299660b4
2018-08-15 14:26:53 -07:00
d1442b36f3 add a rebuild_libtorch command for speedier iteration. (#10036)
Summary:
It just calls into `ninja install`. For iterative work on libtorch.so/_C.so, `python setup.py rebuild_libtorch develop` should provide quick iteration.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10036

Differential Revision: D9317869

Pulled By: anderspapitto

fbshipit-source-id: 45ea45a1b445821add2fb9d823a724fc319ebdd2
2018-08-14 12:10:02 -07:00
75651d5b58 improve use of ROCm libraries, enable more tests, small fixes (#10406)
Summary:
* some small leftovers from the last PR review
* enable more unit test sets for CI
* replace use of hcRNG w/ rocRAND (docker image was already updated w/ newer rocRAND)
* use rocBLAS instead of hipBLAS to allow convergence w/ Caffe2
* use strided_batched gemm interface also from the batched internal interface
* re-enable Dropout.cu as we now have philox w/ rocRAND
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10406

Reviewed By: Jorghi12

Differential Revision: D9277093

Pulled By: ezyang

fbshipit-source-id: 7ef2f6fe4ead77e501ed7aea5c3743afe2466ca2
2018-08-13 11:39:43 -07:00
cd81217f8e A single print statement in setup.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10473

Reviewed By: ml7

Differential Revision: D9299196

Pulled By: pjh5

fbshipit-source-id: f9aa84c2859df12f9da9ac5205e1918c253e19fb
2018-08-13 11:39:42 -07:00
0b63d12db6 Don't call into Python during Storage destruction. (#10407)
Summary:
```
This removes PyObjectFinalizer. We were seeing SIGSEGV at exit in some
programs that use multiprocessing. The backtrace pointed to
StorageRef.__del__ being called from subtype_dealloc. My guess is that
the Python interpreter was shut down before all C++ Storage objects were
deallocated. Deallocating the C++ Storage called the finalizer which
called back into Python after it was no longer safe to do so.

This avoids a callback from C++ into Python during Storage finalization.
Instead, dead Storage objects (expired weak references) are collected
periodically when shared_cache exceeds a limit. The limit is scaled with
2x the number of live references, which places an upper bound on the
amount of extra memory held by dead Storage objects. In practice, this
should be very small.
```
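A plain-Python illustration of that collection policy; the real code is C++, and the names and base limit here are assumptions:

```
import weakref

shared_cache = {}  # key -> weakref.ref to a storage-like object
limit = 128

def cache_put(key, obj):
    global limit
    shared_cache[key] = weakref.ref(obj)
    if len(shared_cache) > limit:
        # drop expired references, then scale the limit with live refs
        for k in [k for k, r in shared_cache.items() if r() is None]:
            del shared_cache[k]
        limit = max(128, 2 * len(shared_cache))
```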
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10407

Differential Revision: D9272400

Pulled By: colesbury

fbshipit-source-id: ecb14d9c6d54ffc91e134c34a4e770a4d09048a2
2018-08-13 11:20:07 -07:00
def3715e82 Minor changes for nicer pip packages (#9544)
Summary:
I am using this to test a CI job to upload pip packages, and so am using the Caffe2 namespace to avoid affecting the existing pytorch packages.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9544

Reviewed By: orionr

Differential Revision: D9267111

Pulled By: pjh5

fbshipit-source-id: a68162ed29d2eb9ce353d8435ccb5f16c3b0b894
2018-08-10 12:09:46 -07:00
40109b16d0 Remove caffe1 specific proto (#10380)
Summary:
This was used as a convenient way for us to convert c1 models. Now that conversion is more or less done, we should probably require any users who need to convert c1 models to explicitly install c1. This PR removes the explicit c1 proto (which was copied from c1) in favor of explicit installation.

Note that caffe_translator would still work properly; the only difference is that users now need to install c1 separately.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10380

Differential Revision: D9267981

Pulled By: Yangqing

fbshipit-source-id: a6ce5d9463e6567976da83f2d08b2c3d94d14390
2018-08-10 11:10:26 -07:00
506142ac8a Add warning for building PyTorch using Python 2.7 on Windows (#10247)
Summary:
Fixes #9232.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10247

Differential Revision: D9178257

Pulled By: SsnL

fbshipit-source-id: cc553335a5a918b6d77fe1064460cb66114859ca
2018-08-05 21:24:02 -07:00
df23bdc82d add BEGIN NOT-CLEAN-FILES marker to .gitignore. (#10233)
Summary:
Visual Studio Code and Visual Studio store their configuration in `FOLDER/.vscode` and `FOLDER/.vs`,
but "setup.py clean" deletes these folders because they are listed in the `.gitignore` file.

To prevent this, add a "BEGIN NOT-CLEAN-FILES" marker to the `.gitignore` file; "setup.py clean" then ignores lines after this marker.
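A sketch of the resulting clean logic; only the marker string comes from this summary, the parsing details are illustrative:

```
def clean_patterns(gitignore_path):
    """Return only the .gitignore patterns that setup.py clean may delete."""
    patterns = []
    with open(gitignore_path) as f:
        for line in f:
            line = line.strip()
            if 'BEGIN NOT-CLEAN-FILES' in line:
                break  # keep IDE folders listed after the marker
            if line and not line.startswith('#'):
                patterns.append(line)
    return patterns
```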

Discussed in #10206
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10233

Differential Revision: D9175515

Pulled By: ezyang

fbshipit-source-id: 24074a7e6e505a3d51382dc5ade5c65c97deda37
2018-08-05 15:55:44 -07:00
170d29769b Strings lexing, parsing, implementation in print (#9324)
Summary:
This PR adds strings to the AST and implements them for print statements. Strings are lifted as attributes onto the print node. They must be arguments to print itself, not arguments to an object that is passed to print. If they are encountered elsewhere, an NYI exception will be thrown.
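A sketch of what is now accepted (function body illustrative):

```
import torch

@torch.jit.script
def debug(x):
    print("x is", x)  # the string literal is lifted onto the print node
    return x
```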
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9324

Reviewed By: jramseyer

Differential Revision: D8807128

Pulled By: eellison

fbshipit-source-id: 984401ff458ed18d473c6d1bd86750e56c77d078
2018-08-02 11:09:03 -07:00
2d56b5cf8b Prepare THC for first class scalars (0-dimensional tensors).
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10072

Differential Revision: D9082421

Pulled By: gchanan

fbshipit-source-id: d4327b07aaef85cc2521393008154ebceae8cbfd
2018-08-01 14:28:51 -07:00
37a226de63 When BUILD_ATEN=OFF, use ATen/core directly (#10019)
Summary:
ATenCore.h is a dummy header to just test that this is working at all.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10019

Reviewed By: smessmer

Differential Revision: D9067262

Pulled By: ezyang

fbshipit-source-id: 58bab9c0aa83b56335e36b719b9b6505400d8dee
2018-07-30 21:09:55 -07:00
a08119afc2 Eliminate direct access to size/strides of THTensor; replace them with std::vector (#9561)
Summary:
* THTensor now stores `sizes_` and `strides_` which is a `std::vector<int64_t>`
* Anywhere a "public" API function made use of a int64_t* of sizes, I opted to just finagle it out of the tensor using THTensor_getSizePtr rather than try to rewrite all of these sites to use ArrayRef. They should use ArrayRef eventually, but not yet.
* There are new utility functions for resizing sizes/strides in one go (THTensor_resizeDim), or replacing sizes and strides with completely new values (THTensor_setSizesAndStrides)
* Anywhere you said `t->size[n] = 0`, we now say `THTensor_setSizeAt(t, n, 0)`, ditto for strides
* Anywhere you said `t->size[n]`, we now say `t->size(n)` (coming soon: ditto for strides)

Previous review of just the `std::vector` change in #9518, but I'm planning to merge this all in one go.

Note for gchanan: review from commit "ci" and after
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9561

Reviewed By: cpuhrsch

Differential Revision: D8901926

Pulled By: ezyang

fbshipit-source-id: 483cf275060ab0a13845cba1ece39dd127142510
2018-07-19 14:10:06 -07:00
4c615b1796 Introduce libtorch to setup.py build (#8792)
Summary:
Prior to this diff, there were two ways of compiling the bulk of the torch codebase. There was no interaction between them - you had to pick one or the other.

1) with setup.py. This method
- used the setuptools C extension functionality
- worked on all platforms
- did not build test_jit/test_api binaries
- did not include the C++ api
- always included python functionality
- produced _C.so

2) with cpp_build. This method
- used CMake
- did not support Windows or ROCM
- was capable of building the test binaries
- included the C++ api
- did not build the python functionality
- produced libtorch.so

This diff combines the two.

1) cpp_build/CMakeLists.txt has become torch/CMakeLists.txt. This build
- is CMake-based
- works on all platforms
- builds the test binaries
- includes the C++ api
- does not include the python functionality
- produces libtorch.so

2) the setup.py build
- compiles the python functionality
- calls into the CMake build to build libtorch.so
- produces _C.so, which has a dependency on libtorch.so

In terms of code changes, this mostly means extending the cmake build to support the full variety of environments and platforms. There are also a small number of changes related to the fact that there are now two shared objects - in particular, windows requires annotating some symbols with dllimport/dllexport, and doesn't allow exposing thread_local globals directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8792

Reviewed By: ezyang

Differential Revision: D8764181

Pulled By: anderspapitto

fbshipit-source-id: abec43834f739049da25f4583a0794b38eb0a94f
2018-07-18 14:59:33 -07:00
a487b08c2e AutoBatching - IR transformation(basic operators) (#9198)
Summary:
Use the `torch.jit.batch` decorator to implement auto-batching (it calls the `to_batch` pass to do the IR transformation).
- `to_batch` pass: "to_batch.h/cpp" in csrc/jit/passes transforms a graph into a new batched graph.
- Write several basic operators for BatchTensor (add, mul, sigmoid, tanh, mm, matmul, select).
- Register the operators in a lookup table `<std::string, std::shared_ptr<Graph>>`. (The Graph is used to replace the original node in the IR graph.)

Moved BatchTensor in Python from torch.BatchTensor to torch.jit.BatchTensor.
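A hypothetical usage sketch; the decorator arguments and the body are assumptions:

```
import torch

@torch.jit.batch(batch_size=4)
def step(h, x):
    # written against single examples; the to_batch pass rewrites the
    # IR to run on BatchTensors using the registered batched operators
    return torch.sigmoid(torch.mm(h, x))
```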
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9198

Reviewed By: zdevito

Differential Revision: D8744466

Pulled By: ChunliF

fbshipit-source-id: 9ea56a30f55cb870f13a2069a47cc635419763ff
2018-07-11 18:25:07 -07:00
b9f575fc33 Remove legacy code from the JIT (#9323)
Summary:
In particular, get rid of backward tracing and CppOp.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9323

Reviewed By: ezyang

Differential Revision: D8795935

Pulled By: apaszke

fbshipit-source-id: fb7a7eeee41902da35f2a8efd77262ca60fd6bbe
2018-07-11 10:25:38 -07:00
efefd1d7cf Unify aten_dispatch and aten_schema into a single operator abstraction with human-readable schema. (#8885)
Summary:
This is a series of two commits that should probably be read separately. They are stacked on top of #9018 since the second commit requires it for correctness.

Commit 1
=======

This commit is the first in a series that will clean up how we handle declaring operators and intrinsics in the JIT to make it more modular and readable. This introduces readable declarations that can be used to register operators and switches gen_jit_dispatch to generate this schema. A follow up PR will remove the dispatch keys like "add-3" and resolve ops directly based on the registered schema, further simplifying the generation process.

* Switches schema over to parsed declarations, in the future this will allow something like:

```
  registry.register_intrinsic("foo(Tensor a, Tensor b) -> Tensor", [](Stack& stack) {
    ...
  })
```

This will allow the scalable registration of intrinsics for lists, tuples, and other ops, as well as metadata for these ops (e.g. derivatives and size propagation routines).

The declarations resemble those used by PythonArgParser but have been significantly cleaned up to minimize the number of types that can appear in the declaration. We should strive to get the other parts of PyTorch switched over to this restricted declaration set when possible, but it is too much to do in a single PR. My hope is that eventually we will use a very similar language to describe declarations in C10, and this can serve as a guide for that.

Parsing is done using the script lexer, so it is very robust to whitespace and extensible for future types.

This removes the other way we encoded schema, and makes it easier to see what schema are registered.

Current generated declarations: https://gist.github.com/zdevito/a96a17766fb3a098d69a91ee00abaaf6

* Switches how we handle attempting to use an integer in the place of a fixed-sized int list, such as in conv (e.g. 'int[3] stride=1'). Now that we can statically distinguish between int and Tensor, we handle the expansion as an implicit conversion in the compiler. This allows us to simplify the interpreter since it no longer needs to handle the conversion itself.

* Schema declarations have been changed so that they match the type system in the IR exactly. In particular, attribute_info which was used by liftConstantAttributes has been dropped and constant attributes are lifted purely based on the type of the input. Type conversions in compiler have been simplified due to this change.

* Error highlighting in ErrorReport now only reports at most 20 lines of code, to make reading where an error occurred easier.

Commit 2
=======

This commit unifies aten_dispatch and aten_schema into a single Operator object that both contains schema and implementation information. In the future we can use this object to also contain functionality like shape prop and autodiff needed by all operators. Operators are registered globally, and dispatch logic uses the schema information to figure out which variant to use. Descriptor keys, a frequent source of inscrutable debug errors, have been removed.

* Introduce Operator, to replace TensorOp. Unlike TensorOp, we use Operator for all op implementations, including primitives that may occur in the graphs. The only exceptions are ops that are only known to the interpreter like jumps, and GraphExecutors where we need to record additional debug info.

* Adds a global registry for Operator implementations. aten_dispatch.cpp turns into register_aten_ops.cpp, which registers all the Operators for aten with the operator registry. register_prim_ops.cpp now contains the implementations for primitive operators that used to be in the interpreter. This means that it is now safe to use `getOperation(node)` to lookup the true interpreter function for the node, which will simplify const-propagation passes.

* Remove addInterpreterOpHandler in favor of global operator registry.

* Instead of descriptors, we match Node arguments directly against FunctionSchema describing expected inputs in `matchSchema`. `matchSchema` knows how parse both attributes and positional inputs from a node and match it to the appropriate registered operator. Debug error messages when we try to run an invalid operator are significantly improved: they now automatically display the schema for the op with the same name that are registered.

* Merge aten_schema into register_aten_ops. Each Operator takes a string schema which is parsed to determine when to dispatch to that op.

* Cleans up gen_jit_dispatch.py now that we do not need to write out descriptors.  In particular, skip_scalar_overloads can be removed since Richard's code sorts declarations to put Tensor, Tensor declarations first.

* remove matchSchemaAndLiftConstantAttributes and use emitBuiltinCall instead to remove code duplication

* refactor stack manipulation functions into a separate header file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8885

Reviewed By: jamesr66a

Differential Revision: D8751048

Pulled By: zdevito

fbshipit-source-id: 312aabfbf88307c5f6ab947b6caf691468b94557
2018-07-10 10:24:48 -07:00
d0d1820814 Add weak pointer and finalizer support directly to THStorage. (#9148)
Summary:
The underlying use-case is the file descriptor to storage cache in
torch.multiprocessing.reductions.  Previously, this was implemented by wrapping
an existing allocator with a "weak ref" allocator which also knew to null out
the weak reference when the storage died.  This is terribly oblique, and
prevents us from refactoring the allocators to get rid of per-storage allocator
state.

So instead of going through this fiasco, we instead directly implement weak
pointers and finalizers in THStorage.  Weak pointers to THStorage retain the
THStorage struct, but not the data_ptr.  When all strong references die,
data_ptr dies and the finalizers get invoked.

There is one major hazard in this patch, which is what happens if you
repeatedly call _weak_ref on a storage.  For cleanliness, we no longer
shove our grubby fingers into the finalizer struct to see if there is already
a Python object for the weak reference and return it; we just create a new one
(no one is checking these Python objects for identity).  This means if you
keep calling it, we'll keep piling on finalizers.  That's bad! But I am
not going to fix it until it is actually a problem for someone, because
then we need to add another caching layer.
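A rough Python analogy for these semantics (the real mechanism lives in THStorage, not in Python's weakref):

```
import weakref

class Storage(object):
    def __init__(self, nbytes):
        self.data = bytearray(nbytes)

s = Storage(16)
w = weakref.ref(s, lambda ref: print("finalizer ran"))
del s               # last strong ref dies: data freed, finalizer runs
assert w() is None  # the weak reference has expired
```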

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9148

Differential Revision: D8729106

Pulled By: ezyang

fbshipit-source-id: 69710ca3b7c7e05069090e1b263f8b6b9f1cf72f
2018-07-10 06:25:33 -07:00
4498fb962b Add space around operator (#9294)
Summary:
Fixes lint failure on master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9294

Differential Revision: D8779010

Pulled By: goldsborough

fbshipit-source-id: da1ea2604189fd704c22fa8a5770bd92845cea91
2018-07-09 20:24:21 -07:00
99ab082366 Making setup.py install work for Caffe2 (#8509)
Summary:
Tested on my mac on a pretty clean anaconda3
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8509

Reviewed By: orionr

Differential Revision: D8702257

Pulled By: pjh5

fbshipit-source-id: eda03ef9732da9fc56b31d909af5c0e39520d689
2018-07-09 18:10:58 -07:00
819815d9c0 Fix missing compile_commands.json for aten (#9227)
Summary:
When we moved the libaten build into libcaffe2, we changed the location where it generated compile_commands.json such that it was no longer being picked up by the build script. This fixes it so it is still found.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9227

Reviewed By: goldsborough

Differential Revision: D8757984

Pulled By: zdevito

fbshipit-source-id: 73df26bf08d98f18ac841d6c0db7e332fd328ab6
2018-07-08 16:54:34 -07:00
f6027bb15d Install hpp headers for CPP Extensions (#9182)
Summary:
With the C++-ification of a few files in `TH`/`THC`, CPP extensions got broken whenever the user used features from `THC` in their files and PyTorch was installed via `python setup.py install`.

This addresses issues such as
```
/home/me/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/include/THC/THCDeviceTensorUtils.cuh:5:25: fatal error: THCTensor.hpp: No such file or directory
```
Closes https://github.com/pytorch/pytorch/pull/9182

Reviewed By: soumith

Differential Revision: D8734581

Pulled By: fmassa

fbshipit-source-id: 2a1138f208592eaccb01fcdb805a6b369d7a497a
2018-07-05 07:55:25 -07:00
c61f0217a5 combine size_average and reduce args in loss functions (#8018)
Summary:
closes #7929
Closes https://github.com/pytorch/pytorch/pull/8018

Differential Revision: D8682540

Pulled By: li-roy

fbshipit-source-id: 649170dd1a7f373151c1d4e949838bd1c5651936
2018-07-01 05:39:00 -07:00
67b21117b7 Add BatchTensor class (#8922)
Summary:
Add BatchTensor class
- construct from data, mask, dims, or from a list of tensors
- can return a list of tensors from a BatchTensor (see the sketch below)

next step: do IR level transformation and operators
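A hedged construction sketch; the constructor and accessor names are assumptions based on the list above:

```
import torch

xs = [torch.rand(2, 5), torch.rand(3, 5)]  # variable-length examples
b = torch.BatchTensor(xs)                  # packs data + mask + dims
ys = b.examples()                          # recovers the original list
```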
Closes https://github.com/pytorch/pytorch/pull/8922

Differential Revision: D8668986

Pulled By: ChunliF

fbshipit-source-id: 8b24d2a9f46a3b42dbb397e99e9e059dfb2b326e
2018-06-29 15:57:27 -07:00
f74207c99f Allow autograd to work even when the shape of values cannot be determined (#8641)
This commit implements the solution proposed in https://github.com/pytorch/pytorch/issues/8410
to work around the need to create zero tensors with the same shape as inputs.
It introduces the concept of a LinearBlock, which marks places in the code
where we know that if all the inputs to the node are zero, then the outputs
of the node are also zero. Autodiff introduces LinearBlocks around
backwards functions, which have this property. specializeUndef then
propagates Undef nodes using this information.

Notes:
* Since we do not always specialize, we have a pass LowerLinearBlocks
that replaces the block with an if statement that dynamically guards
the Undef case.
* We introduce AutogradAdd, which is addition that still works when
its inputs might be undefined (see the sketch below). In cases where we specialize this will
get removed in favor of a normal add, but there are cases where
gradient graphs do not specialize (e.g. when they are not differentiable,
but a derivative is required) so it is important for this op to be executable.
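A plain-Python sketch of AutogradAdd's semantics, with undefined inputs modeled as None:

```
def autograd_add(a, b):
    # addition that tolerates undefined inputs; specialization replaces
    # this with a normal add when both inputs are known to be defined
    if a is None:
        return b
    if b is None:
        return a
    return a + b
```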
2018-06-25 18:40:04 -07:00
5a7b4840d9 Move nanopb-generated ONNX to unique file name (#8773)
* Move nanopb-generated ONNX to unique file name

* fix other places
2018-06-22 09:51:56 -04:00