Commit Graph

405 Commits

Author SHA1 Message Date
c864454a8f C++ API parity: ELU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27028
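
For illustration (not part of this commit), a minimal C++ usage sketch, assuming the module and its options mirror Python's torch.nn.ELU:

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  // Usage sketch only; the option name (alpha) is assumed to mirror torch.nn.ELU.
  torch::nn::ELU elu(torch::nn::ELUOptions().alpha(1.0));
  auto x = torch::randn({2, 3});
  auto y = elu(x);  // applies ELU elementwise; same shape as x
  std::cout << y << std::endl;
}
```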

Test Plan: Imported from OSS

Differential Revision: D17682406

Pulled By: pbelevich

fbshipit-source-id: 9c313237cb93b9870c6fcf8d01b3dbe4af4c6f2a
2019-10-02 07:12:08 -07:00
ed2607486f add mobile friendly at:parallel_for backend
Summary:
This diff implemented at::parallel_for()/parallel_reduce() and other
ATen/Parallel.h APIs for mobile using caffe2::ThreadPool.

caffe2::ThreadPool doesn't support submitting individual tasks
separately and running them in parallel - all tasks need to be submitted in
one batch, which locks the thread pool until all of them finish. As a
result we didn't wrap caffe2::ThreadPool with the TaskThreadPoolBase interface
and reuse the at::parallel_for() implementation in ParallelNative.h. Because
of this constraint, intraop_launch() / intraop_launch_future() are not
supported yet.

This diff doesn't touch the inter-op pool - it's still the default native c10
thread pool. Will work on it when it's widely used.
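
For illustration (not from this diff), a rough sketch of a caller of the at::parallel_for() API that this backend implements:

```cpp
#include <ATen/ATen.h>
#include <ATen/Parallel.h>

// Illustrative caller: the lambda receives a [begin, end) chunk of the index
// range, and each chunk may run on a thread-pool worker.
void scale_in_parallel(at::Tensor& t, float factor) {
  float* data = t.data_ptr<float>();
  at::parallel_for(0, t.numel(), /*grain_size=*/2048,
                   [&](int64_t begin, int64_t end) {
                     for (int64_t i = begin; i < end; ++i) {
                       data[i] *= factor;
                     }
                   });
}
```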

Test Plan: - This is early draft to receive feedback. Will do more thorough tests.

Differential Revision: D17543412

Pulled By: ljk53

fbshipit-source-id: 53a3259409c7207d837b9135d87d8daa6ad15e30
2019-09-25 22:33:06 -07:00
7fc06ea541 Bytecode export flow (#25187)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25187

The bytecode export flow: dump the bytecode format for the lightweight interpreter.
* The bytecode is generated without input spec optimization. It would be more generic (input independent) with no obvious performance degradation (to be tested).
* Main API: torch::jit::script::Module::save(filename, extra_files, bool bytecode_format = false).
* Both bytecode and the module object are exported in pickle format.
    * The module object (in data.pkl) is the same as the original JIT model.
    * The serializer depends on pickle only (no protobuf or JSON).
    * The major functionality is forked in ScriptModuleSerializer2::serialize().
    * The test loader is test_bc_export.cpp.
* Simple APIs are added in Code and its implementation to get necessary information (instructions, operators and constants).
* Since there's no dependency on graph/node, GetAttr is promoted from an operator to first-class instruction (https://github.com/pytorch/pytorch/pull/25151) .
* Some definitions (instructions, writeArchive, etc) that are shared by full JIT and bytecode are pulled out of the local namespace (https://github.com/pytorch/pytorch/pull/25148).

The output layout looks like:

* folders of methods.
    * In each method folder (for example, forward/):
        * bytecode.pkl: instructions and operators
        * constants{.pkl,/}: the constant list in constants.pkl. If there are tensors in constants, the binary tensor files go in the constants/ folder.
* data{.pkl,/}: the module object, with binary tensor files in the data/ folder. The same as in TorchScript.
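
For illustration, a rough sketch of invoking the export using the save() signature described above (the flag position and file names are assumptions based on this summary, not verified code from the PR):

```cpp
#include <torch/script.h>

int main() {
  // Load a regular JIT model, then re-save it with the bytecode format enabled.
  torch::jit::script::Module m = torch::jit::load("model.pt");
  m.save("model_bytecode.pt", /*extra_files=*/{}, /*bytecode_format=*/true);
}
```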

Test Plan: Imported from OSS

Differential Revision: D17076411

fbshipit-source-id: 46eb298e7320d1e585b0101effc0fcfd09219046
2019-09-25 16:35:45 -07:00
7e619650c9 Move unpickler related codes from pickler.h/cpp to unpickler.h/cpp (#26432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26432

Move unpickler related code from pickler.h/cpp to unpickler.h/cpp. In the import flow we link against the unpickler only.

Test Plan: Imported from OSS

Differential Revision: D17465410

fbshipit-source-id: 9d34629aa05bc0b45383e8f809c87baa186c9804
2019-09-21 11:56:48 -07:00
872ca919a9 Distance module (#26424)
Summary:
Adds `Distance` module parity.
https://github.com/pytorch/pytorch/issues/25883
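
For illustration (not from this PR), a minimal usage sketch, assuming the C++ modules mirror Python's torch.nn.CosineSimilarity / torch.nn.PairwiseDistance:

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  // Names and options assumed to mirror the Python distance modules.
  torch::nn::CosineSimilarity cos(torch::nn::CosineSimilarityOptions().dim(1));
  torch::nn::PairwiseDistance dist(torch::nn::PairwiseDistanceOptions().p(2));
  auto a = torch::randn({4, 8});
  auto b = torch::randn({4, 8});
  std::cout << cos(a, b) << "\n" << dist(a, b) << std::endl;
}
```
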
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26424

Differential Revision: D17487314

Pulled By: yf225

fbshipit-source-id: c7d124cb4afb08a4733e7212af0bb276bf32d172
2019-09-20 07:28:49 -07:00
61197e94b3 Remove torch.save-related logic from pickler (#25502)
Summary:
The Pickler previously had a distinction between tensors that would be inlined in one pickle binary (matching the format of `torch.save()`) and tensors that are saved elsewhere with only a reference stored in the binary. This PR moves that distinction out to `torch::pickle_save` to match the eager Python interface.

The change can be seen in `register_prim_ops.cpp`, where the call to `jit::pickle` is now `torch::pickle_save`.
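
For illustration, a rough sketch of calling the new entry point on an IValue; the returned byte-vector form is an assumption based on the description above:

```cpp
#include <torch/torch.h>
#include <fstream>

// Sketch: serialize an IValue (here a tensor) with torch::pickle_save and
// write the returned bytes to disk.
void save_pickled(const torch::Tensor& t, const std::string& path) {
  std::vector<char> data = torch::pickle_save(t);
  std::ofstream out(path, std::ios::binary);
  out.write(data.data(), data.size());
}
```
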
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25502

Pulled By: driazati

Differential Revision: D17175215

fbshipit-source-id: 8c9a21327cc79eaf6a0e488ea99e305be52f82b1
2019-09-17 20:38:13 -07:00
9181b9c73e Enable basic GPU profiling capability on ROCm. (#26300)
Summary:
Inserting markers using the nvtx-equivalent API is not supported yet.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26300

Differential Revision: D17425573

Pulled By: bddppq

fbshipit-source-id: 4df6c695ba07ab68e7f4dc2f77edde06f78fdac7
2019-09-17 12:11:27 -07:00
57a4b7c55d Re-organize C++ API torch::nn folder structure (#26262)
Summary:
This PR aims to re-organize C++ API `torch::nn` folder structure in the following way:
- Every module in `torch/csrc/api/include/torch/nn/modules/` (except `any.h`, `named_any.h`, `modulelist.h`, `sequential.h`, `embedding.h`) has a strictly equivalent Python file in `torch/nn/modules/`. For  example:
`torch/csrc/api/include/torch/nn/modules/pooling.h` -> `torch/nn/modules/pooling.py`
`torch/csrc/api/include/torch/nn/modules/conv.h` -> `torch/nn/modules/conv.py`
`torch/csrc/api/include/torch/nn/modules/batchnorm.h` -> `torch/nn/modules/batchnorm.py`
`torch/csrc/api/include/torch/nn/modules/sparse.h` -> `torch/nn/modules/sparse.py`
- Containers such as  `any.h`, `named_any.h`, `modulelist.h`, `sequential.h` are moved into `torch/csrc/api/include/torch/nn/modules/container/`, because their implementations are too long to be combined into one file (like `torch/nn/modules/container.py` in Python API)
- `embedding.h` is not renamed to `sparse.h` yet, because we have another work stream that works on API parity for Embedding and EmbeddingBag, and renaming the file would cause conflict. After the embedding API parity work is done, we will rename `embedding.h` to  `sparse.h` to match the Python file name, and move the embedding options out to options/ folder.
- `torch/csrc/api/include/torch/nn/functional/` is added, and the folder structure mirrors that of `torch/csrc/api/include/torch/nn/modules/`. For example, `torch/csrc/api/include/torch/nn/functional/pooling.h` contains the functions for pooling, which are then used by the pooling modules in `torch/csrc/api/include/torch/nn/modules/pooling.h`.
- `torch/csrc/api/include/torch/nn/options/` is added, and the folder structure mirrors that of `torch/csrc/api/include/torch/nn/modules/`. For example, `torch/csrc/api/include/torch/nn/options/pooling.h` contains MaxPoolOptions, which is used by both MaxPool modules in `torch/csrc/api/include/torch/nn/modules/pooling.h`, and max_pool functions in `torch/csrc/api/include/torch/nn/functional/pooling.h`.
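
For illustration, a sketch of how these pieces fit together after the re-org, using MaxPool2d; the exact functional signature is assumed from the description above:

```cpp
#include <torch/torch.h>

int main() {
  auto x = torch::randn({1, 3, 32, 32});

  // options/pooling.h provides MaxPool2dOptions, consumed by the module in
  // modules/pooling.h ...
  torch::nn::MaxPool2d pool(torch::nn::MaxPool2dOptions(2).stride(2));
  auto y1 = pool(x);

  // ... and by the free function in functional/pooling.h (per the description
  // above; signature assumed, not verified).
  auto y2 = torch::nn::functional::max_pool2d(x, torch::nn::MaxPool2dOptions(2).stride(2));
}
```
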
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26262

Differential Revision: D17422426

Pulled By: yf225

fbshipit-source-id: c413d2a374ba716dac81db31516619bbd879db7f
2019-09-17 10:07:29 -07:00
2ce8c83f67 Enable CPU fused kernel on Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25578

Differential Revision: D17397156

Pulled By: ezyang

fbshipit-source-id: b243528c2bfd5a0d401897833048429e67fe40ef
2019-09-17 07:29:40 -07:00
be82239c86 Port fuse_linear from pytorch/tvm (#25623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25623

Port over the fuse_linear pass from the pytorch/tvm project; we'll need this
in the backend-specific quantization pass to match aten::linear and swap
it with quantized linear.

Test Plan:
python test/test_jit.py 'TestJit.test_fuse_linear'

Imported from OSS

Differential Revision: D17208890

fbshipit-source-id: f4ff3889ae4525797d3b986f46ae37e50ea49116
2019-09-12 18:51:13 -07:00
28a2dafc15 C++ Average Pool Module (#25800)
Summary:
This PR adds Average Pool module to C++ front-end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25800

Differential Revision: D17318094

Pulled By: yf225

fbshipit-source-id: c914c0e802bbe5f1d1f0a21a669c28bc956899db
2019-09-11 16:39:56 -07:00
ba9fda14a7 C++ MaxPool Module
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24860

Differential Revision: D17260361

Pulled By: yf225

fbshipit-source-id: 4b8c894d3bdf675cfeb9fc84934fe0339a048c1e
2019-09-11 08:56:57 -07:00
e04836004d L1Loss module (#25902)
Summary:
yf225 This is the L1Loss module. I don't think that `_Loss` and `_WeightedLoss` as base Python classes do anything. The first one sets the reduction type and also takes in the `reduce` parameter, which is deprecated. The second one only registers the `weight` parameter. I don't think that we should keep this structure. What do you think?
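
For illustration (not from this PR), a minimal usage sketch, assuming the C++ module mirrors Python's torch.nn.L1Loss:

```cpp
#include <torch/torch.h>

int main() {
  // Assumed to mirror torch.nn.L1Loss (mean reduction by default).
  torch::nn::L1Loss criterion;
  auto input = torch::randn({3, 5}, torch::requires_grad());
  auto target = torch::randn({3, 5});
  auto loss = criterion(input, target);
  loss.backward();
}
```
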
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25902

Differential Revision: D17307045

Pulled By: yf225

fbshipit-source-id: ad3eda2ee8dcf4465054b376c1be89b39d11532f
2019-09-11 07:18:17 -07:00
3680cef44e C++ Fold nn module
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/24160

Differential Revision: D17260740

Pulled By: yf225

fbshipit-source-id: f0c7769316bed330289ca3d948f2e39c72ec928b
2019-09-10 13:19:37 -07:00
8485710143 introduce INTERN_DISABLE_AUTOGRAD flag to create inference only library for mobile
Summary:
This is the first of a series of changes to reduce build size by cutting
autograd functions from mobile build.

When INTERN_DISABLE_AUTOGRAD is set:
* On CMake side we exclude Functions.h/cpp, VariableType*.h/cpp,
  VariableTypeManual.cpp from the build process. Still keep variable_factories.h
  as we rely on it to create variables instead of tensors.
* In source code we gate a couple of autograd references (in autograd/variable.cpp)
  with C10_MOBILE (technically we should use a dedicated C macro, but its
  maintenance cost is higher than a cmake macro as we have several build systems
  to change).
* Pass --disable-autograd flag to codegen script, which will stop generating
  Functions/VariableType code. And for variable_factories.h it will stop
  generating tracing code.

Edit: in this diff we will keep Functions.h/cpp to avoid changing source code.

Why do we need this change if mobile already doesn't call VariableType and
autograd code with USE_STATIC_DISPATCH=ON?
It's to reduce the static library size for the iOS build, for which it's
relatively harder to strip size with the linker approach.

Why do we need to make an involved change to the codegen script?
There isn't a global config system in codegen - autograd/env.py provides similar
functionality but it says not to add anything there.

Test Plan:
- will check CI;
- test mobile build in sample app;

Differential Revision: D17202733

Pulled By: ljk53

fbshipit-source-id: 5701c6639b39ce58aba9bf5489a08d30d1dcd299
2019-09-10 10:20:17 -07:00
67c530851c get rid of protobuf dependencies (#25650)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25650

This PR removes protobuf dependencies from mobile build altogether:
- caffe2/proto: protobuf files, including caffe2.proto and torch.proto;
- caffe2 components that depend on caffe2.proto, including most part of
caffe2/core, caffe2/utils;
- libprotobuf / libprotobuf-lite dependencies;
- protobuf compiler;
- some utils class, e.g.: netdef_converter.cpp;
- introduce a macro to disable third_party/onnx which depends on protobuf;

Test Plan:
- builds;
- link with demo app to make sure it can load and run a model in pickle format;

Differential Revision: D17183548

Pulled By: ljk53

fbshipit-source-id: fe60b48674f29c4a9b58fd1cf8ece44191491531
2019-09-06 08:48:20 -07:00
9c5a899773 Enable jit fusion on ROCm (#22872)
Summary:
As of ROCm 2.6, we support hiprtc - the HIP runtime compilation API. Enable the jit fusion feature depending on the existence of such an API. This entails
* new hipification rules for API_RTC
* add hiprtc APIs to the shim loader
* update cmake infrastructure to find the hiprtc library (it is part of the HIP package)
* enabling of unit tests in the jit_fuser test set
* special casing in resource strings for HIP - the typedefs CUDA requires would be redundant
* for now, disable the occupancy calculation we do not support yet and hard-code it

Thanks to t-vi for working with me on getting this integration done!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22872

Differential Revision: D17207425

Pulled By: bddppq

fbshipit-source-id: 93409f3051ad0ea06afacc2239fd6c402152debe
2019-09-05 18:22:08 -07:00
197fd4f707 Adding RRef as return value for builtin operators (#25169)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25169

See #23110 for RRef design details. This commit only implements
RRef as a return value for builtin operators, and the RRef will communicate
between a user and the owner. More specifically, an RRef is first
created on the `dist.remote` caller, which is a user of the RRef.
The RRef user then sends a notification to the owner to report
the fork, and the owner uses a shared_ptr to keep
the RRef alive. When the user RRef is destructed on the caller,
another notification is sent to the owner, and the owner
can then drop its RRef as well.

Test Plan: Imported from OSS

Differential Revision: D17048343

Pulled By: mrshenli

fbshipit-source-id: 9dd3b3d0e4fd214c76fecdbed746a6d3029b3efd
2019-09-05 15:14:17 -07:00
3556bea5aa Build torch.distributed with Gloo backend on macOS (#25260)
Summary:
In facebookincubator/gloo#212, a libuv based Gloo transport was introduced,
which allows us to use Gloo on macOS (and later perhaps also Windows). This
commit updates CMake code to enable building with USE_DISTRIBUTED=1 on macOS.

A few notes:
* The Caffe2 ops are not compiled, for they depend on `gloo::transport::tcp`.
* The process group implementation uses `gloo::transport::tcp` on Linux (because of `epoll(2)`) and `gloo::transport::uv` on macOS.
* The TCP store works but sometimes crashes on process termination.
* The distributed tests are not yet run.
* The nightly builds don't use `USE_DISTRIBUTED=1`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/25260

Reviewed By: mrshenli

Differential Revision: D17202381

Pulled By: pietern

fbshipit-source-id: ca80a82e78a05b4154271d2fb0ed31c8d9f26a7c
2019-09-05 07:09:50 -07:00
a35a63b8bd move legacy deserialization code into jit/import_legacy.cpp (#25649)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25649

Continue the work of PR #25493 to remove dependencies on generated
protobuf headers from jit/import.cpp.

Instead of adding intrusive #if/#else guards to gate the legacy functions,
move them into a separate file. Keep the ScriptModuleDeserializer
structure, as otherwise it would require a lot of interface changes.

There is not much state to copy from ScriptModuleDeserializer as it only
extracts extra_files before calling into LEGACY_deserialize. There is
no state to copy back into ScriptModuleDeserializer either as it directly
returns script::Module.

Test Plan:
- builds;
- with stacked PR to remove protobuf from cmake;
- load and run ResNet-18 in model.json format with non-mobile build;
- load and run ResNet-18 in pickle format with mobile build;

Differential Revision: D17183549

Pulled By: ljk53

fbshipit-source-id: 2947b95659cd16046d9595fb118d22acc179b3ad
2019-09-05 03:16:10 -07:00
40cb5182e9 Attach 'send' autograd function to the autograd graph as part of RPC. (#24876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24876

This contains very basic functionality for adding a 'send' autograd
function to our autograd graph. The purpose of this change is to validate that the
basic structure proposed here makes sense. Once this makes sense, we can build
upon this to address more complicated scenarios. At a high level we've added
the following functionality:

1) Define a very simple 'SendRpcBackwards' autograd function.
2) Attach this function to appropriate tensors when we call an RPC.
3) Store the send function in our distributed autograd context.
ghstack-source-id: 89359708

Test Plan: unit tests.

Differential Revision: D16903255

fbshipit-source-id: 6c04794a8e58b199795404225fd9da0c1440460e
2019-09-01 23:54:01 -07:00
03f67e4b16 Remove BUILD_ATEN_ONLY build option (#24441)
Summary:
This build option no longer works.

Close https://github.com/pytorch/pytorch/issues/21703
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24441

Differential Revision: D17138131

Pulled By: ezyang

fbshipit-source-id: 67adac990645a5df1f7c2e2dbef3689b2c30fcf8
2019-08-30 13:44:38 -07:00
25e6a52e2e Stop doing nn wrap. (#25353)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25353

It doesn't seem necessary anymore.

Test Plan: Imported from OSS

Differential Revision: D17101569

Pulled By: gchanan

fbshipit-source-id: 67a198ae594dcd64dbd7cf6a73e2160e26e3513e
2019-08-30 07:42:20 -07:00
c56464d13e Turn off warnings on Windows CI. (#24331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24331

Currently our logs are something like 40M a pop.  Turning off warnings and turning on verbose makefiles (to see the compile commands) reduces this to more like 8M. We could probably reduce log size more but verbose makefile is really useful and we'll keep it turned on for Windows.

Some findings:

1. Setting `CMAKE_VERBOSE_MAKEFILE` inside CMakelists.txt itself as suggested in https://github.com/ninja-build/ninja/issues/900#issuecomment-417917630 does not work on Windows. Setting `-DCMAKE_VERBOSE_MAKEFILE=1` does work (and we respect this environment variable.)
2. The high (`/W3`) warning level on MSVC is due to cmake inserting this in the default flags. On recent versions of cmake, CMP0092 can be used to disable this flag in the default set. The string replace trick sort of works, but the standard snippet you'll find on the internet won't disable the flag from nvcc. I inspected the CUDA cmake code and verified it does respect CMP0092.
3. `EHsc` is also in the default flags; this one cannot be suppressed via a policy. The string replace trick seems to work...
4. ... however, it seems nvcc implicitly inserts an `/EHs` after `-Xcompiler` specified flags, which means that if we add `/EHa` to our set of flags, you'll get a warning from nvcc. So we probably have to figure out how to exclude EHa from the nvcc flags set (EHs does seem to work fine.)
5. To suppress warnings in nvcc, you must BOTH pass `-w` and `-Xcompiler /w`. Individually these are not enough.

The patch applies these things; it also fixes a bug where nvcc verbose command printing doesn't work with `-GNinja`.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17131746

Pulled By: ezyang

fbshipit-source-id: fb142f8677072a5430664b28155373088f074c4b
2019-08-30 07:11:07 -07:00
490eb7fed9 Add GET_ATTR instruction (#25151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25151

The prim::GetAttr operator depends on node. However, in the lite interpreter there will be no node dependency. Promote the operator to a first-class instruction.

Test Plan: Imported from OSS

Differential Revision: D17076412

fbshipit-source-id: 8de20978445bb598634c5462e66e4459dcd567be
2019-08-28 20:45:55 -07:00
5dd01a7eea Pull instruction definitions out of interpreter.cpp. (#25148)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25148

Instructions will be used in the lite interpreter as well. Pull them out of interpreter.cpp, so that the lite interpreter doesn't have to compile with interpreter.cpp.

Test Plan: Imported from OSS

Differential Revision: D17076413

fbshipit-source-id: 99b3d8d27a96823a4a4dde6b2337ee44635e34cb
2019-08-28 20:17:36 -07:00
8756ec989e bind autograd functions into C++ (#24342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24342

Right now the two APIs provided in the autograd package only have
Python bindings and we cannot call them from either the C++ API or
TorchScript. This PR makes these two APIs available purely in C++ (while
preserving semantics) so they can be used in the C++ API and TorchScript.
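
For illustration (not code from this PR), a rough sketch of what calling these from C++ might look like, assuming the entry points mirror Python's torch.autograd.grad/backward:

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  // Assumed to mirror torch.autograd.grad: explicitly compute d(sum(x*x))/dx.
  auto x = torch::ones({2, 2}, torch::requires_grad());
  auto y = (x * x).sum();
  auto grads = torch::autograd::grad(/*outputs=*/{y}, /*inputs=*/{x});
  std::cout << grads[0] << std::endl;  // expected: 2 * x
}
```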

Differential Revision: D16923271

fbshipit-source-id: 049d6fbd94cd71ecc08b2716f74d52ac061f861e
2019-08-20 15:36:34 -07:00
b6803d62fd Use snake names for all files in distributed.rpc (#24502)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24502

Files in the distributed.rpc package mix snake_case and camelCase names. This
commit cleans that up; all files use snake_case names now.
ghstack-source-id: 88548990

Reviewed By: xush6528

Differential Revision: D16860155

fbshipit-source-id: 3a22a89bf6c4e11aac5849564fc53296a04d6a8b
2019-08-19 10:58:59 -07:00
dfdb86a595 big cpp test reorg (#24801)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24801

This is to fix the ODR-violations in fbcode static builds, which have been broken for several months.

This PR is unfortunately quite large, but the changes are only mechanical:
1. Tests defined in header files -> tests defined in cpp files
2. Remove the `torch::jit::testing` namespace -> `torch::jit`.
3. Single `test.h` file that aggregates all tests.
4. Separate out files for gtest and python versions of the tests instead of using a build flag
5. Add a readme for how to add a new test, and explaining a bit about why the cpp tests are the way they are.

Test Plan: Imported from OSS

Differential Revision: D16878605

Pulled By: suo

fbshipit-source-id: 27b5c077dadd990a5f74e25d01731f9c1f491603
2019-08-18 16:49:56 -07:00
mal
6b656565ab Hooks for C++ API (#24393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24393

Ability to register a hook on a variable, similar to the Python autograd API. register_hook takes a function as argument and creates a CppFunctionPreHook, similar to PyFunctionPreHook.
It returns the index of the hook, which can be passed to remove_hook to disable the hook.
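
For illustration (not from this diff), a rough usage sketch, assuming the C++ surface mirrors the Python register_hook described above:

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  auto x = torch::ones({2, 2}, torch::requires_grad());

  // register_hook returns an index usable with remove_hook, per the summary;
  // the hook signature (gradient in, replacement gradient out) mirrors Python.
  auto pos = x.register_hook([](torch::Tensor grad) { return grad * 2; });

  (x * x).sum().backward();
  std::cout << x.grad() << std::endl;  // gradient doubled by the hook

  x.remove_hook(pos);  // disable the hook again
}
```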

Test Plan: Added tests.

Differential Revision: D16861722

fbshipit-source-id: d08047f932e38c7bde04283a18b2d0311c8ad604
2019-08-16 12:44:20 -07:00
75c1419b46 Add Pickler C++ API (#23241)
Summary:
This PR adds functions to wrap the Pickler and exposes them to the C++ API

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23241

Pulled By: driazati

Differential Revision: D16746451

fbshipit-source-id: 25ea5db4174006ce41e2e8989c8a345b82f637a7
2019-08-12 14:43:31 -07:00
3c1270a730 Revert D16675418: [jit] Add Pickler C++ API
Differential Revision:
D16675418

Original commit changeset: 76543c81ac67

fbshipit-source-id: f0249d16d363c4ecbceecd1bf610dc280e659cc0
2019-08-09 13:13:15 -07:00
01d98c7cfb Add Pickler C++ API (#23241)
Summary:
This PR adds functions to wrap the Pickler and exposes them to the C++ API
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23241

Pulled By: driazati

Differential Revision: D16675418

fbshipit-source-id: 76543c81ac67c3e20a75ebc2073191bcbd6573bf
2019-08-09 12:25:30 -07:00
8b349073ce sync and async torch.distributed.rpc for builtin operators (#23228)
Summary:
Features:

* sync and async RPC for builtin operators
* RpcAgent API
* ProcessGroupAgent implementation

Goal:

* have a minimum working and testable RPC implementation
* make sure the RpcAgent API is sufficient for future ThriftAgent and TensorPipeAgent implementation
  * For the TensorPipe implementation, it might allocate multiple underlying communication channels with different types, and might also use streaming serialization/deserialization for large tensors. To support this requirement, the current implementation only converts a BuiltinOp into a Message which contains a byte vector and a tensor table. It is up to the RpcAgent implementation to determine how it would like to serialize a Message object.
  * For ThriftAgent, as Thrift has its own request/response matching solution, the Message.id is no longer necessary. Hence the id can be dropped during serialization. All it needs to do is pass the response Message object to the Future returned by send(...).
* support blocking and non-blocking RequestCallback
  * blocking means the callback won't return before sending out the response
  * non-blocking can be achieved by enqueuing the `(from, request, RpcAgent&)` tuple and using a different thread to process them. That is why there is an `RpcAgent&` arg in the param list.

We are not exporting this diff until we finalize distributed autograd design and publish the API review publicly.

https://fb.quip.com/FabTAZKVgQpf

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23228
ghstack-source-id: 87816717

Reviewed By: zhaojuanmao

Differential Revision: D15194693

fbshipit-source-id: 7adb600796613cde6073db6c227451b89940ecaf
2019-08-06 16:03:01 -07:00
f81db8afb8 Initial torchbind prototype (#21098)
Summary:
I have some test code in there as well, along with a script "test_libtorch" to run it. You'll need to modify `test_libtorch` to point to where you have `pytorch` built. I currently require that `pybind11` is included as a subdirectory of the test, but added it to the `.gitignore` to make this reviewable.

Currently, something like this works:
```cpp
struct Foo {
  int x, y;
  Foo() : x(2), y(5) {}
  Foo(int x_, int y_) : x(x_), y(y_) {}
  void display() {
    std::cout << "x: " << x << ' ' << "y: " << y << std::endl;
  }
  int64_t add(int64_t z) {
    return (x + y) * z;
  }
};
// Register the class and its methods with torchbind;
// Foo::combine is the method discussed in the issues list below.
static auto test = torch::jit::class_<Foo>("Foo")
                    .def(torch::jit::init<int64_t, int64_t>())
                    .def("display", &Foo::display)
                    .def("add", &Foo::add)
                    .def("combine", &Foo::combine);

```
with
```py
@torch.jit.script
def f(x):
    val = torch._C.Foo(5, 3)
    val.display()
    print(val.add(3))
```
results in
```
x: 5 y: 3
24
```

Current issues:
- [x] The python class created by torchscript doesn't interact properly with the surrounding code.
```
@torch.jit.script
def f(x):
    val = torch._C.Foo(5, 3)
    return val
```
- [x] Doesn't properly take in non-pointer classes. Can't define this function signature in cpp (We don't want to support this I believe).
```cpp
  void combine(Foo x) {
```

- [x] Has some issues with memory for blobs when constructing multiple objects (fix constant propagation pass to not treat capsules as the same object).
```py
@torch.jit.script
def f(x):
    val = torch._C.Foo(5, 3)
    val2 = torch._C.Foo(100, 0)
    val.display()
    print(val.add(3))
```
- [ ] Can't define multiple constructors (need to define overload string. Currently not possible since we don't support overloaded methods).
- [x] `init` is a little bit different syntax than `pybind`. `.init<...>()` instead of `.def(py::init<>())`
- [x] I couldn't figure out how to add some files into the build so they'd be copied to the `include/` directories, so I symlinked them manually.
- [ ] Currently, the conversion from Python into Torchscript doesn't work.
- [ ] Torchbind also currently requires Python/Pybind dependency. Fixing this would probably involve some kind of macro to bind into Python when possible.
- [ ] We pass back into Python by value, currently. There's no way of passing by reference.
- [x] Currently can only register one method with the same type signature. This is because we create a `static auto opRegistry`, and the function is templated on the type signature.

Somewhat blocked on https://github.com/pytorch/pytorch/pull/21177. We currently use some structures that will be refactored by that PR (namely `return_type_to_ivalue` and `ivalue_to_arg_type`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21098

Differential Revision: D16634872

Pulled By: Chillee

fbshipit-source-id: 1408bb89ea649c27d560df59e2cf9920467fe1de
2019-08-02 18:45:15 -07:00
87a75bd605 remove ONNX & Turn on NO_API for mobile build (#23546)
Summary:
### Summary
The iOS build was broken after this PR 👉 [23195](https://github.com/pytorch/pytorch/pull/23195/files) was merged, as there are two files that still depend on ONNX.
- `test.cpp` in `test/cpp/jit`
-  `export.cpp` in `torch/csrc/jit`

This PR is to remove ONNX completely from mobile build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23546

Test Plan:
- The `build_ios.sh` finished successfully.
- The `libtorch.a` can  be compiled and run on iOS devices

Differential Revision: D16558236

Pulled By: xta0

fbshipit-source-id: b7ff1db750698cfd5a72d5cb0b9f2f378e315077
2019-07-31 10:42:56 -07:00
ca76c82ce3 Add early returns to JIT (#19179)
Summary:
Add early returns to JIT with minimal changes to compiler.cpp and an IR->IR pass that will transform the graph so that there is only one return value.

In compiler.cpp, record when a block will exit so that the following example will work:
```
if cond:
    a = torch.zeros([2])
else:
    return 2
a += 2
...
```
To match block outputs with values that will not be used, like in the above example with `a`, I add a Bottom Type that subtypes everything else. This allows shape propagation to continue to work, and makes it so that we don't need many extra nodes filling up the graph.

The IR transform currently doesn't work on Loops; I didn't add that to this PR to avoid too much complexity, but will add it in a stacked PR (and it should be very little extra code). The IR transform is commented at the top of the file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19179

Differential Revision: D16519819

Pulled By: eellison

fbshipit-source-id: 322a27f69966d1fd074ebe723c3e948b458b0e68
2019-07-26 16:42:43 -07:00
74f8094ea5 Rename threading build options
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/23407

Test Plan:
USE_CUDA=0 ATEN_THREADING=TBB USE_OPENMP=0 USE_TBB=1 MKL_THREADING=TBB
BLAS=MKL USE_MKLDNN=1 MKLDNN_THREADING=TBB BUILD_BINARY=1 python
setup.py develop install --cmake

./build/bin/parallel_info

Imported from OSS

Differential Revision: D16522538

Pulled By: ilia-cher

fbshipit-source-id: 75c4761d93a7f5936f28e4c5eedcd27d8490d0c5
2019-07-26 13:09:14 -07:00
7ee62d3d91 Fix the iOS build (#23293)
Summary:
The legacy iOS build script (`build_ios.sh`) is still working, but the output is caffe2, not PyTorch. To enable the PyTorch iOS build, we can set the value of `BUILD_CAFFE2_MOBILE` to `NO` and turn on another cmake arg - `INTERN_BUILD_MOBILE` - that ljk53 created for Android.

There is a trivial issue in `used_kernel.cpp` that causes a compile error when running `build_ios.sh`, as it uses a `system` API that has been deprecated since iOS 11. The fix below is to bypass this file since it's not needed by mobile.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23293

Test Plan:
The `build_ios.sh` completed successfully, and all the generated static libraries can be compiled and linked successfully on iOS devices.

### Build script

```shell
./scripts/build_ios.sh \
-DBUILD_CAFFE2_MOBILE=OFF \
-DCMAKE_PREFIX_PATH=$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())') \
-DPYTHON_EXECUTABLE=$(python -c 'import sys; print(sys.executable)')
```

Differential Revision: D16456100

Pulled By: xta0

fbshipit-source-id: 38c73e1e3a0c219a38ddc28b31acc181690f34e8
2019-07-25 12:41:20 -07:00
bdb1e1305d exclude some caffe2 modules from libtorch mobile build (#20000)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/20000
ghimport-source-id: f47773ef1c6849cd0c0e65080400416c6b370d39

Test Plan:
- verified libtorch mobile library builds and links successfully;

Imported from OSS

Differential Revision: D15169024

Pulled By: ljk53

fbshipit-source-id: 20ac89c6e7053239c93e51f00c5c5dc3595bea74
2019-07-23 16:20:27 -07:00
535c5540bc Back out "Back out "[pytorch][PR] Move thnvrtc and DynamicLibrary to ATen"" (#22794)
Summary:
Original commit changeset: 227df3b85316

Pull Request resolved: https://github.com/pytorch/pytorch/pull/22794
ghstack-source-id: 86400904

Differential Revision: D16222777

fbshipit-source-id: 0b198ac59e640df0b8204b4ed30f8e822c15fd9a
2019-07-15 06:28:56 -07:00
cf2889ad8f add support for breaks and continues (#21692)
Summary:
Add support for breaks and continues in the JIT. We do this with a Graph transform pre-SSA.

A graph of the form
```
def test():
    while i < 5:
        if i == 3:
            break
        i += 1
        print(i)
```
has the body of the loop transformed to
```
if i == 3:
    did_break = True
else:
    did_break = False
if did_break:
    loop_exit = True
else:
    i += 1
    print(i)
    loop_exit = i < 5
```

I am going to add more tests but I think it is ready for review now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/21692

Differential Revision: D16215807

Pulled By: eellison

fbshipit-source-id: 365102f42de4861d9323caaeb39a96de7619a667
2019-07-12 15:02:44 -07:00
ac78a86e1d Back out "[pytorch][PR] Move thnvrtc and DynamicLibrary to ATen" (#22749)
Summary:
Original commit changeset: add2ee8a8865

Pull Request resolved: https://github.com/pytorch/pytorch/pull/22749
ghstack-source-id: 86323899

Differential Revision: D16203552

fbshipit-source-id: 227df3b85316315c15d2cb7b6a5c884096a82e9e
2019-07-11 12:21:21 -07:00
mal
58e20638f7 Refactoring _wrap_outputs to remove python dependence.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22631

Test Plan:
test suite

Imported from OSS

Differential Revision: D16185040

fbshipit-source-id: 9b83749f6c9cd05d13f54a3bb4801e263293252b
2019-07-10 12:12:16 -07:00
31d821e267 Move thnvrtc and DynamicLibrary to ATen (#22362)
Summary:
Having the NVRTC stub in ATen is necessary to call driver APIs in ATen. This is currently blocking https://github.com/pytorch/pytorch/pull/22229.

`DynamicLibrary` is also moved as it is used in the stub code, and seems general enough.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22362

Differential Revision: D16131787

Pulled By: ezyang

fbshipit-source-id: add2ee8a8865229578aa00001a00d5a6671e0e73
2019-07-09 07:28:27 -07:00
221af09ca7 Move GradMode / AutoGradMode / NoGradGuard to ATen core (#18573)
Summary:
After the Variable/Tensor merge, code paths in ATen need to be able to check whether a tensor requires gradient, and throw errors in places where a `requires_grad=true` tensor is not allowed (such as https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/Utils.h#L76-L78 and https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/SparseTensorImpl.cpp#L86). Since the `GradMode` thread-local variable controls whether a tensor should accumulate gradients, we need to be able to check this variable from ATen when we determine whether a tensor requires gradient, hence the PR to move `GradMode` / `AutoGradMode` / `NoGradGuard` to ATen.

Note that we intentionally don't merge `at::GradMode` and `at::NonVariableTypeMode`, with the following reasoning:
Semantically, `at::GradMode` and `at::NonVariableTypeMode` actually mean different things: `at::GradMode` controls whether a tensor should accumulate gradients, and `at::NonVariableTypeMode` controls whether a Variable should be treated as a non-Variable tensor in type dispatches. There are places where we *don't* want the tensor to accumulate gradients, but *still* want the Variable to be treated as a Variable. Here is one example:
```python
#  torch/tensor.py
with torch.no_grad():
   ...
   new_tensor = self.new()    # `at::GradMode` is false at this point
   ...
```
```cpp
// tools/autograd/templates/python_variable_methods.cpp
static PyObject * THPVariable_new(PyObject* self, PyObject* args, PyObject* kwargs)
{
  ...
  // if we merge `at::GradMode` and `at::NonVariableTypeMode`, since `at::GradMode` is false and `self_.type()` checks `at::GradMode` to decide whether to return non-Variable type, it will return a non-Variable type here, which is not what we want (and throws a "Tensor that was converted to Variable was not actually a Variable" error)
  return THPVariable_Wrap(torch::utils::legacy_tensor_new(self_.type(), args, kwargs));
  ...
}
```
For the above reason, we cannot merge `at::GradMode` and `at::NonVariableTypeMode`, as they have different purposes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/18573

Differential Revision: D16134413

Pulled By: yf225

fbshipit-source-id: 6140347e78bc54206506499c264818eb693cdb8a
2019-07-05 23:41:37 -07:00
91706d1044 Primitive Jit Logging
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22278

Differential Revision: D16134598

Pulled By: Krovatkin

fbshipit-source-id: e64b14d0d68801189fc78c059a4e8b322acce3fa
2019-07-05 15:27:38 -07:00
6721e67c10 Remove hacky stub for quantized ops (#22388)
Summary:
Effectively reverts https://github.com/pytorch/pytorch/pull/18267 - this was a temporary measure and is not used any more.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22388

Differential Revision: D16070725

Pulled By: dzhulgakov

fbshipit-source-id: ee5db11a608f248b0da981169d4cc90470fd482f
2019-07-01 23:21:42 -07:00
ffa15d2285 Load original SourceRanges on import (#22180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22180
ghimport-source-id: efa46dcb845c099f0a746f523901ab2c2cd3b004

Test Plan: Imported from OSS

Differential Revision: D15981425

Pulled By: jamesr66a

fbshipit-source-id: bef682bd13c1a5be95bdb97e025690c6f2d523d3
2019-07-01 21:14:39 -07:00
6ff0c6ca3f Remove THD (#22065)
Summary:
It's been ~9 months since moving THD to the `torch.distributed.deprecated` namespace (see https://github.com/pytorch/pytorch/issues/11405) and we haven't seen issues related to it, so it's time to remove it.

Closes https://github.com/pytorch/pytorch/issues/18967.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22065

Reviewed By: mrshenli

Differential Revision: D15983669

Pulled By: pietern

fbshipit-source-id: 2a2f5866f9a63040bc7cef3956d5fd215aba7165
2019-06-25 12:19:13 -07:00