Commit Graph

49 Commits

Author SHA1 Message Date
93f7f58856 Make lazy codegen honor per-operator-headers flag (#74450)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74450

- per-operator-headers is a strict build mode where compilation units aren't allowed
to depend on bulk headers like ATen/Functions.h, but must instead depend only on the
specific operator headers used.  (In other configurations, the reverse is required).

Test Plan: CI to make sure nothing breaks for existing backends, and rebased next diff manual test to make sure it actually helps

Reviewed By: ezyang, bdhirsh

Differential Revision: D35002666

fbshipit-source-id: 712445f8d146cf026759444fbd42a20705be9bef
(cherry picked from commit f13e5522d49a6edcb6aed4431b1ec8e2b50a98fc)
2022-03-22 16:31:21 +00:00
72b1194464 Run lazy tensor codegen in generate_code.py (#73996)
Summary:
Hooks into existing autograd codegen script (generate_code.py) to take advantage of its integrations into buck/cmake/bazel.

Adds a new option (--gen_lazy_ts_backend) to generate_code.py, calling it from the CMake OSS build and the fbcode build, but not from other internal xplat/ovrsource builds (these could be opted in later)

Bazel support is added in a later diff.

Includes one generated file (torch/csrc/lazy/generated/LazyIr.h) in a unit test (test/cpp/lazy/test_ir.cpp) to partially verify the generator is working, but does not compile the remaining output sources from the generator yet as they depend on other files not yet landed from lazy_tensor_staging branch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73996

Test Plan: OSS/internal CI - verify all builds are working and test_ir.cpp compiles LazyIr.h

Reviewed By: ezyang

Differential Revision: D34408536

fbshipit-source-id: 8af0aea3b95d81eccafc17d64390d70ddd176515
(cherry picked from commit f930612f2bad61c76eb02d85cfbec9f33a1459dc)
2022-03-17 15:31:26 +00:00
fe91906ad7 Remove Declarations.yaml dependency from gen_autograd (#67496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67496

gen_autograd.py doesn't use `Declarations.yaml` any more, and removing
the dependency allows it to run in parallel with
`tools/codegen/gen.py`.

Test Plan: Imported from OSS

Reviewed By: dagitses, ejguan

Differential Revision: D32027251

Pulled By: albanD

fbshipit-source-id: 2cc0bbe36478e6ec497f77a56ab8d01c76145703
2021-11-03 13:19:24 -07:00
737d920b21 Strictly type everything in .github and tools (#59117)
Summary:
This PR greatly simplifies `mypy-strict.ini` by strictly typing everything in `.github` and `tools`, rather than picking and choosing only specific files in those two dirs. It also removes `warn_unused_ignores` from `mypy-strict.ini`, for reasons described in https://github.com/pytorch/pytorch/pull/56402#issuecomment-822743795: basically, that setting makes life more difficult depending on what libraries you have installed locally vs in CI (e.g. `ruamel`).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59117

Test Plan:
```
flake8
mypy --config mypy-strict.ini
```

Reviewed By: malfet

Differential Revision: D28765386

Pulled By: samestep

fbshipit-source-id: 3e744e301c7a464f8a2a2428fcdbad534e231f2e
2021-06-07 14:49:36 -07:00
53d8778b4d Update clang-format linux hash and yaml import calls (#53932)
Summary:
Fixing Bandit security issues.
- yaml_load: Use of unsafe yaml load. Allows instantiation of arbitrary objects. Consider yaml.safe_load().
Test ID: B506
Severity: MEDIUM
Confidence: HIGH
File: ./caffe2/contrib/aten/gen_op.py
More info: https://bandit.readthedocs.io/en/latest/plugins/b506_yaml_load.html
235 if __name__ == '__main__':
236     decls = yaml.load(read(os.path.join(args.yaml_dir, 'Declarations.yaml')), Loader=Loader)
237     factory_methods = find_factory_methods(decls)

- Blacklist: Use of insecure MD2 (6149a26adb), MD4 (fc7f026980), MD5 (7ea9d9af4e), or SHA1 hash function.
Test ID: B303
Severity: MEDIUM
Confidence: HIGH
File: ./tools/clang_format_utils.py
More info: https://bandit.readthedocs.io/en/latest/blacklists/blacklist_calls.html#b303-md5
36
37     hash = hashlib.sha1()
38
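
The first issue's fix is to swap `yaml.load` for `yaml.safe_load`. A minimal sketch of the difference (the document string here is made up for illustration; it is not the real Declarations.yaml content):

```python
import yaml

doc = "op: aten::mm\narity: 2\n"

# yaml.safe_load only constructs plain Python types (dict, list, str,
# int, ...), so a malicious document cannot trigger arbitrary object
# instantiation the way yaml.load with an unsafe Loader can.
decls = yaml.safe_load(doc)
print(decls["op"])  # aten::mm
```

For trusted documents needing custom tags, `yaml.load(doc, Loader=yaml.SafeLoader)` with registered constructors is the usual middle ground.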

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53932

Reviewed By: jbschlosser

Differential Revision: D27072017

Pulled By: malfet

fbshipit-source-id: 2fef0119388797aee3cacdc880fc345bd2ba68ce
2021-03-18 17:11:58 -07:00
5252e9857a [pytorch] clean up unused util srcs under tools/autograd (#50611)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50611

Removed the unused old-style code to prevent it from being used.
Added all autograd/gen_pyi sources to mypy-strict.ini config.

Confirmed byte-for-byte compatible with the old codegen:
```
Run it before and after this PR:
  .jenkins/pytorch/codegen-test.sh <baseline_output_dir>
  .jenkins/pytorch/codegen-test.sh <test_output_dir>

Then run diff to compare the generated files:
  diff -Naur <baseline_output_dir> <test_output_dir>
```

Confirmed clean mypy-strict run:
```
mypy --config mypy-strict.ini
```

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D25929730

Pulled By: ljk53

fbshipit-source-id: 1fc94436fd4a6b9b368ee0736e99bfb3c01d38ef
2021-01-18 23:54:02 -08:00
249261ada7 Remove generated_unboxing_wrappers and setManuallyBoxedKernel (#49251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49251

Since all ops are c10-full and use templated unboxing now, we no longer need to code-generate any unboxing logic.
Since this codegen was the only code using setManuallyBoxedKernel, we can also remove that functionality from KernelFunction, OperatorEntry and Dispatcher.
ghstack-source-id: 119450486

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D25502865

fbshipit-source-id: 49d009df159fda4be41bd02457d4427e6e638c10
2021-01-06 14:22:50 -08:00
2dff0b3e91 Fix typos in comments (#48316)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48316

Reviewed By: walterddr, mrshenli

Differential Revision: D25125123

Pulled By: malfet

fbshipit-source-id: 6f31e5456cc078cc61b288191f1933711acebba0
2020-11-24 10:56:40 -08:00
d91cefb0d8 [pytorch][codegen] migrate gen_annotated_fn_args.py to new codegen model (#47745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47745

This is a relatively small codegen. Reintroduced 'simple_type' to preserve
old codegen output.

It depends on some methods defined in gen_python_functions.py - next PR will
clean up the remaining Declarations.yaml methods in gen_python_functions.py.

Confirmed byte-for-byte compatible with the old codegen:
```
Run it before and after this PR:
  .jenkins/pytorch/codegen-test.sh <baseline_output_dir>
  .jenkins/pytorch/codegen-test.sh <test_output_dir>

Then run diff to compare the generated files:
  diff -Naur <baseline_output_dir> <test_output_dir>
```

Differential Revision: D24885068

Test Plan: Imported from OSS

Reviewed By: ezyang

Pulled By: ljk53

fbshipit-source-id: c0fbd726bcc450c3c7fe232c23e5b31779d0b65f
2020-11-14 02:24:39 -08:00
4159191f0e [pytorch] split out trace type generator and migrate to new codegen model (#47438)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47438

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D24808211

Pulled By: ljk53

fbshipit-source-id: 44dfadf550a255c05aa201e54b48101aaf722885
2020-11-09 12:39:39 -08:00
3d421b3137 [pytorch] rewrite of the python binding codegen with the v2 API (#46244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46244

- What does the generated binding code do?

The Python binding codegen produces code that takes the input list of
PyObjects, finds the matching ATen C++ function using PythonArgParser,
converts the PyObjects into C++ types and calls the ATen C++ function:

```
+--------+  parsing   +------------------------+  binding   +-----------------------+
| PyObjs | ---------> | PythonArgParser Output | ---------> | Cpp Function Dispatch |
+--------+            +------------------------+            +-----------------------+
```

- Are Python arguments 1-1 mapped to C++ arguments?

Python arguments might be reordered, packed, unpacked when binding to
C++ arguments, as illustrated below:

```
// Binding - Reorder & Packing
// aten::empty.names(int[] size, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None,
                     Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor

            Python Args               Cpp Args
-----------------------------------------------------------
         0: size                      size
         1: names                     names
         2: memory_format -------+
         3: dtype         -----+-|--> options
         4: layout            /  |
         5: device           /   +--> memory_format
         6: pin_memory      /
         7: requires_grad -+

// Binding - Unpacking
// aten::max.names_dim(Tensor self, Dimname dim, bool keepdim=False) -> (Tensor values, Tensor indices)

            Python Args               Cpp Args
-----------------------------------------------------------
                               +----> max
                              /-----> max_values
         0: input            /        self
         1: dim             /         dim
         2: keepdim        /          keepdim
         3: out      -----+
```
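
The reorder-and-pack binding above can be modeled in plain Python (an illustrative sketch only -- the real generated code builds a C++ TensorOptions; the function name and dict layout here are made up for the example):

```python
def bind_empty_names(size, names, memory_format=None, dtype=None, layout=None,
                     device=None, pin_memory=None, requires_grad=False):
    # Several trailing Python arguments are packed into a single
    # "options" argument on the C++ side, and memory_format is
    # reordered to come after it.
    options = {"dtype": dtype, "layout": layout, "device": device,
               "pin_memory": pin_memory, "requires_grad": requires_grad}
    return size, names, options, memory_format

cpp_args = bind_empty_names([2, 3], ["N", "C"], dtype="float32", device="cpu")
print(cpp_args[2]["device"])  # cpu
```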

- Why do we want to rewrite the python binding codegen?

The old codegen takes Declarations.yaml as input. It doesn't distinguish
between Python arguments and C++ arguments - they are all mixed together
as a bag of untyped dict objects. Different methods process these arg
objects and add new attributes for various purposes. It's not obvious
what the semantics of these attributes are. The complicated binding
logic happens implicitly and in scattered places.

```
+--------------------+
|  Native Functions  |
+--------------------+
  |
  |
  v
+--------------------+
|   Cpp Signatures   |
+--------------------+
  |
  |
  v
+--------------------+
| Declarations.yaml  |
+--------------------+
  |                        +-------------------------------------+
  |              +-------> |       PythonArgParser Schema        |
  |              |         +-------------------------------------+
  |              |                            .
  |              |                            .
  v              |                            .
+--------------------+     +-------------------------------------+
| NonTyped Args Objs | --> | PythonArgParser -> Cpp Args Binding |
+--------------------+     +-------------------------------------+
                 |                            .
                 |                            .
                 |                            .
                 |         +-------------------------------------+
                 +-------> |        Cpp Function Dispatch        |
                           +-------------------------------------+
```

This PR leverages the new immutable data models introduced in the new
aten codegen. It introduces dedicated data models for the python schema.
This way, we can not only avoid subtle Declarations.yaml conversions but
also decouple the generation of the python schema, the python-to-C++
binding and the C++ function call.

The ultimate state will be like the following diagram:

```
            +-------------------+     +-------------------------------------+
  +-------> | Python Signatures | --> |       PythonArgParser Schema        |
  |         +-------------------+     +-------------------------------------+
  |                         |                            .
  |                         |                            .
  |                         |                            .
+------------------+        |         +-------------------------------------+
| Native Functions |        +-------> | PythonArgParser -> Cpp Args Binding |
+------------------+        |         +-------------------------------------+
  |                         |                            .
  |                         |                            .
  |                         |                            .
  |         +-------------------+     +-------------------------------------+
  +-------> |  Cpp Signatures   | --> |        Cpp Function Dispatch        |
            +-------------------+     +-------------------------------------+
```

This PR has migrated the core binding logic from
tools/autograd/gen_python_functions.py to tools/codegen/api/python.py.

It produces the byte-for-byte same results (tested with #46243).

Will migrate the rest of gen_python_functions.py in subsequent PRs.

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D24388874

Pulled By: ljk53

fbshipit-source-id: f88b6df4e917cf90d868a2bbae2d5ffb680d1841
2020-10-19 17:36:45 -07:00
0c5cd8c2b9 [RFC] Switch PyTorch Selective Build (Custom Build) to use the SelectiveBuilder abstraction (#45722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45722

This diff does a bunch of things:

1. Introduces some abstractions as detailed in https://fb.quip.com/2oEzAR5MKqbD to help with selective build related codegen in multiple files.
2. Adds helper methods to combine operators, debug info, operator lists, etc...
3. Currently, the selective build machinery queries `op_registration_whitelist` directly at various places in the code. `op_registration_whitelist` is a list of allowed operator names (without overload names). We want to move to a world where the overload names are also included so that we can be more selective about which operators we include. To that effect, it makes sense to hide the checking logic behind a separate abstraction and have the build use that abstraction instead of putting all this selective-build-specific logic in the code generator itself. This change attempts to do just that.
4. Updates generate_code, unboxing-wrapper codegen, and autograd codegen to accept the operator selector paradigm as opposed to a selected operator list.
5. Update `tools/code_analyzer/gen_op_registration_allowlist.py` to expose providing an actual structured operator dependency graph in addition to a serialized string.

There are a bunch of structural changes as well:

1. `root_op_list.yaml` and `combined_op_list.yaml` are now actual YAML files (not a space separated list of operator names)
2. `generate_code.py` accepts only paths to operator list YAML files (both old style as well as new style) and not list of operator names on the command line as arguments
3. `gen.py` optionally also accepts a custom build related operators YAML path (this file has information about which operators to register in the generated library).

ghstack-source-id: 114578753

(Note: this ignores all push blocking failures!)

Test Plan:
`buck test caffe2/test:selective_build`

Generated YAML files after the change:

{P143981979}

{P143982025}

{P143982056}

Ensure that the generated files are same before and after the change:

```
[dhruvbird@devvm2490 /tmp/TypeDefault.cpp] find -name "*.cpp" | xargs md5sum
d72c3d125baa7b77e4c5581bbc7110d2  ./after_change/gen_aten/TypeDefault.cpp
42353036c83ebc7620a7159235b9647f  ./after_change/lite_predictor_lib_aten/TypeDefault.cpp
d72c3d125baa7b77e4c5581bbc7110d2  ./before_change/gen_aten/TypeDefault.cpp
42353036c83ebc7620a7159235b9647f  ./before_change/lite_predictor_lib_aten/TypeDefault.cpp
```

`VariableTypes_N.cpp` are generated the same both before and after the change:

```
[dhruvbird@devvm2490 /tmp/VariableType] find -name "*.cpp" | xargs -n 1 md5sum | sort
3be89f63fd098291f01935077a60b677  ./after/VariableType_2.cpp
3be89f63fd098291f01935077a60b677  ./before/VariableType_2.cpp
40a3e59d64e9dbe86024cf314f127fd6  ./after/VariableType_4.cpp
40a3e59d64e9dbe86024cf314f127fd6  ./before/VariableType_4.cpp
a4911699ceda3c3a430f08c64e8243fd  ./after/VariableType_1.cpp
a4911699ceda3c3a430f08c64e8243fd  ./before/VariableType_1.cpp
ca9aa611fcb2a573a8cba4e269468c99  ./after/VariableType_0.cpp
ca9aa611fcb2a573a8cba4e269468c99  ./before/VariableType_0.cpp
e18f639ed23d802dc4a31cdba40df570  ./after/VariableType_3.cpp
e18f639ed23d802dc4a31cdba40df570  ./before/VariableType_3.cpp
```

Reviewed By: ljk53

Differential Revision: D23837010

fbshipit-source-id: ad06b1756af5be25baa39fd801dfdf09bc565442
2020-10-18 15:10:42 -07:00
a01e91e6b2 [pytorch] include all overloads for OSS custom build
Summary:
For mobile custom build, we only generate code for ops that are used by
specific models to reduce binary size.

There are multiple places where we apply the op filtering:
- generated_unboxing_wrappers_*.cpp
- autograd/VariableType*.cpp
- c10 op registration (in aten/gen.py)

For c10 op registration, we filter by the main op name - all overloads
that match the main op name part will be kept.

For generated_unboxing_wrappers_*, we filter by the full op name - only
those having exactly the same overload name will be kept.

This PR changes generated_unboxing_wrappers_* and autograd/VariableType*.cpp
codegen to also filter by the main op name.

The reasons are:
- keeping all overloads offers better backward compatibility;
- generated_unboxing_wrappers_* are relatively small as they only contain
  thin wrappers for root ops.
- generated_unboxing_wrappers_* will be replaced by c10 op registration
  soon anyway.
- autograd/VariableType*.cpp are not included in OSS build.

Why does it offer better backward compatibility? #40737 is an example:
it introduced a new `_convolution` overload and renamed the original one
to `_convolution.deprecated`. Before this PR, a model prepared by an
older version of PyTorch couldn't run on the custom mobile build
generated on that PR, because `_convolution.deprecated` wasn't kept in
the custom build under the full-op-name matching policy. By relaxing it
to the partial matching policy, the mobile custom build CI on the PR can pass.
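
The relaxed matching can be sketched as follows (illustrative only; the helper names are made up, not the actual codegen functions):

```python
def base_op_name(op):
    # "aten::_convolution.deprecated" -> "aten::_convolution"
    return op.split(".", 1)[0]

def is_selected(op, root_ops):
    # Partial (main-op-name) matching: keep every overload whose base
    # name appears in the selected root op list, so renamed overloads
    # like ".deprecated" survive the filter.
    selected = {base_op_name(r) for r in root_ops}
    return base_op_name(op) in selected

roots = ["aten::_convolution"]
print(is_selected("aten::_convolution.deprecated", roots))  # True
```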

Will test the size impact for FB production build before landing.

Differential Revision: D22809564

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Pulled By: ljk53

fbshipit-source-id: e2fc017da31f38b9430cc2113f33e6d21a0eaf0b
2020-07-31 12:43:31 -07:00
1e230a5c52 rewrite C++ __torch_function__ handling to work with TensorList operands (#41575)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41575

Fixes https://github.com/pytorch/pytorch/issues/34294

This updates the C++ argument parser to correctly handle `TensorList` operands. I've also included a number of updates to the testing infrastructure. This is because we're now doing a much more careful job of testing the signatures of aten kernels, using the type information about the arguments as read in from `Declarations.yaml`. The changes to the tests are required because we're now only checking for `__torch_function__` attributes on `Tensor`, `Optional[Tensor]` and elements of `TensorList` operands, whereas before we were checking for `__torch_function__` on all operands. The relatively simplistic approach the tests used before -- assuming all positional arguments might be tensors -- doesn't work anymore. I now think that checking for `__torch_function__` on all operands was a mistake in the original design.

The updates to the signatures of the `lambda` functions are to handle this new, more stringent checking of signatures.

I also added override support for `torch.nn.functional.threshold` `torch.nn.functional.layer_norm`, which did not yet have python-level support.

Benchmarks are still WIP.
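
A rough Python model of the narrowed check (hypothetical helper names; the real logic lives in the generated C++ argument parser):

```python
def tensor_like_operands(args):
    # Flatten TensorList-style (list/tuple) operands so their elements
    # are inspected individually; every other operand is inspected as-is.
    for a in args:
        if isinstance(a, (list, tuple)):
            yield from a
        else:
            yield a

def has_torch_function(args):
    # Dispatch to __torch_function__ only if some Tensor-like operand
    # (or TensorList element) defines the protocol.
    return any(hasattr(a, "__torch_function__") for a in tensor_like_operands(args))

class MyTensor:
    def __torch_function__(self, func, types, args=(), kwargs=None):
        return NotImplemented

print(has_torch_function([1.0, [MyTensor(), 2.0]]))  # True
```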

Pull Request resolved: https://github.com/pytorch/pytorch/pull/34725

Reviewed By: mruberry

Differential Revision: D22357738

Pulled By: ezyang

fbshipit-source-id: 0e7f4a58517867b2e3f193a0a8390e2ed294e1f3
2020-07-17 08:54:29 -07:00
85128113f9 [Selective build] Enable selective build in VariablType
Summary:
Quick fix due to code merging. With this feature working, the total size reduction in Android is 664 KB (Pytorch -26 KB and papaya - 639 KB)
https://fburl.com/unigraph/c726gvb1

Test Plan: CI

Reviewed By: kwanmacher

Differential Revision: D22053779

fbshipit-source-id: 8da4a651432b453c25e543bc64dbed02946de63d
2020-06-18 14:31:09 -07:00
5b23f56d5a Selective build on Training, query based. (#39452)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39452

Selective build works on training.
* VariableType_?.cpp are now selectively generated based on the operator list.
* Add a flag in pt_operator_library, "train". If it's True, an extra flag of "pt_train_operator_library" will be added to the labels. A query for "pt_train_operator_library" will be done to aggregate the training operators. With this flag we limit the generated VariableType to the used training operators only, to conserve code size. Models used for inference only have train = False by default.
* For testing purposes, caffe2/fb/pytorch_trainer is created. It's based on the full jit but the operators are selectively built.
* smartkeyboard_debug_model is used for testing. Since static code analysis is not applied for VariableType yet, the operators are manually added based on debugging error messages.
* At build stage, make selective build optional for the training code-gen library.
The reason is that to make fb4a build, the generated VariableType.cpp needs to depend on torch_mobile_train. torch_mobile_train is not needed for apps with inference only. In those cases training can be turned off to remove the dependency on torch_mobile_train and save size. It can also be used as a switch to check size regressions introduced by training.
ghstack-source-id: 105190037

(Note: this ignores all push blocking failures!)

Test Plan:
Training:
```
buck run -c pt.build_from_deps_query=1 -c pt.selective_build=0 -c pt.static_dispatch=0 xplat/caffe2/fb/pytorch_trainer:trainer ~/models/papaya/keyboard/smartkeyboard_debug_model.pt
```

Inference, with and without the new query-based feature:
```
buck run -c pt.build_from_deps_query=1 -c pt.selective_build=0 -c pt.static_dispatch=0 xplat/caffe2/fb/lite_predictor:lite_predictor_bi -- --model=/home/myuan/models/pytext/BI/bi_pytext_0512.bc --input_dims "1,4" --input_type int64 --pytext_len=4
```
```
buck run xplat/caffe2/fb/lite_predictor:lite_predictor_bi -- --model=/home/myuan/models/pytext/BI/bi_pytext_0512.bc --input_dims "1,4" --input_type int64 --pytext_len=4
```

Reviewed By: ljk53

Differential Revision: D21459302

fbshipit-source-id: df71a46d74f8c7448cbf51990804104f1384594f
2020-06-03 18:01:48 -07:00
6d13a334f6 Remove use_c10_dispatcher: unboxed_only (#36838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36838

All ops now do unboxing after dispatch, i.e. c10 handles unboxing and c10 registers a wrapper for the op with JIT.
The last op that manually registered its own wrapper with JIT in register_aten_ops.cpp has been migrated.

Since there are no ops using use_c10_dispatcher: unboxed_only anymore, we can delete the feature.

Also:
- Rename some files to more accurately describe what they're doing now:
  - OpsAlreadyMovedToC10.h/cpp -> ATenOpList.h/cpp
  - register_aten_ops.cpp -> generated_unboxing_wrappers.cpp
  - gen_jit_dispatch.py -> gen_unboxing_wrappers.py
ghstack-source-id: 102532915

Test Plan: waitforsandcastle

Differential Revision: D21100081

fbshipit-source-id: be824958eef33f6cd42a6a652175bd0b1df4ebf9
2020-04-21 13:32:33 -07:00
a91097bdfb Revert D20964368: Revert D20408831: [Lite Interpreter] Operator registration migrate from manual to selective build
Test Plan: revert-hammer

Differential Revision:
D20964368

Original commit changeset: f1874088a597

fbshipit-source-id: d9317ed97a98e2b04c190785b5564536b1096282
2020-04-10 08:19:36 -07:00
586481a6e2 Revert D20408831: [Lite Interpreter] Operator registration migrate from manual to selective build
Test Plan: revert-hammer

Differential Revision:
D20408831

Original commit changeset: ec75dd762c46

fbshipit-source-id: f1874088a5970dd220cc027d0020ab6223b9bd93
2020-04-10 08:03:38 -07:00
7fcf8b0a3b [Lite Interpreter] Operator registration migrate from manual to selective build (#35426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35426

Use selective build with the full set of operators (vs. manually registering each used op with a "_" prefix).

Lite interpreter relies on JIT operator dispatch. In the future we will still need JIT operator dispatch to dispatch ops that are not registered in c10.
Currently the selective build covers the c10/aten dispatch in BUCK. There is JIT selective code-gen in OSS, but it hasn't been ported to BUCK yet.
This diff also ports that selective code-gen to BUCK.
* The selected op list is passed to gen_jit_dispatch.py.
* The list passed to gen_jit_dispatch is the top-level ops (USED_PT_OPS) only, because the selective c10/aten dispatch already registered other ops that are called from the top-level ops.

ghstack-source-id: 101885215

(Note: this ignores all push blocking failures!)

Test Plan:
1. In Python, run torch.jit.export_opnames(scripted_M_mod)
2. Append the operator names into fbcode/caffe2/pt_ops.bzl and the BUCK target.
3. Run
```
buck run xplat/caffe2/fb/lite_predictor:lite_predictor_bi -- --model=/home/myuan/temp/bi_pytext_0315.bc --input_dims "1,4" --input_type int64 --pytext_len=4
```
Should provide expected results.
In addition, the size of the generated code for JIT registration, for example, ```register_aten_ops_0.cpp```, should be significantly reduced (from ~250 KB to ~80 KB). The non-selected op registration schemas are still kept, but the registration functor is replaced by ```DUMMY_OPERATION```

Reviewed By: ljk53

Differential Revision: D20408831

fbshipit-source-id: ec75dd762c4613aeda3b2094f5dad11804dc9492
2020-04-10 02:31:32 -07:00
81c8ca1e2e Disable tracing for Pytorch Mobile client (#36007)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36007

Tracing is not needed in Pytorch Mobile client. Disabling it has a couple of benefits:
1. It's a pre-requisite to build lite interpreter.
2. It saves the code size for full jit and Federated learning (around 600k).

Solution: use PYTORCH_DISABLE_TRACING to disable it. It's better than passing an argument to code-gen because:
1. It's a single-point change in the code template for both VariableType and VariableFactories.
2. code-gen does not handle VariableTypeManual.cpp. The macro is needed there anyway.
ghstack-source-id: 101529401

Test Plan: CI

Reviewed By: ljk53

Differential Revision: D20852558

fbshipit-source-id: c28cec9f90208974acfa351ec9aec3fabbbb8aac
2020-04-05 13:55:38 -07:00
7b04772c51 Keep same autogenerated files structure between fbcode and OSS builds (#35951)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35951

Change generate_code to keep the folder structure the same regardless of whether an install path is provided.
Amend build_variables.bzl accordingly.

Another preliminary step to merge https://github.com/pytorch/pytorch/pull/35220

Test Plan: CI

Reviewed By: EscapeZero, seemethere

Differential Revision: D20839410

fbshipit-source-id: 02297560a7e48aa7c6271f7a8517fc4a1ab35271
2020-04-03 12:28:07 -07:00
78ad3dc174 Fix Lint (#34218)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34218

Test Plan: Imported from OSS

Differential Revision: D20249788

Pulled By: mrshenli

fbshipit-source-id: 5ca2acaff5344fc4455c70af60576f8e93e54cbf
2020-03-04 09:48:57 -08:00
fdd771c90f Make tracing in code gen optional (#33715)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33715

Tracing code depends on the full JIT, which is not available in the lite interpreter. Use `-c pt.disable_gen_tracing=1` to turn off generating the tracing part.
ghstack-source-id: 99252322

Test Plan:
```
buck build xplat/caffe2:torch -c pt.disable_gen_tracing=1
```
The tracing part of generated/VariableType_?.cpp will not be generated.

Reviewed By: smessmer

Differential Revision: D19684577

fbshipit-source-id: a1e5b80eca5e51c7bf72b5cc8f0e36c2135fabc2
2020-03-04 08:16:31 -08:00
43fb0015db custom build script (#30144)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30144

Create script to produce libtorch that only contains ops needed by specific
models. Developers can use this workflow to further optimize mobile build size.

We need to keep a dummy stub for unused (stripped) ops because some
JIT-side logic requires certain function schemas to exist in the JIT op
registry.

Test Steps:
1. Build "dump_operator_names" binary and use it to dump root ops needed
by a specific model:
```
build/bin/dump_operator_names --model=mobilenetv2.pk --output=mobilenetv2.yaml
```

2. The MobileNetV2 model should use the following ops:
```
- aten::t
- aten::dropout
- aten::mean.dim
- aten::add.Tensor
- prim::ListConstruct
- aten::addmm
- aten::_convolution
- aten::batch_norm
- aten::hardtanh_
- aten::mm
```
NOTE that for some reason it outputs "aten::addmm" but actually uses "aten::mm".
You need to fix it manually for now.

3. Run custom build script locally (use Android as an example):
```
SELECTED_OP_LIST=mobilenetv2.yaml scripts/build_pytorch_android.sh armeabi-v7a
```

4. Checkout demo app that uses locally built library instead of
downloading from jcenter repo:
```
git clone --single-branch --branch custom_build git@github.com:ljk53/android-demo-app.git
```

5. Copy locally built libraries to demo app folder:
```
find ${HOME}/src/pytorch/android -name '*.aar' -exec cp {} ${HOME}/src/android-demo-app/HelloWorldApp/app/libs/ \;
```

6. Build demo app with locally built libtorch:
```
cd ${HOME}/src/android-demo-app/HelloWorldApp
./gradlew clean && ./gradlew assembleDebug
```

7. Install and run the demo app.

In-APK arm-v7 libpytorch_jni.so build size reduced from 5.5M to 2.9M.

Test Plan: Imported from OSS

Differential Revision: D18612127

Pulled By: ljk53

fbshipit-source-id: fa8d5e1d3259143c7346abd1c862773be8c7e29a
2019-11-20 13:16:02 -08:00
60372dc713 remove backward functions from jit-op-registry for mobile build (#26851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26851

Add a codegen option to remove backward ops from the jit-op-registry, as they are
unlikely to be used in an inference-only mobile build.

Measured ARM-v7 AAR build size change: 5,804,182 -> 5,331,219.

Test Plan: - build and integrate with demo app;

Differential Revision: D17587422

Pulled By: ljk53

fbshipit-source-id: 08c0fc7a710698a0d4baaf16bbb73cb812b1126a
2019-09-25 23:17:25 -07:00
8485710143 introduce INTERN_DISABLE_AUTOGRAD flag to create inference only library for mobile
Summary:
This is the first of a series of changes to reduce build size by cutting
autograd functions from mobile build.

When INTERN_DISABLE_AUTOGRAD is set:
* On CMake side we exclude Functions.h/cpp, VariableType*.h/cpp,
  VariableTypeManual.cpp from the build process. Still keep variable_factories.h
  as we rely on it to create variables instead of tensors.
* In source code we gate a couple of autograd references (in autograd/variable.cpp)
  with C10_MOBILE (technically we should use a dedicated C macro, but its
  maintenance cost is higher than that of a CMake macro as we have several
  build systems to change).
* Pass --disable-autograd flag to codegen script, which will stop generating
  Functions/VariableType code. And for variable_factories.h it will stop
  generating tracing code.

Edit: in this diff we will keep Functions.h/cpp to avoid changing source code.

Why do we need this change if mobile already avoids calling VariableType and
autograd stuff with USE_STATIC_DISPATCH=ON?
It's trying to reduce static library size for the iOS build, for which it's
relatively harder to strip size with the linker approach.

Why do we need to make an involved change to the codegen script?
There isn't a global config system in codegen - autograd/env.py provides similar
functionality but it says not to add anything there.

Test Plan:
- will check CI;
- test mobile build in sample app;

Differential Revision: D17202733

Pulled By: ljk53

fbshipit-source-id: 5701c6639b39ce58aba9bf5489a08d30d1dcd299
2019-09-10 10:20:17 -07:00
25e6a52e2e Stop doing nn wrap. (#25353)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25353

It doesn't seem necessary anymore.

Test Plan: Imported from OSS

Differential Revision: D17101569

Pulled By: gchanan

fbshipit-source-id: 67a198ae594dcd64dbd7cf6a73e2160e26e3513e
2019-08-30 07:42:20 -07:00
8f0603b128 C++ changes toward libtorch and libcaffe2 unification (#19554)
Summary:
* adds TORCH_API and AT_CUDA_API in places
* refactor code generation Python logic to separate
  caffe2/torch outputs
* fix hip and asan
* remove profiler_cuda from hip
* fix gcc warnings for enums
* Fix PythonOp::Kind
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19554

Differential Revision: D15082727

Pulled By: kostmo

fbshipit-source-id: 83a8a99717f025ab44b29608848928d76b3147a4
2019-04-26 01:38:10 -07:00
21193bf123 try to get rid of tmp_install (#16414)
Summary:
A rehash of previous attempts. This tries a different approach: we accept the install as specified by CMake (leaving bin/, include/, and lib/ alone), and then adjust the rest of the files to this more standard layout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16414

Differential Revision: D13863635

Pulled By: zdevito

fbshipit-source-id: 23725f5c64d7509bf3ca8f472dcdcad074de9828
2019-01-29 17:29:40 -08:00
8a5ba577c1 Revert "remove use of tmp_install" (#15847)
Summary:
This reverts commit 04bf5285896e52ac118d2f9e9b7f582f695f13e2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15847

Differential Revision: D13603174

Pulled By: anderspapitto

fbshipit-source-id: ae321434d3345ad94fad67bf71fd027cddeb4588
2019-01-08 16:30:19 -08:00
04bf528589 remove use of tmp_install
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14553

Differential Revision: D13583335

Pulled By: anderspapitto

fbshipit-source-id: 8711fead9eda877c1037a0bc59f91a3d2e01f3e0
2019-01-04 13:48:12 -08:00
4c21b2f2d3 split register_aten_ops.cpp into shards (#12615)
Summary:
After an analogous breakup of VariableType.cpp, the generated
register_aten_ops.cpp is now the slowest-to-compile file in a typical
incremental rebuild by a wide margin. Therefore, give it the same
treatment - the generated code is split across several files to allow
parallel compilation.

Note that the existing code takes some care to arrange that overloads
of the same op name are given in a particular order. This diff
preserves that behavior, by treating all overloads of the same name as
a single indivisible unit, and sharding based on these groups rather
than on individual constructors.
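
The grouping-then-sharding scheme described above can be sketched roughly as follows (a minimal illustration; the function name is hypothetical and round-robin assignment is an assumption - the real generator's policy may differ):

```python
from collections import defaultdict

def shard_overload_groups(declarations, num_shards):
    """Group declarations by op name, then assign whole groups to shards
    round-robin, so overloads of one op never split across files and
    keep their original relative order."""
    groups = defaultdict(list)
    for decl in declarations:  # insertion order preserves overload order
        groups[decl["name"]].append(decl)
    shards = [[] for _ in range(num_shards)]
    for i, group in enumerate(groups.values()):
        shards[i % num_shards].extend(group)
    return shards
```

Because whole groups move together, both `add` overloads below land in the same shard even though another op was declared between them.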
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12615

Reviewed By: ezyang

Differential Revision: D10367363

Pulled By: anderspapitto

fbshipit-source-id: 07db5f9cb79748040909716349626412a13bc86e
2018-10-15 14:12:27 -07:00
49256ddb4a split generated VariableType.cpp (#12493)
Summary:
On my devgpu, this brings the time taken for `touch torch/csrc/jit/type.h && time python setup.py rebuild develop` (debug mode, multicore build) down from 75 seconds to 62 seconds. For the `ninja install` of libtorch portion, which this affects, the reduction is from 52 seconds to 35.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12493

Reviewed By: zdevito

Differential Revision: D10315988

Pulled By: anderspapitto

fbshipit-source-id: 316dc4ab81134aaa17a568cfc07408b7ced08c2e
2018-10-12 13:14:44 -07:00
01cffaa7e8 fix extra output in generate_code.py (#9339)
Summary:
operator.cpp is not generated. Removing the line prevents generate_code.py from always thinking it is out of date and re-running.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9339

Reviewed By: ezyang

Differential Revision: D8798689

Pulled By: zdevito

fbshipit-source-id: f25a2e215fec29aa51571e6a31771f0f91e7a213
2018-07-11 10:25:31 -07:00
efefd1d7cf Unify aten_dispatch and aten_schema into a single operator abstraction with human-readable schema. (#8885)
Summary:
This is a series of two commits that should probably be read separately. They are stacked on top of #9018 since the second commit requires it for correctness.

Commit 1
=======

This commit is the first in a series that will clean up how we handle declaring operators and intrinsics in the JIT to make it more modular and readable. This introduces readable declarations that can be used to register operators and switches gen_jit_dispatch to generate this schema. A follow up PR will remove the dispatch keys like "add-3" and resolve ops directly based on the registered schema, further simplifying the generation process.

* Switches schema over to parsed declarations; in the future this will allow something like:

```
  registry.register_intrinsic("foo(Tensor a, Tensor b) -> Tensor", [](Stack& stack) {
    ...
  })
```

This will allow the scalable registration of intrinsics for lists, tuples, and other ops, along with metadata for these ops (e.g. derivatives and size-propagation routines).

The declarations resemble those used by PythonArgParser but have been significantly cleaned up to minimize the number of types that can appear in the declaration. We should strive to get the other parts of PyTorch switched over to this restricted declaration set when possible, but it is too much to do in a single PR. My hope is that eventually we will use a very similar language to describe declarations in C10, and this can serve as a guide for that.

Parsing is done using the script lexer, so it is very robust to whitespace and extensible for future types.

This removes the other way we encoded schema, and makes it easier to see what schema are registered.

Current generated declarations: https://gist.github.com/zdevito/a96a17766fb3a098d69a91ee00abaaf6

* Switches how we handle attempting to use an integer in the place of a fixed-sized int list, such as in conv (e.g. 'int[3] stride=1'). Now that we can statically distinguish between int and Tensor, we handle the expansion as an implicit conversion in the compiler. This allows us to simplify the interpreter since it no longer needs to handle the conversion itself.

* Schema declarations have been changed so that they match the type system in the IR exactly. In particular, attribute_info which was used by liftConstantAttributes has been dropped and constant attributes are lifted purely based on the type of the input. Type conversions in compiler have been simplified due to this change.

* Error highlighting in ErrorReport now only reports at most 20 lines of code, to make reading where an error occurred easier.

Commit 2
=======

This commit unifies aten_dispatch and aten_schema into a single Operator object that both contains schema and implementation information. In the future we can use this object to also contain functionality like shape prop and autodiff needed by all operators. Operators are registered globally, and dispatch logic uses the schema information to figure out which variant to use. Descriptor keys, a frequent source of inscrutable debug errors, have been removed.

* Introduce Operator, to replace TensorOp. Unlike TensorOp, we use Operator for all op implementations, including primitives that may occur in the graphs. The only exceptions are ops that are only known to the interpreter like jumps, and GraphExecutors where we need to record additional debug info.

* Adds a global registry for Operator implementations. aten_dispatch.cpp turns into register_aten_ops.cpp, which registers all the Operators for aten with the operator registry. register_prim_ops.cpp now contains the implementations for primitive operators that used to be in the interpreter. This means that it is now safe to use `getOperation(node)` to lookup the true interpreter function for the node, which will simplify const-propagation passes.

* Remove addInterpreterOpHandler in favor of global operator registry.

* Instead of descriptors, we match Node arguments directly against the FunctionSchema describing expected inputs in `matchSchema`. `matchSchema` knows how to parse both attributes and positional inputs from a node and match them to the appropriate registered operator. Debug error messages when we try to run an invalid operator are significantly improved: they now automatically display the registered schemas for ops with the same name.

* Merge aten_schema into register_aten_ops. Each Operator takes a string schema which is parsed to determine when to dispatch to that op.

* Cleans up gen_jit_dispatch.py now that we do not need to write out descriptors.  In particular, skip_scalar_overloads can be removed since Richard's code sorts declarations to put Tensor, Tensor declarations first.

* remove matchSchemaAndLiftConstantAttributes and use emitBuiltinCall instead to remove code duplication

* refactor stack manipulation functions into a separate header file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8885

Reviewed By: jamesr66a

Differential Revision: D8751048

Pulled By: zdevito

fbshipit-source-id: 312aabfbf88307c5f6ab947b6caf691468b94557
2018-07-10 10:24:48 -07:00
9ec0a2aef4 fbshipit-source-id: ba600fcd2b5cefc7621357bdeb05e24cea02e5af 2018-06-27 04:50:56 -07:00
48e90e3339 Build system changes (#8627)
* All changes needed to get rid of process_github.sh

* allow thnn_h_path
2018-06-20 17:45:26 -04:00
fcd9af8a25 changes to support ATen code generation inside fbcode (#8397)
* Back out "Back out "Add support for generating ATen files during fbcode build""

Original commit changeset: 7b8de22d1613

I'm re-sending this diff exactly as it was approved and
committed. Fixes to support @mode/opt will be sent separately for ease
of review.

* Enable building //caffe2:torch with @mode/opt

In @mode/opt, python runs out of a PAR, which breaks a lot of
assumptions in the code about where templates/ folders live relative
to __file__. Rather than introduce hacks with parutil, I simply turn
template_path into a parameter for all the relevant functions and
thread it through from the top level.
2018-06-12 14:57:29 -07:00
ce69d3110b Improve script builtin checking using schema (#7311)

* This add aten_schema.h which provides a barebones amount of type and
  argument information about each builtin operator
* emitBuiltinCall is updated to use this information rather than
  aten_dispatch to ensure the operator is correct.
* handling of keyword and positional arguments now matches Python behavior
* There is no longer a requirement that kwargs be constant or that the
  attributes of an op must be entirely constant or non-constant
* compiler now constructs a non-attributed version of the op first and
  then turns it into the constant-attribute version if all attributes
  are constants.
* default arguments for builtins now work
* SugaredValue::call and similar functions now have SourceRange information
  for their arguments so that error reporting is more accurate

Notes:
* This does not try to merge the builtin checking with the python arg parser.
  Given that C10 schema will eventually replace aten_schema, we will have a
  C++ description of the schema, and working off that description directly
  will be the easiest form to understand.
* python function calls and script method calls do not support keyword arguments yet.
  When we add this support we should refactor the handling in tryEmitSchema
  that resolves keywords into a common function.

* default arguments work
* keyword arguments to builtins work (still need to extend to calling python and other script methods)
* much better error reporting for incorrect builtins

Lift any constants to attributes on nodes when possible

* Schema is usable internally in the compiler as
  the function signatures of script functions as well as for builtin
  operators.
* Adds a List[T] class to better represent the arguments to cat/stack
  as a type rather than with custom checking.
* Support kwargs for calls of script methods

A future commit will be needed to add support for:
* calls to script _functions_, which are currently GraphExecutors without schema info.
* kwargs to python functions, which will require refactoring python op
2018-05-14 14:46:36 -07:00
bcadf92ad5 Move codegen from setup.py to CMake for C++ libraries (#7121)
* Generate code without setup.py for C++ build

* Move code generation to CMake

* Set DEPENDS files correctly

* Fix some errors in codegen

* Fix blank line lint
2018-05-01 11:30:13 -07:00
48a3349c29 Delete dead Tensor code paths (#5417)
This deletes most of the dead Tensor code paths, including the TensorMethods cwrap and generic/Tensor.cpp.

This also moves the THNN.cwrap/.cpp generation to generate_code which can use ninja if installed.
2018-02-27 17:58:09 -05:00
c1b98f0841 Add deprecated add_out overload (#5088)
We have a few calls that use this signature on Tensors. This also
updates the binding code to support deprecated xxx_out signatures.
2018-02-06 17:08:23 -05:00
97fc06ac22 Use restat to reduce ninja rebuilding when running codegen. (#4635)
* Use restat to reduce ninja rebuilding when running codegen.

Usually, you're only working on one codegen file at a time, but
in our old behavior, editing one would induce a rebuild of everything
that depended on ANY generated file.  We fix this in two steps:

- Don't write the file (updating the timestamp) when the contents
  are unchanged.  (I had to update three separate places; shared
  Python library for build tools when?!)

- Use the 'restat' ninja feature to avoid rebuilding when the timestamp
  doesn't change.
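
The first step - skipping the write when nothing changed - can be sketched like this (a hypothetical helper, not the actual build-tool code):

```python
import os

def write_if_changed(path, contents):
    """Skip the write (and the mtime bump) when the generated contents
    are byte-identical to what's already on disk. Paired with ninja's
    'restat = 1' rule variable, an unchanged timestamp lets ninja prune
    the downstream rebuild."""
    if os.path.exists(path):
        with open(path, "r") as f:
            if f.read() == contents:
                return False  # unchanged: leave timestamp alone
    with open(path, "w") as f:
        f.write(contents)
    return True
```

With `restat = 1` set on the codegen rule, ninja re-checks the output's timestamp after the command runs and skips dependents when it didn't move.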

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* lintfix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* lintfix2

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-16 12:32:22 -05:00
04ad23252a Refactor gen_variable_type (#4487)
The gen_variable_type.py script now is only responsible for generating
VariableType.h/cpp. The parent script, "gen_autograd.py", delegates to
gen_autograd_functions.py, gen_variable_type.py, and
gen_python_functions.py.

I've removed "fallthrough" functions. It's replaced by
DONT_RECORD_TRACE, DONT_PROFILE, and DONT_REQUIRE_DERIVATIVE.

In preparation for binding the _out variants, I changed some static
types to Tensor (from Variable) and we now unpack and name tuple return
values.
2018-01-08 13:43:09 -05:00
e2c75d3732 Make import work even if 'tools' is available in Python path
sys.path is searched from first to last, which means that if there is already
a 'tools' directory in the existing python path, we will fail to find the root
directory of PyTorch. Better to put it first.
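
A minimal sketch of the fix (helper name hypothetical): prepend the repo root rather than append it, so its 'tools' package wins the front-to-back search.

```python
import sys

def prepend_path(root):
    """Ensure `root` is searched first: sys.path is scanned front to
    back, so a stray 'tools' directory later in the path can no longer
    shadow the checkout's own tools/ package."""
    if root in sys.path:
        sys.path.remove(root)  # avoid duplicates before re-inserting
    sys.path.insert(0, root)
```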
2017-12-18 01:09:32 +01:00
5c809de4b4 Add missing derivatives.yaml input 2017-12-07 14:46:43 -08:00
1c0fbd27a1 CuDNN bindings rewrite (into ATen) (#3666)
* Comprehensive rewrite of Torch CuDNN bindings / a bit of ATen infra

The executive summary is that this moves the torch/csrc/cudnn
library into ATen, adding a number of new cudnn_ methods to ATen
for batchnorm, convolution, affine grid generator and grid sampler.

ATen infra changes:

- TensorGeometry was moved to ATen
- TensorGeometry was modified to make its interface resemble that of
  Tensor; in particular, sizes is no longer a field, it's a method.
- AT_CUDA_ENABLED macro is set via ATen/Config.h header which is
  generated at cmake configure time.
  Fixes https://github.com/zdevito/ATen/issues/168
- Change AT_CUDA_ENABLED macro to be a function macro, so that we
  error if it is not defined
- Introduce a new TensorArg class, which is a Tensor plus a little
  metadata.  This helps us give good error messages when checking
  dimensions/shapes of tensors.
  Fixes https://github.com/zdevito/ATen/issues/169
- Also introduce a TensorGeometryArg class, for when you don't
  need the actual tensor data (which is most of the time.)
- Add ATen/Check.h, which contains a number of utility functions
  for testing shapes, types and devices of input tensors.  This
  will be particularly useful for native methods, which don't get
  code generated input testing code.  These functions take a
  'CheckedFrom' argument, at the moment just a string, which
  specifies some extra information about what function was
  doing the actual checking; this greatly improves error messages.
    - Many check functions take initializer lists, which let you
      test that all tensors have some property.  This API is
      peculiar, in that we IGNORE undefined tensors in this case.
      This is handled by filterDefined.
- Add AT_CUDNN_ENABLED macro
- CuDNN linking from ATen was improved; for example, we now actually
  add the CuDNN headers to our include path.
- Add some missing override specifiers to some methods
- We now actually build tests with CUDA functionality accessible
  (previously, AT_CUDA_ENABLED was not defined, meaning that
  the headers were missing all CUDA-only functionality.)
- Native functions now support giving explicit names to return
  outputs in yaml.  This makes it possible to hook into the NN
  autogenerated derivatives codepath using native functions.

CuDNN rewrite changes:

- torch/csrc/cudnn now uses ATen (rather than passing around
  THVoidTensor) and lives in ATen.  This lets us remove tensorPointer
  shenanigans.  The functions are exposed to ATen as native functions
  described in aten/src/ATen/cudnn/cuDNN.yaml
- ATen now builds and links against CuDNN when enabled.  The cmake
  package script was taken from Caffe2.
- Some header reorganization was done to help reduce dependencies
  on headers (this reorg is no longer used but I've kept it)
- Rename CHECK to CUDNN_CHECK
- Rip out old shape/type testing code in favor of modern ATen/Check.h
  interface using TensorArg.  In many cases, increase the robustness of
  the checking code.
- Change the inputs of the public facing functions, so that they can
  be bound by ATen
  - Delete THCState*; this is retrieved from the global ATen context
  - Delete cudnnHandle_t, this is retrieved from the global Handles.h
  - Delete cudnnDataType_t, this is retrieved from the Tensor type
  - Delete Convolution class, instead its constituent arguments are
    passed individually
- Change functions to return tensors, rather than take an appropriately
  sized output tensor as an input.
- Redo how transposed convolution / backward convolution is implemented
  (knock on effect of returning tensors).  Previously it was assumed
  that you would always pass an appropriately sized output tensor, but
  we don't want to do this anymore.  For backwards, we instead give
  the desired output tensor (input, really) size, because that is
  readily available.  For *transposed* convolution, however, we take
  output_padding, and otherwise do the shape calculation.
- Redo how legacy group convolution is implemented (knock on effect from
  porting cudnn to ATen.)  Previously, group convolution was implemented
  by manually constructing sizes and strides and then dispatching as
  appropriate, with macros switching between individual groups and
  all-at-once based on CuDNN version.  Now, the code looks exactly like
  what you'd expect: there's a top-level wrapping function that supports
  group convolution no matter the version of CuDNN, and a low-level
  wrapper which supports only what CuDNN supports.  The top-level
  function conditions on CuDNN version, and invokes the low-level
  interface 1 or n times.
- There is now a debugging printer for tensor descriptors.
- Convolution struct is replaced with ConvolutionArgs, which is not
  part of the public API but is used internally to conveniently
  pass around all of the arguments needed for Convolution.
- Add some constexprs for well-known dimensions, reducing the number of
  magic numbers in the code.
- Put 'deterministic' in to ConvParams.  Fixes #3659
- Lots more comments.
- Some pessimizations, in the name of code clarity:
  - The descriptors are initialized on every invocation of convolution
    forward/backward.  Previously, the descriptors were cached, so that
    you didn't have to initialize them again on backwards.  This is
    difficult to support in the ATen interface so I didn't support it.
  - Legacy group convolution initializes its workspace for *every* group
    it performs.  I did not feel motivated to fix this because the
    legacy codepath is already quite slow.
- Affine grid generator and grid sampler automatically call contiguous
  on their arguments as necessary.
- Batchnorm input checking is greatly beefed up, it now checks for
  the following input characteristics:
    - Definedness
    - GPU location
    - Type
    - Contiguity
    - Size

PyTorch binding code changes

- batchnorm now uses consistent var/data naming
- batchnorm and convolution make use of new ATen bindings
- Affine grid generator and grid sampler make use of ATen CuDNN
  bindings via derivatives.yaml.  This means I had to restructure
  the code a little, since the THNN bindings still go through
  a legacy Python class.
- I fixed some warnings:
  - s/friend class/friend struct/ on InterpreterStateImpl
  - Removed pessimizing move 'detached' in torch/csrc/autograd/variable.cpp
  - Removed unused pack_list on Scalar

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

GCC 4.8 buildfix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Add TensorGeometry to ATen.h

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

CUDNN_CHECK

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Update TODO comment

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Delete return in cudnn_grid_sampler

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

s/cudnnSetStreamToCurrent/setCuDNNStreamToCurrent/g

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Don't allocate a new vector when filtering defined.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Remove Check overloads, convert to pass references.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Some more microbenchmarking.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-30 23:06:58 -05:00
0e54c3a989 Significantly speed up the incremental build.
This commit adds code to setup.py to use ninja to manage
C++ and code generator dependencies rather than use raw setuptools.
This is based on similar code added to ONNX.

Enabled optionally when ninja is installed.

On my computer speed for a do-nothing build drops from 10s to 1.5 seconds.
Speed of other compilation steps is significantly improved as well.
Dependencies are tracked correctly so the need for ccache is reduced.
2017-11-30 13:47:27 -05:00