Commit Graph

49 Commits

Author SHA1 Message Date
93f7f58856 Make lazy codegen honor per-operator-headers flag (#74450)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74450

- per-operator-headers is a strict build mode where compilation units aren't allowed
to depend on bulk headers like ATen/Functions.h, but must instead depend only on the
specific operator headers used.  (In other configurations, the reverse is required).

Test Plan: CI to make sure nothing breaks for existing backends, and rebased next diff manual test to make sure it actually helps

Reviewed By: ezyang, bdhirsh

Differential Revision: D35002666

fbshipit-source-id: 712445f8d146cf026759444fbd42a20705be9bef
(cherry picked from commit f13e5522d49a6edcb6aed4431b1ec8e2b50a98fc)
2022-03-22 16:31:21 +00:00
72b1194464 Run lazy tensor codegen in generate_code.py (#73996)
Summary:
Hooks into existing autograd codegen script (generate_code.py) to take advantage of its integrations into buck/cmake/bazel.

Adds a new option (--gen_lazy_ts_backend) to generate_code.py, calling it from the CMake OSS build and the fbcode build, but not from other internal xplat/ovrsource builds (these could be opted in later)

Bazel support is added in a later diff.

Includes one generated file (torch/csrc/lazy/generated/LazyIr.h) in a unit test (test/cpp/lazy/test_ir.cpp) to partially verify the generator is working, but does not compile the remaining output sources from the generator yet as they depend on other files not yet landed from lazy_tensor_staging branch.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/73996

Test Plan: OSS/internal CI - verify all builds are working and test_ir.cpp compiles LazyIr.h

Reviewed By: ezyang

Differential Revision: D34408536

fbshipit-source-id: 8af0aea3b95d81eccafc17d64390d70ddd176515
(cherry picked from commit f930612f2bad61c76eb02d85cfbec9f33a1459dc)
2022-03-17 15:31:26 +00:00
fe91906ad7 Remove Declarations.yaml dependency from gen_autograd (#67496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67496

gen_autograd.py doesn't use `Declarations.yaml` any more, and removing
the dependency allows it to run in parallel with
`tools/codegen/gen.py`.

Test Plan: Imported from OSS

Reviewed By: dagitses, ejguan

Differential Revision: D32027251

Pulled By: albanD

fbshipit-source-id: 2cc0bbe36478e6ec497f77a56ab8d01c76145703
2021-11-03 13:19:24 -07:00
737d920b21 Strictly type everything in .github and tools (#59117)
Summary:
This PR greatly simplifies `mypy-strict.ini` by strictly typing everything in `.github` and `tools`, rather than picking and choosing only specific files in those two dirs. It also removes `warn_unused_ignores` from `mypy-strict.ini`, for reasons described in https://github.com/pytorch/pytorch/pull/56402#issuecomment-822743795: basically, that setting makes life more difficult depending on what libraries you have installed locally vs in CI (e.g. `ruamel`).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59117

Test Plan:
```
flake8
mypy --config mypy-strict.ini
```

Reviewed By: malfet

Differential Revision: D28765386

Pulled By: samestep

fbshipit-source-id: 3e744e301c7a464f8a2a2428fcdbad534e231f2e
2021-06-07 14:49:36 -07:00
53d8778b4d Update clang-format linux hash and yaml import calls (#53932)
Summary:
Fixing Bandit security issues.
- yaml_load: Use of unsafe yaml load. Allows instantiation of arbitrary objects. Consider yaml.safe_load().
Test ID: B506
Severity: MEDIUM
Confidence: HIGH
File: ./caffe2/contrib/aten/gen_op.py
More info: https://bandit.readthedocs.io/en/latest/plugins/b506_yaml_load.html
235 if __name__ == '__main__':
236     decls = yaml.load(read(os.path.join(args.yaml_dir, 'Declarations.yaml')), Loader=Loader)
237     factory_methods = find_factory_methods(decls)

- Blacklist: Use of insecure MD2 (6149a26adb), MD4 (fc7f026980), MD5 (7ea9d9af4e), or SHA1 hash function.
Test ID: B303
Severity: MEDIUM
Confidence: HIGH
File: ./tools/clang_format_utils.py
More info: https://bandit.readthedocs.io/en/latest/blacklists/blacklist_calls.html#b303-md5
36
37     hash = hashlib.sha1()
38
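
The first issue's fix is to swap `yaml.load` for `yaml.safe_load`. A minimal sketch of the difference (the document string here is made up for illustration; it is not the real Declarations.yaml content):

```python
import yaml

doc = "op: aten::mm\narity: 2\n"

# yaml.safe_load only constructs plain Python types (dict, list, str,
# int, ...), so a malicious document cannot trigger arbitrary object
# instantiation the way yaml.load with an unsafe Loader can.
decls = yaml.safe_load(doc)
print(decls["op"])  # aten::mm
```

For trusted documents needing custom tags, `yaml.load(doc, Loader=yaml.SafeLoader)` with registered constructors is the usual middle ground.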

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53932

Reviewed By: jbschlosser

Differential Revision: D27072017

Pulled By: malfet

fbshipit-source-id: 2fef0119388797aee3cacdc880fc345bd2ba68ce
2021-03-18 17:11:58 -07:00
5252e9857a [pytorch] clean up unused util srcs under tools/autograd (#50611)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50611

Removed the unused old-style code to prevent it from being used.
Added all autograd/gen_pyi sources to mypy-strict.ini config.

Confirmed byte-for-byte compatible with the old codegen:
```
Run it before and after this PR:
  .jenkins/pytorch/codegen-test.sh <baseline_output_dir>
  .jenkins/pytorch/codegen-test.sh <test_output_dir>

Then run diff to compare the generated files:
  diff -Naur <baseline_output_dir> <test_output_dir>
```

Confirmed clean mypy-strict run:
```
mypy --config mypy-strict.ini
```

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D25929730

Pulled By: ljk53

fbshipit-source-id: 1fc94436fd4a6b9b368ee0736e99bfb3c01d38ef
2021-01-18 23:54:02 -08:00
249261ada7 Remove generated_unboxing_wrappers and setManuallyBoxedKernel (#49251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49251

Since all ops are c10-full and use templated unboxing now, we no longer need to code-generate any unboxing logic.
Since this codegen was the only code using setManuallyBoxedKernel, we can also remove that functionality from KernelFunction, OperatorEntry and Dispatcher.
ghstack-source-id: 119450486

Test Plan: waitforsandcastle

Reviewed By: ezyang

Differential Revision: D25502865

fbshipit-source-id: 49d009df159fda4be41bd02457d4427e6e638c10
2021-01-06 14:22:50 -08:00
2dff0b3e91 Fix typos in comments (#48316)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48316

Reviewed By: walterddr, mrshenli

Differential Revision: D25125123

Pulled By: malfet

fbshipit-source-id: 6f31e5456cc078cc61b288191f1933711acebba0
2020-11-24 10:56:40 -08:00
d91cefb0d8 [pytorch][codegen] migrate gen_annotated_fn_args.py to new codegen model (#47745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47745

This is a relatively small codegen. Reintroduced 'simple_type' to preserve
old codegen output.

It depends on some methods defined in gen_python_functions.py - next PR will
clean up the remaining Declarations.yaml methods in gen_python_functions.py.

Confirmed byte-for-byte compatible with the old codegen:
```
Run it before and after this PR:
  .jenkins/pytorch/codegen-test.sh <baseline_output_dir>
  .jenkins/pytorch/codegen-test.sh <test_output_dir>

Then run diff to compare the generated files:
  diff -Naur <baseline_output_dir> <test_output_dir>
```

Differential Revision: D24885068

Test Plan: Imported from OSS

Reviewed By: ezyang

Pulled By: ljk53

fbshipit-source-id: c0fbd726bcc450c3c7fe232c23e5b31779d0b65f
2020-11-14 02:24:39 -08:00
4159191f0e [pytorch] split out trace type generator and migrate to new codegen model (#47438)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47438

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D24808211

Pulled By: ljk53

fbshipit-source-id: 44dfadf550a255c05aa201e54b48101aaf722885
2020-11-09 12:39:39 -08:00
3d421b3137 [pytorch] rewrite of the python binding codegen with the v2 API (#46244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46244

- What does the generated binding code do?

The Python binding codegen produces code that takes the input list of
PyObjects, finds the matching ATen C++ function using PythonArgParser,
converts the PyObjects into C++ types and calls the ATen C++ function:

```
+--------+  parsing   +------------------------+  binding   +-----------------------+
| PyObjs | ---------> | PythonArgParser Output | ---------> | Cpp Function Dispatch |
+--------+            +------------------------+            +-----------------------+
```

- Are Python arguments 1-1 mapped to C++ arguments?

Python arguments might be reordered, packed, unpacked when binding to
C++ arguments, as illustrated below:

```
// Binding - Reorder & Packing
// aten::empty.names(int[] size, *, Dimname[]? names, ScalarType? dtype=None, Layout? layout=None,
                     Device? device=None, bool? pin_memory=None, MemoryFormat? memory_format=None) -> Tensor

            Python Args               Cpp Args
-----------------------------------------------------------
         0: size                      size
         1: names                     names
         2: memory_format -------+
         3: dtype         -----+-|--> options
         4: layout            /  |
         5: device           /   +--> memory_format
         6: pin_memory      /
         7: requires_grad -+

// Binding - Unpacking
// aten::max.names_dim(Tensor self, Dimname dim, bool keepdim=False) -> (Tensor values, Tensor indices)

            Python Args               Cpp Args
-----------------------------------------------------------
                               +----> max
                              /-----> max_values
         0: input            /        self
         1: dim             /         dim
         2: keepdim        /          keepdim
         3: out      -----+
```
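
The reorder-and-pack binding above can be modeled in plain Python (an illustrative sketch only -- the real generated code builds a C++ TensorOptions; the function name and dict layout here are made up for the example):

```python
def bind_empty_names(size, names, memory_format=None, dtype=None, layout=None,
                     device=None, pin_memory=None, requires_grad=False):
    # Several trailing Python arguments are packed into a single
    # "options" argument on the C++ side, and memory_format is
    # reordered to come after it.
    options = {"dtype": dtype, "layout": layout, "device": device,
               "pin_memory": pin_memory, "requires_grad": requires_grad}
    return size, names, options, memory_format

cpp_args = bind_empty_names([2, 3], ["N", "C"], dtype="float32", device="cpu")
print(cpp_args[2]["device"])  # cpu
```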

- Why do we want to rewrite the python binding codegen?

The old codegen takes Declarations.yaml as input. It doesn't distinguish
between Python arguments and C++ arguments - they are all mixed together
as a bag of untyped dict objects. Different methods process these arg
objects and add new attributes for various purposes. It's not obvious
what the semantics of these attributes are. The complicated binding
logic happens implicitly and in scattered places.

```
+--------------------+
|  Native Functions  |
+--------------------+
  |
  |
  v
+--------------------+
|   Cpp Signatures   |
+--------------------+
  |
  |
  v
+--------------------+
| Declarations.yaml  |
+--------------------+
  |                        +-------------------------------------+
  |              +-------> |       PythonArgParser Schema        |
  |              |         +-------------------------------------+
  |              |                            .
  |              |                            .
  v              |                            .
+--------------------+     +-------------------------------------+
| NonTyped Args Objs | --> | PythonArgParser -> Cpp Args Binding |
+--------------------+     +-------------------------------------+
                 |                            .
                 |                            .
                 |                            .
                 |         +-------------------------------------+
                 +-------> |        Cpp Function Dispatch        |
                           +-------------------------------------+
```

This PR leverages the new immutable data models introduced in the new
aten codegen. It introduces dedicated data models for the python schema.
This way, we can not only avoid subtle Declarations.yaml conversions but
also decouple the generation of the python schema, the python-to-C++
binding and the C++ function call.

The ultimate state will be like the following diagram:

```
            +-------------------+     +-------------------------------------+
  +-------> | Python Signatures | --> |       PythonArgParser Schema        |
  |         +-------------------+     +-------------------------------------+
  |                         |                            .
  |                         |                            .
  |                         |                            .
+------------------+        |         +-------------------------------------+
| Native Functions |        +-------> | PythonArgParser -> Cpp Args Binding |
+------------------+        |         +-------------------------------------+
  |                         |                            .
  |                         |                            .
  |                         |                            .
  |         +-------------------+     +-------------------------------------+
  +-------> |  Cpp Signatures   | --> |        Cpp Function Dispatch        |
            +-------------------+     +-------------------------------------+
```

This PR has migrated the core binding logic from
tools/autograd/gen_python_functions.py to tools/codegen/api/python.py.

It produces the byte-for-byte same results (tested with #46243).

Will migrate the rest of gen_python_functions.py in subsequent PRs.

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D24388874

Pulled By: ljk53

fbshipit-source-id: f88b6df4e917cf90d868a2bbae2d5ffb680d1841
2020-10-19 17:36:45 -07:00
0c5cd8c2b9 [RFC] Switch PyTorch Selective Build (Custom Build) to use the SelectiveBuilder abstraction (#45722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45722

This diff does a bunch of things:

1. Introduces some abstractions as detailed in https://fb.quip.com/2oEzAR5MKqbD to help with selective build related codegen in multiple files.
2. Adds helper methods to combine operators, debug info, operator lists, etc...
3. Currently, the selective build machinery queries `op_registration_whitelist` directly at various places in the code. `op_registration_whitelist` is a list of allowed operator names (without overload names). We want to move to a world where the overload names are also included so that we can be more selective about which operators we include. To that effect, it makes sense to hide the checking logic behind a separate abstraction and have the build use that abstraction instead of putting all this selective-build-specific logic in the code generator itself. This change attempts to do just that.
4. Updates generate_code, unboxing-wrapper codegen, and autograd codegen to accept the operator selector paradigm as opposed to a selected operator list.
5. Update `tools/code_analyzer/gen_op_registration_allowlist.py` to expose providing an actual structured operator dependency graph in addition to a serialized string.

There are a bunch of structural changes as well:

1. `root_op_list.yaml` and `combined_op_list.yaml` are now actual YAML files (not a space separated list of operator names)
2. `generate_code.py` accepts only paths to operator list YAML files (both old style as well as new style) and not list of operator names on the command line as arguments
3. `gen.py` optionally also accepts a custom build related operators YAML path (this file has information about which operators to register in the generated library).

ghstack-source-id: 114578753

(Note: this ignores all push blocking failures!)

Test Plan:
`buck test caffe2/test:selective_build`

Generated YAML files after the change:

{P143981979}

{P143982025}

{P143982056}

Ensure that the generated files are same before and after the change:

```
[dhruvbird@devvm2490 /tmp/TypeDefault.cpp] find -name "*.cpp" | xargs md5sum
d72c3d125baa7b77e4c5581bbc7110d2  ./after_change/gen_aten/TypeDefault.cpp
42353036c83ebc7620a7159235b9647f  ./after_change/lite_predictor_lib_aten/TypeDefault.cpp
d72c3d125baa7b77e4c5581bbc7110d2  ./before_change/gen_aten/TypeDefault.cpp
42353036c83ebc7620a7159235b9647f  ./before_change/lite_predictor_lib_aten/TypeDefault.cpp
```

`VariableTypes_N.cpp` are generated the same both before and after the change:

```
[dhruvbird@devvm2490 /tmp/VariableType] find -name "*.cpp" | xargs -n 1 md5sum | sort
3be89f63fd098291f01935077a60b677  ./after/VariableType_2.cpp
3be89f63fd098291f01935077a60b677  ./before/VariableType_2.cpp
40a3e59d64e9dbe86024cf314f127fd6  ./after/VariableType_4.cpp
40a3e59d64e9dbe86024cf314f127fd6  ./before/VariableType_4.cpp
a4911699ceda3c3a430f08c64e8243fd  ./after/VariableType_1.cpp
a4911699ceda3c3a430f08c64e8243fd  ./before/VariableType_1.cpp
ca9aa611fcb2a573a8cba4e269468c99  ./after/VariableType_0.cpp
ca9aa611fcb2a573a8cba4e269468c99  ./before/VariableType_0.cpp
e18f639ed23d802dc4a31cdba40df570  ./after/VariableType_3.cpp
e18f639ed23d802dc4a31cdba40df570  ./before/VariableType_3.cpp
```

Reviewed By: ljk53

Differential Revision: D23837010

fbshipit-source-id: ad06b1756af5be25baa39fd801dfdf09bc565442
2020-10-18 15:10:42 -07:00
a01e91e6b2 [pytorch] include all overloads for OSS custom build
Summary:
For mobile custom build, we only generate code for ops that are used by
specific models to reduce binary size.

There are multiple places where we apply the op filtering:
- generated_unboxing_wrappers_*.cpp
- autograd/VariableType*.cpp
- c10 op registration (in aten/gen.py)

For c10 op registration, we filter by the main op name - all overloads
that match the main op name part will be kept.

For generated_unboxing_wrappers_*, we filter by the full op name - only
those having exactly the same overload name will be kept.

This PR changes generated_unboxing_wrappers_* and autograd/VariableType*.cpp
codegen to also filter by the main op name.

The reasons are:
- keeping all overloads offers better backward compatibility;
- generated_unboxing_wrappers_* are relatively small as they only contain
  thin wrappers for root ops.
- generated_unboxing_wrappers_* will be replaced by c10 op registration
  soon anyway.
- autograd/VariableType*.cpp are not included in OSS build.

Why does it offer better backward compatibility? #40737 is an example:
it introduced a new `_convolution` overload and renamed the original one
to `_convolution.deprecated`. Before this PR, a model prepared by an
older version of PyTorch couldn't run on the custom mobile build
generated on that PR, because `_convolution.deprecated` wasn't kept in
the custom build under the full-op-name matching policy. By relaxing it
to the partial matching policy, the mobile custom build CI on the PR can pass.
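
The relaxed matching can be sketched as follows (illustrative only; the helper names are made up, not the actual codegen functions):

```python
def base_op_name(op):
    # "aten::_convolution.deprecated" -> "aten::_convolution"
    return op.split(".", 1)[0]

def is_selected(op, root_ops):
    # Partial (main-op-name) matching: keep every overload whose base
    # name appears in the selected root op list, so renamed overloads
    # like ".deprecated" survive the filter.
    selected = {base_op_name(r) for r in root_ops}
    return base_op_name(op) in selected

roots = ["aten::_convolution"]
print(is_selected("aten::_convolution.deprecated", roots))  # True
```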

Will test the size impact for FB production build before landing.

Differential Revision: D22809564

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Pulled By: ljk53

fbshipit-source-id: e2fc017da31f38b9430cc2113f33e6d21a0eaf0b
2020-07-31 12:43:31 -07:00
1e230a5c52 rewrite C++ __torch_function__ handling to work with TensorList operands (#41575)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41575

Fixes https://github.com/pytorch/pytorch/issues/34294

This updates the C++ argument parser to correctly handle `TensorList` operands. I've also included a number of updates to the testing infrastructure. This is because we're now doing a much more careful job of testing the signatures of aten kernels, using the type information about the arguments as read in from `Declarations.yaml`. The changes to the tests are required because we're now only checking for `__torch_function__` attributes on `Tensor`, `Optional[Tensor]` and elements of `TensorList` operands, whereas before we were checking for `__torch_function__` on all operands. The relatively simplistic approach the tests used before -- assuming all positional arguments might be tensors -- doesn't work anymore. I now think that checking for `__torch_function__` on all operands was a mistake in the original design.

The updates to the signatures of the `lambda` functions are to handle this new, more stringent checking of signatures.

I also added override support for `torch.nn.functional.threshold` `torch.nn.functional.layer_norm`, which did not yet have python-level support.

Benchmarks are still WIP.
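
A rough Python model of the narrowed check (hypothetical helper names; the real logic lives in the generated C++ argument parser):

```python
def tensor_like_operands(args):
    # Flatten TensorList-style (list/tuple) operands so their elements
    # are inspected individually; every other operand is inspected as-is.
    for a in args:
        if isinstance(a, (list, tuple)):
            yield from a
        else:
            yield a

def has_torch_function(args):
    # Dispatch to __torch_function__ only if some Tensor-like operand
    # (or TensorList element) defines the protocol.
    return any(hasattr(a, "__torch_function__") for a in tensor_like_operands(args))

class MyTensor:
    def __torch_function__(self, func, types, args=(), kwargs=None):
        return NotImplemented

print(has_torch_function([1.0, [MyTensor(), 2.0]]))  # True
```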

Pull Request resolved: https://github.com/pytorch/pytorch/pull/34725

Reviewed By: mruberry

Differential Revision: D22357738

Pulled By: ezyang

fbshipit-source-id: 0e7f4a58517867b2e3f193a0a8390e2ed294e1f3
2020-07-17 08:54:29 -07:00
85128113f9 [Selective build] Enable selective build in VariablType
Summary:
Quick fix due to code merging. With this feature working, the total size reduction in Android is 664 KB (Pytorch -26 KB and papaya - 639 KB)
https://fburl.com/unigraph/c726gvb1

Test Plan: CI

Reviewed By: kwanmacher

Differential Revision: D22053779

fbshipit-source-id: 8da4a651432b453c25e543bc64dbed02946de63d
2020-06-18 14:31:09 -07:00
5b23f56d5a Selective build on Training, query based. (#39452)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39452

Selective build works on training.
* VariableType_?.cpp are now selectively generated based on the operator list.
* Add a flag in pt_operator_library, "train". If it's True, an extra flag of "pt_train_operator_library" will be added to the labels. A query for "pt_train_operator_library" will be done to aggregate the training operators. With this flag we limit the generated VariableType to the used training operators only, to conserve code size. Models used for inference only have train = False by default.
* For testing purposes, caffe2/fb/pytorch_trainer is created. It's based on the full jit but the operators are selectively built.
* smartkeyboard_debug_model is used for testing. Since static code analysis is not applied for VariableType yet, the operators are manually added based on debugging error messages.
* At build stage, make selective build optional for the training code-gen library.
The reason is that to make fb4a build, the generated VariableType.cpp needs to depend on torch_mobile_train. torch_mobile_train is not needed for apps with inference only. In those cases training can be turned off to remove the dependency on torch_mobile_train and save size. It can also be used as a switch to check size regressions introduced by training.
ghstack-source-id: 105190037

(Note: this ignores all push blocking failures!)

Test Plan:
Training:
```
buck run -c pt.build_from_deps_query=1 -c pt.selective_build=0 -c pt.static_dispatch=0 xplat/caffe2/fb/pytorch_trainer:trainer ~/models/papaya/keyboard/smartkeyboard_debug_model.pt
```

Inference, with and without the new query-based feature:
```
buck run -c pt.build_from_deps_query=1 -c pt.selective_build=0 -c pt.static_dispatch=0 xplat/caffe2/fb/lite_predictor:lite_predictor_bi -- --model=/home/myuan/models/pytext/BI/bi_pytext_0512.bc --input_dims "1,4" --input_type int64 --pytext_len=4
```
```
buck run xplat/caffe2/fb/lite_predictor:lite_predictor_bi -- --model=/home/myuan/models/pytext/BI/bi_pytext_0512.bc --input_dims "1,4" --input_type int64 --pytext_len=4
```

Reviewed By: ljk53

Differential Revision: D21459302

fbshipit-source-id: df71a46d74f8c7448cbf51990804104f1384594f
2020-06-03 18:01:48 -07:00
6d13a334f6 Remove use_c10_dispatcher: unboxed_only (#36838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36838

All ops now do unboxing after dispatch, i.e. c10 handles unboxing and c10 registers a wrapper for the op with JIT.
The last op that manually registered its own wrapper with JIT in register_aten_ops.cpp has been migrated.

Since there are no ops using use_c10_dispatcher: unboxed_only anymore, we can delete the feature.

Also:
- Rename some files to more accurately describe what they're doing now:
  - OpsAlreadyMovedToC10.h/cpp -> ATenOpList.h/cpp
  - register_aten_ops.cpp -> generated_unboxing_wrappers.cpp
  - gen_jit_dispatch.py -> gen_unboxing_wrappers.py
ghstack-source-id: 102532915

Test Plan: waitforsandcastle

Differential Revision: D21100081

fbshipit-source-id: be824958eef33f6cd42a6a652175bd0b1df4ebf9
2020-04-21 13:32:33 -07:00
a91097bdfb Revert D20964368: Revert D20408831: [Lite Interpreter] Operator registration migrate from manual to selective build
Test Plan: revert-hammer

Differential Revision:
D20964368

Original commit changeset: f1874088a597

fbshipit-source-id: d9317ed97a98e2b04c190785b5564536b1096282
2020-04-10 08:19:36 -07:00
586481a6e2 Revert D20408831: [Lite Interpreter] Operator registration migrate from manual to selective build
Test Plan: revert-hammer

Differential Revision:
D20408831

Original commit changeset: ec75dd762c46

fbshipit-source-id: f1874088a5970dd220cc027d0020ab6223b9bd93
2020-04-10 08:03:38 -07:00
7fcf8b0a3b [Lite Interpreter] Operator registration migrate from manual to selective build (#35426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35426

Use selective build with the full set of operators (vs. manually registering each used op with a "_" prefix).

Lite interpreter relies on JIT operator dispatch. In the future we will still need JIT operator dispatch to dispatch ops that are not registered in c10.
Currently the selective build covers the c10/aten dispatch in BUCK. There is JIT selective code-gen in OSS, but it hasn't been ported to BUCK yet.
This diff also ports that selective code-gen to BUCK.
* The selected op list is passed to gen_jit_dispatch.py.
* The list passed to gen_jit_dispatch is the top-level ops (USED_PT_OPS) only, because the selective c10/aten dispatch already registered other ops that are called from the top-level ops.

ghstack-source-id: 101885215

(Note: this ignores all push blocking failures!)

Test Plan:
1. In Python, run torch.jit.export_opnames(scripted_M_mod)
2. Append the operator names into fbcode/caffe2/pt_ops.bzl and the BUCK target.
3. Run
```
buck run xplat/caffe2/fb/lite_predictor:lite_predictor_bi -- --model=/home/myuan/temp/bi_pytext_0315.bc --input_dims "1,4" --input_type int64 --pytext_len=4
```
Should provide expected results.
In addition, the size of the generated code for JIT registration, for example, ```register_aten_ops_0.cpp```, should be significantly reduced (from ~250 KB to ~80 KB). The non-selected op registration schemas are still kept, but the registration functor is replaced by ```DUMMY_OPERATION```

Reviewed By: ljk53

Differential Revision: D20408831

fbshipit-source-id: ec75dd762c4613aeda3b2094f5dad11804dc9492
2020-04-10 02:31:32 -07:00
81c8ca1e2e Disable tracing for Pytorch Mobile client (#36007)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36007

Tracing is not needed in Pytorch Mobile client. Disabling it has a couple of benefits:
1. It's a pre-requisite to build lite interpreter.
2. It saves the code size for full jit and Federated learning (around 600k).

Solution: use PYTORCH_DISABLE_TRACING to disable it. It's better than passing an argument to code-gen because:
1. It's a single-point change in the code template for both VariableType and VariableFactories.
2. code-gen does not handle VariableTypeManual.cpp. The macro is needed there anyway.
ghstack-source-id: 101529401

Test Plan: CI

Reviewed By: ljk53

Differential Revision: D20852558

fbshipit-source-id: c28cec9f90208974acfa351ec9aec3fabbbb8aac
2020-04-05 13:55:38 -07:00
7b04772c51 Keep same autogenerated files structure between fbcode and OSS builds (#35951)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35951

Change generate_code to keep the folder structure the same regardless of whether an install path is provided.
Amend build_variables.bzl accordingly.

Another preliminary step to merge https://github.com/pytorch/pytorch/pull/35220

Test Plan: CI

Reviewed By: EscapeZero, seemethere

Differential Revision: D20839410

fbshipit-source-id: 02297560a7e48aa7c6271f7a8517fc4a1ab35271
2020-04-03 12:28:07 -07:00
78ad3dc174 Fix Lint (#34218)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/34218

Test Plan: Imported from OSS

Differential Revision: D20249788

Pulled By: mrshenli

fbshipit-source-id: 5ca2acaff5344fc4455c70af60576f8e93e54cbf
2020-03-04 09:48:57 -08:00
fdd771c90f Make tracing in code gen optional (#33715)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33715

Tracing code depends on the full JIT, which is not available in the lite interpreter. Use `-c pt.disable_gen_tracing=1` to turn off generating the tracing part.
ghstack-source-id: 99252322

Test Plan:
```
buck build xplat/caffe2:torch -c pt.disable_gen_tracing=1
```
The tracing part of generated/VariableType_?.cpp will not be generated.

Reviewed By: smessmer

Differential Revision: D19684577

fbshipit-source-id: a1e5b80eca5e51c7bf72b5cc8f0e36c2135fabc2
2020-03-04 08:16:31 -08:00
43fb0015db custom build script (#30144)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30144

Create script to produce libtorch that only contains ops needed by specific
models. Developers can use this workflow to further optimize mobile build size.

We need to keep a dummy stub for unused (stripped) ops because some
JIT-side logic requires certain function schemas to exist in the JIT op
registry.

Test Steps:
1. Build "dump_operator_names" binary and use it to dump root ops needed
by a specific model:
```
build/bin/dump_operator_names --model=mobilenetv2.pk --output=mobilenetv2.yaml
```

2. The MobileNetV2 model should use the following ops:
```
- aten::t
- aten::dropout
- aten::mean.dim
- aten::add.Tensor
- prim::ListConstruct
- aten::addmm
- aten::_convolution
- aten::batch_norm
- aten::hardtanh_
- aten::mm
```
NOTE that for some reason it outputs "aten::addmm" but actually uses "aten::mm".
You need to fix it manually for now.

3. Run custom build script locally (use Android as an example):
```
SELECTED_OP_LIST=mobilenetv2.yaml scripts/build_pytorch_android.sh armeabi-v7a
```

4. Checkout demo app that uses locally built library instead of
downloading from jcenter repo:
```
git clone --single-branch --branch custom_build git@github.com:ljk53/android-demo-app.git
```

5. Copy locally built libraries to demo app folder:
```
find ${HOME}/src/pytorch/android -name '*.aar' -exec cp {} ${HOME}/src/android-demo-app/HelloWorldApp/app/libs/ \;
```

6. Build demo app with locally built libtorch:
```
cd ${HOME}/src/android-demo-app/HelloWorldApp
./gradlew clean && ./gradlew assembleDebug
```

7. Install and run the demo app.

In-APK arm-v7 libpytorch_jni.so build size reduced from 5.5M to 2.9M.

Test Plan: Imported from OSS

Differential Revision: D18612127

Pulled By: ljk53

fbshipit-source-id: fa8d5e1d3259143c7346abd1c862773be8c7e29a
2019-11-20 13:16:02 -08:00
60372dc713 remove backward functions from jit-op-registry for mobile build (#26851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26851

Add a codegen option to remove backward ops from the jit-op-registry, as they are
unlikely to be used in an inference-only mobile build.

Measured ARM-v7 AAR build size change: 5,804,182 -> 5,331,219.

Test Plan: - build and integrate with demo app;

Differential Revision: D17587422

Pulled By: ljk53

fbshipit-source-id: 08c0fc7a710698a0d4baaf16bbb73cb812b1126a
2019-09-25 23:17:25 -07:00
8485710143 introduce INTERN_DISABLE_AUTOGRAD flag to create inference only library for mobile
Summary:
This is the first of a series of changes to reduce build size by cutting
autograd functions from mobile build.

When INTERN_DISABLE_AUTOGRAD is set:
* On CMake side we exclude Functions.h/cpp, VariableType*.h/cpp,
  VariableTypeManual.cpp from the build process. Still keep variable_factories.h
  as we rely on it to create variables instead of tensors.
* In source code we gate a couple of autograd references (in autograd/variable.cpp)
  with C10_MOBILE (technically we should use a dedicated C macro, but its
  maintenance cost is higher than that of a CMake macro as we have several
  build systems to change).
* Pass --disable-autograd flag to codegen script, which will stop generating
  Functions/VariableType code. And for variable_factories.h it will stop
  generating tracing code.

Edit: in this diff we will keep Functions.h/cpp to avoid changing source code.

Why do we need this change if mobile already avoids calling VariableType and
autograd stuff with USE_STATIC_DISPATCH=ON?
It's trying to reduce static library size for the iOS build, for which it's
relatively harder to strip size with the linker approach.

Why do we need to make an involved change to the codegen script?
There isn't a global config system in codegen - autograd/env.py provides similar
functionality but it says not to add anything there.

Test Plan:
- will check CI;
- test mobile build in sample app;

Differential Revision: D17202733

Pulled By: ljk53

fbshipit-source-id: 5701c6639b39ce58aba9bf5489a08d30d1dcd299
2019-09-10 10:20:17 -07:00
25e6a52e2e Stop doing nn wrap. (#25353)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25353

It doesn't seem necessary anymore.

Test Plan: Imported from OSS

Differential Revision: D17101569

Pulled By: gchanan

fbshipit-source-id: 67a198ae594dcd64dbd7cf6a73e2160e26e3513e
2019-08-30 07:42:20 -07:00
8f0603b128 C++ changes toward libtorch and libcaffe2 unification (#19554)
Summary:
* adds TORCH_API and AT_CUDA_API in places
* refactor code generation Python logic to separate
  caffe2/torch outputs
* fix hip and asan
* remove profiler_cuda from hip
* fix gcc warnings for enums
* Fix PythonOp::Kind
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19554

Differential Revision: D15082727

Pulled By: kostmo

fbshipit-source-id: 83a8a99717f025ab44b29608848928d76b3147a4
2019-04-26 01:38:10 -07:00
21193bf123 try to get rid of tmp_install (#16414)
Summary:
A rehash of previous attempts. This tries a different approach: we accept the install as specified by CMake (leaving bin/, include/, and lib/ alone), and then adjust the rest of the files to this more standard layout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/16414

Differential Revision: D13863635

Pulled By: zdevito

fbshipit-source-id: 23725f5c64d7509bf3ca8f472dcdcad074de9828
2019-01-29 17:29:40 -08:00
8a5ba577c1 Revert "remove use of tmp_install" (#15847)
Summary:
This reverts commit 04bf5285896e52ac118d2f9e9b7f582f695f13e2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15847

Differential Revision: D13603174

Pulled By: anderspapitto

fbshipit-source-id: ae321434d3345ad94fad67bf71fd027cddeb4588
2019-01-08 16:30:19 -08:00
04bf528589 remove use of tmp_install
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14553

Differential Revision: D13583335

Pulled By: anderspapitto

fbshipit-source-id: 8711fead9eda877c1037a0bc59f91a3d2e01f3e0
2019-01-04 13:48:12 -08:00
4c21b2f2d3 split register_aten_ops.cpp into shards (#12615)
Summary:
After an analogous breakup of VariableType.cpp, the generated
register_aten_ops.cpp is now the slowest-to-compile file in a typical
incremental rebuild by a wide margin. Therefore, give it the same
treatment - the generated code is split across several files to allow
parallel compilation.

Note that the existing code takes some care to arrange that overloads
of the same op name are given in a particular order. This diff
preserves that behavior, by treating all overloads of the same name as
a single indivisible unit, and sharding based on these groups rather
than on individual constructors.
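
The grouping-then-sharding scheme described above can be sketched roughly as follows (a minimal illustration; the function name is hypothetical and round-robin assignment is an assumption - the real generator's policy may differ):

```python
from collections import defaultdict

def shard_overload_groups(declarations, num_shards):
    """Group declarations by op name, then assign whole groups to shards
    round-robin, so overloads of one op never split across files and
    keep their original relative order."""
    groups = defaultdict(list)
    for decl in declarations:  # insertion order preserves overload order
        groups[decl["name"]].append(decl)
    shards = [[] for _ in range(num_shards)]
    for i, group in enumerate(groups.values()):
        shards[i % num_shards].extend(group)
    return shards
```

Because whole groups move together, both `add` overloads below land in the same shard even though another op was declared between them.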
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12615

Reviewed By: ezyang

Differential Revision: D10367363

Pulled By: anderspapitto

fbshipit-source-id: 07db5f9cb79748040909716349626412a13bc86e
2018-10-15 14:12:27 -07:00
49256ddb4a split generated VariableType.cpp (#12493)
Summary:
On my devgpu, this brings the time taken for `touch torch/csrc/jit/type.h && time python setup.py rebuild develop` (debug mode, multicore build) down from 75 seconds to 62 seconds. For the `ninja install` of libtorch portion, which this affects, the reduction is from 52 seconds to 35.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12493

Reviewed By: zdevito

Differential Revision: D10315988

Pulled By: anderspapitto

fbshipit-source-id: 316dc4ab81134aaa17a568cfc07408b7ced08c2e
2018-10-12 13:14:44 -07:00
01cffaa7e8 fix extra output in generate_code.py (#9339)
Summary:
operator.cpp is not generated. Removing the line prevents generate_code.py from always thinking it is out of date and re-running.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9339

Reviewed By: ezyang

Differential Revision: D8798689

Pulled By: zdevito

fbshipit-source-id: f25a2e215fec29aa51571e6a31771f0f91e7a213
2018-07-11 10:25:31 -07:00
efefd1d7cf Unify aten_dispatch and aten_schema into a single operator abstraction with human-readable schema. (#8885)
Summary:
This is a series of two commits that should probably be read separately. They are stacked on top of #9018 since the second commit requires it for correctness.

Commit 1
=======

This commit is the first in a series that will clean up how we handle declaring operators and intrinsics in the JIT to make it more modular and readable. This introduces readable declarations that can be used to register operators and switches gen_jit_dispatch to generate this schema. A follow up PR will remove the dispatch keys like "add-3" and resolve ops directly based on the registered schema, further simplifying the generation process.

* Switches schema over to parsed declarations; in the future this will allow something like:

```
  registry.register_intrinsic("foo(Tensor a, Tensor b) -> Tensor", [](Stack& stack) {
    ...
  })
```

This will allow the scalable registration of intrinsics for lists, tuples, and other ops, along with metadata for these ops (e.g. derivatives and size-propagation routines).

The declarations resemble those used by PythonArgParser but have been significantly cleaned up to minimize the number of types that can appear in the declaration. We should strive to get the other parts of PyTorch switched over to this restricted declaration set when possible, but it is too much to do in a single PR. My hope is that eventually we will use a very similar language to describe declarations in C10, and this can serve as a guide for that.

Parsing is done using the script lexer, so it is very robust to whitespace and extensible for future types.

This removes the other way we encoded schema, and makes it easier to see what schema are registered.

Current generated declarations: https://gist.github.com/zdevito/a96a17766fb3a098d69a91ee00abaaf6

* Switches how we handle attempting to use an integer in the place of a fixed-sized int list, such as in conv (e.g. 'int[3] stride=1'). Now that we can statically distinguish between int and Tensor, we handle the expansion as an implicit conversion in the compiler. This allows us to simplify the interpreter since it no longer needs to handle the conversion itself.

* Schema declarations have been changed so that they match the type system in the IR exactly. In particular, attribute_info which was used by liftConstantAttributes has been dropped and constant attributes are lifted purely based on the type of the input. Type conversions in compiler have been simplified due to this change.

* Error highlighting in ErrorReport now only reports at most 20 lines of code, to make reading where an error occurred easier.

Commit 2
=======

This commit unifies aten_dispatch and aten_schema into a single Operator object that both contains schema and implementation information. In the future we can use this object to also contain functionality like shape prop and autodiff needed by all operators. Operators are registered globally, and dispatch logic uses the schema information to figure out which variant to use. Descriptor keys, a frequent source of inscrutable debug errors, have been removed.

* Introduce Operator, to replace TensorOp. Unlike TensorOp, we use Operator for all op implementations, including primitives that may occur in the graphs. The only exceptions are ops that are only known to the interpreter like jumps, and GraphExecutors where we need to record additional debug info.

* Adds a global registry for Operator implementations. aten_dispatch.cpp turns into register_aten_ops.cpp, which registers all the Operators for aten with the operator registry. register_prim_ops.cpp now contains the implementations for primitive operators that used to be in the interpreter. This means that it is now safe to use `getOperation(node)` to lookup the true interpreter function for the node, which will simplify const-propagation passes.

* Remove addInterpreterOpHandler in favor of global operator registry.

* Instead of descriptors, we match Node arguments directly against the FunctionSchema describing expected inputs in `matchSchema`. `matchSchema` knows how to parse both attributes and positional inputs from a node and match them to the appropriate registered operator. Debug error messages when we try to run an invalid operator are significantly improved: they now automatically display the registered schemas for ops with the same name.

* Merge aten_schema into register_aten_ops. Each Operator takes a string schema which is parsed to determine when to dispatch to that op.

* Cleans up gen_jit_dispatch.py now that we do not need to write out descriptors.  In particular, skip_scalar_overloads can be removed since Richard's code sorts declarations to put Tensor, Tensor declarations first.

* remove matchSchemaAndLiftConstantAttributes and use emitBuiltinCall instead to remove code duplication

* refactor stack manipulation functions into a separate header file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8885

Reviewed By: jamesr66a

Differential Revision: D8751048

Pulled By: zdevito

fbshipit-source-id: 312aabfbf88307c5f6ab947b6caf691468b94557
2018-07-10 10:24:48 -07:00
9ec0a2aef4 fbshipit-source-id: ba600fcd2b5cefc7621357bdeb05e24cea02e5af 2018-06-27 04:50:56 -07:00
48e90e3339 Build system changes (#8627)
* All changes needed to get rid of process_github.sh

* allow thnn_h_path
2018-06-20 17:45:26 -04:00
fcd9af8a25 changes to support ATen code generation inside fbcode (#8397)
* Back out "Back out "Add support for generating ATen files during fbcode build""

Original commit changeset: 7b8de22d1613

I'm re-sending this diff exactly as it was approved and
committed. Fixes to support @mode/opt will be sent separately for ease
of review.

* Enable building //caffe2:torch with @mode/opt

In @mode/opt, python runs out of a PAR, which breaks a lot of
assumptions in the code about where templates/ folders live relative
to __file__. Rather than introduce hacks with parutil, I simply turn
template_path into a parameter for all the relevant functions and
thread it through from the top level.
2018-06-12 14:57:29 -07:00
ce69d3110b Improve script builtin checking using schema (#7311)

* This add aten_schema.h which provides a barebones amount of type and
  argument information about each builtin operator
* emitBuiltinCall is updated to use this information rather than
  aten_dispatch to ensure the operator is correct.
* handling of keyword and positional arguments now matches Python behavior
* There is no longer a requirement that kwargs be constant or that the
  attributes of an op must be entirely constant or non-constant
* compiler now constructs a non-attributed version of the op first and
  then turns it into the constant-attribute version if all attributes
  are constants.
* default arguments for builtins now work
* SugaredValue::call and similar functions now have SourceRange information
  for their arguments so that error reporting is more accurate

Notes:
* This does not try to merge the builtin checking with the python arg parser.
  Given that C10 schema will eventually replace aten_schema, we will have a
  C++ description of the schema, and working off that description directly
  will be the easiest form to understand.
* python function calls and script method calls do not support keyword arguments yet.
  When we add this support we should refactor the handling in tryEmitSchema
  that resolves keywords into a common function.

* default arguments work
* keyword arguments to builtins work (still need to extend to calling python and other script methods)
* much better error reporting for incorrect builtins

Lift any constants to attributes on nodes when possible

* Schema is usable internally in the compiler as
  the function signatures of script functions as well as for builtin
  operators.
* Adds a List[T] class to better represent the arguments to cat/stack
  as a type rather than with custom checking.
* Support kwargs for calls of script methods

A future commit will be needed to add support for:
* calls to script _functions_, which are currently GraphExecutors without schema info.
* kwargs to python functions, which will require refactoring python op
2018-05-14 14:46:36 -07:00
bcadf92ad5 Move codegen from setup.py to CMake for C++ libraries (#7121)
* Generate code without setup.py for C++ build

* Move code generation to CMake

* Set DEPENDS files correctly

* Fix some errors in codegen

* Fix blank line lint
2018-05-01 11:30:13 -07:00
48a3349c29 Delete dead Tensor code paths (#5417)
This deletes most of the dead Tensor code paths, including the TensorMethods cwrap and generic/Tensor.cpp.

This also moves the THNN.cwrap/.cpp generation to generate_code which can use ninja if installed.
2018-02-27 17:58:09 -05:00
c1b98f0841 Add deprecated add_out overload (#5088)
We have a few calls that use this signature on Tensors. This also
updates the binding code to support deprecated xxx_out signatures.
2018-02-06 17:08:23 -05:00
97fc06ac22 Use restat to reduce ninja rebuilding when running codegen. (#4635)
* Use restat to reduce ninja rebuilding when running codegen.

Usually, you're only working on one codegen file at a time, but
in our old behavior, editing one would induce a rebuild of everything
that depended on ANY generated file.  We fix this in two steps:

- Don't write the file (updating the timestamp) when the contents
  are unchanged.  (I had to update three separate places; shared
  Python library for build tools when?!)

- Use the 'restat' ninja feature to avoid rebuilding when the timestamp
  doesn't change.
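
The first step - skipping the write when nothing changed - can be sketched like this (a hypothetical helper, not the actual build-tool code):

```python
import os

def write_if_changed(path, contents):
    """Skip the write (and the mtime bump) when the generated contents
    are byte-identical to what's already on disk. Paired with ninja's
    'restat = 1' rule variable, an unchanged timestamp lets ninja prune
    the downstream rebuild."""
    if os.path.exists(path):
        with open(path, "r") as f:
            if f.read() == contents:
                return False  # unchanged: leave timestamp alone
    with open(path, "w") as f:
        f.write(contents)
    return True
```

With `restat = 1` set on the codegen rule, ninja re-checks the output's timestamp after the command runs and skips dependents when it didn't move.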

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* lintfix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* lintfix2

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-01-16 12:32:22 -05:00
04ad23252a Refactor gen_variable_type (#4487)
The gen_variable_type.py script now is only responsible for generating
VariableType.h/cpp. The parent script, "gen_autograd.py", delegates to
gen_autograd_functions.py, gen_variable_type.py, and
gen_python_functions.py.

I've removed "fallthrough" functions. It's replaced by
DONT_RECORD_TRACE, DONT_PROFILE, and DONT_REQUIRE_DERIVATIVE.

In preparation for binding the _out variants, I changed some static
types to Tensor (from Variable) and we now unpack and name tuple return
values.
2018-01-08 13:43:09 -05:00
e2c75d3732 Make import work even if 'tools' is available in Python path
sys.path is searched from first to last, which means that if there is already
a 'tools' directory in the existing python path, we will fail to find the root
directory of PyTorch. Better to put it first.
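
A minimal sketch of the fix (helper name hypothetical): prepend the repo root rather than append it, so its 'tools' package wins the front-to-back search.

```python
import sys

def prepend_path(root):
    """Ensure `root` is searched first: sys.path is scanned front to
    back, so a stray 'tools' directory later in the path can no longer
    shadow the checkout's own tools/ package."""
    if root in sys.path:
        sys.path.remove(root)  # avoid duplicates before re-inserting
    sys.path.insert(0, root)
```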
2017-12-18 01:09:32 +01:00
5c809de4b4 Add missing derivatives.yaml input 2017-12-07 14:46:43 -08:00
1c0fbd27a1 CuDNN bindings rewrite (into ATen) (#3666)
* Comprehensive rewrite of Torch CuDNN bindings / a bit of ATen infra

The executive summary is that this moves the torch/csrc/cudnn
library into ATen, adding a number of new cudnn_ methods to ATen
for batchnorm, convolution, affine grid generator and grid sampler.

ATen infra changes:

- TensorGeometry was moved to ATen
- TensorGeometry was modified to make its interface resemble that of
  Tensor; in particular, sizes is no longer a field, it's a method.
- AT_CUDA_ENABLED macro is set via ATen/Config.h header which is
  generated at cmake configure time.
  Fixes https://github.com/zdevito/ATen/issues/168
- Change AT_CUDA_ENABLED macro to be a function macro, so that we
  error if it is not defined
- Introduce a new TensorArg class, which is a Tensor plus a little
  metadata.  This helps us give good error messages when checking
  dimensions/shapes of tensors.
  Fixes https://github.com/zdevito/ATen/issues/169
- Also introduce a TensorGeometryArg class, for when you don't
  need the actual tensor data (which is most of the time.)
- Add ATen/Check.h, which contains a number of utility functions
  for testing shapes, types and devices of input tensors.  This
  will be particularly useful for native methods, which don't get
  code generated input testing code.  These functions take a
  'CheckedFrom' argument, at the moment just a string, which
  specifies some extra information about what function was
  doing the actual checking; this greatly improves error messages.
    - Many check functions take initializer lists, which let you
      test that all tensors have some property.  This API is
      peculiar, in that we IGNORE undefined tensors in this case.
      This is handled by filterDefined.
- Add AT_CUDNN_ENABLED macro
- CuDNN linking from ATen was improved; for example, we now actually
  add the CuDNN headers to our include path.
- Add some missing override specifiers to some methods
- We now actually build tests with CUDA functionality accessible
  (previously, AT_CUDA_ENABLED was not defined, meaning that
  the headers were missing all CUDA-only functionality.)
- Native functions now support giving explicit names to return
  outputs in yaml.  This makes it possible to hook into the NN
  autogenerated derivatives codepath using native functions.

CuDNN rewrite changes:

- torch/csrc/cudnn now uses ATen (rather than passing around
  THVoidTensor) and lives in ATen.  This lets us remove tensorPointer
  shenanigans.  The functions are exposed to ATen as native functions
  described in aten/src/ATen/cudnn/cuDNN.yaml
- ATen now builds and links against CuDNN when enabled.  The cmake
  package script was taken from Caffe2.
- Some header reorganization was done to help reduce dependencies
  on headers (this reorg is no longer used but I've kept it)
- Rename CHECK to CUDNN_CHECK
- Rip out old shape/type testing code in favor of modern ATen/Check.h
  interface using TensorArg.  In many cases, increase the robustness of
  the checking code.
- Change the inputs of the public facing functions, so that they can
  be bound by ATen
  - Delete THCState*; this is retrieved from the global ATen context
  - Delete cudnnHandle_t, this is retrieved from the global Handles.h
  - Delete cudnnDataType_t, this is retrieved from the Tensor type
  - Delete Convolution class, instead its constituent arguments are
    passed individually
- Change functions to return tensors, rather than take an appropriately
  sized output tensor as an input.
- Redo how transposed convolution / backward convolution is implemented
  (knock on effect of returning tensors).  Previously it was assumed
  that you would always pass an appropriately sized output tensor, but
  we don't want to do this anymore.  For backwards, we instead give
  the desired output tensor (input, really) size, because that is
  readily available.  For *transposed* convolution, however, we take
  output_padding, and otherwise do the shape calculation.
- Redo how legacy group convolution is implemented (knock on effect from
  porting cudnn to ATen.)  Previously, group convolution was implemented
  by manually constructing sizes and strides and then dispatching as
  appropriate, with macros switching between individual groups and
  all-at-once based on CuDNN version.  Now, the code looks exactly like
  what you'd expect: there's a top-level wrapping function that supports
  group convolution no matter the version of CuDNN, and a low-level
  wrapper which supports only what CuDNN supports.  The top-level
  function conditions on CuDNN version, and invokes the low-level
  interface 1 or n times.
- There is now a debugging printer for tensor descriptors.
- Convolution struct is replaced with ConvolutionArgs, which is not
  part of the public API but is used internally to conveniently
  pass around all of the arguments needed for Convolution.
- Add some constexprs for well-known dimensions, reducing the number of
  magic numbers in the code.
- Put 'deterministic' in to ConvParams.  Fixes #3659
- Lots more comments.
- Some pessimizations, in the name of code clarity:
  - The descriptors are initialized on every invocation of convolution
    forward/backward.  Previously, the descriptors were cached, so that
    you didn't have to initialize them again on backwards.  This is
    difficult to support in the ATen interface so I didn't support it.
  - Legacy group convolution initializes its workspace for *every* group
    it performs.  I did not feel motivated to fix this because the
    legacy codepath is already quite slow.
- Affine grid generator and grid sampler automatically call contiguous
  on their arguments as necessary.
- Batchnorm input checking is greatly beefed up, it now checks for
  the following input characteristics:
    - Definedness
    - GPU location
    - Type
    - Contiguity
    - Size

PyTorch binding code changes

- batchnorm now uses consistent var/data naming
- batchnorm and convolution make use of new ATen bindings
- Affine grid generator and grid sampler make use of ATen CuDNN
  bindings via derivatives.yaml.  This means I had to restructure
  the code a little, since the THNN bindings still go through
  a legacy Python class.
- I fixed some warnings:
  - s/friend class/friend struct/ on InterpreterStateImpl
  - Removed pessimizing move 'detached' in torch/csrc/autograd/variable.cpp
  - Removed unused pack_list on Scalar

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

GCC 4.8 buildfix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Add TensorGeometry to ATen.h

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

CUDNN_CHECK

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Update TODO comment

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Delete return in cudnn_grid_sampler

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

s/cudnnSetStreamToCurrent/setCuDNNStreamToCurrent/g

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Don't allocate a new vector when filtering defined.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Remove Check overloads, convert to pass references.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Some more microbenchmarking.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2017-11-30 23:06:58 -05:00
0e54c3a989 Significantly speed up the incremental build.
This commit adds code to setup.py to use ninja to manage
C++ and code generator dependencies rather than use raw setuptools.
This is based on similar code added to ONNX.

Enabled optionally when ninja is installed.

On my computer speed for a do-nothing build drops from 10s to 1.5 seconds.
Speed of other compilation steps is significantly improved as well.
Dependencies are tracked correctly so the need for ccache is reduced.
2017-11-30 13:47:27 -05:00