Commit Graph

405 Commits

Author SHA1 Message Date
b9adbb5002 Fix/relax CMake linter rules (#35574)
Summary:
Ignore mixed upper-case/lower-case style for now.
Fix the space-between-function-and-its-arguments violation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35574

Test Plan: CI

Differential Revision: D20712969

Pulled By: malfet

fbshipit-source-id: 0012d430aed916b4518599a0b535e82d15721f78
2020-03-27 16:52:33 -07:00
77ad3c5aeb Revert D20683972: [pytorch][PR] Fix PyTorch separate compilation
Test Plan: revert-hammer

Differential Revision:
D20683972

Original commit changeset: bc1492aa9d1d

fbshipit-source-id: 8994cbb36877d4338b8677ac6bc807dd16efa67c
2020-03-27 09:18:48 -07:00
2e739f822b Fix PyTorch separate compilation (#34863)
Summary:
Looks like there is a bug in the CUDA device linker: kernels that use `thrust::sort_by_key` cannot be linked with other kernels.
Solve the problem by splitting 5 thrust-heavy .cu files into a `__torch_cuda_sp` library which is statically linked into `torch_cuda`.
For the default compilation workflow this should not make any difference.

Test Plan: Compile with `-DCUDA_SEPARABLE_COMPILATION=YES` and observe the library size difference: 310MB before, 173MB after when compiled for sm_75
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34863

Differential Revision: D20683972

Pulled By: malfet

fbshipit-source-id: bc1492aa9d1d2d21c48e8764a8a7b403feaec5da
2020-03-26 17:49:07 -07:00
a4ea16dbc6 Put prim ops used in full jit only in a separate file (#35232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35232

Some prim operators, like profile and fusion, are not used on mobile (at least in the short term) and are coupled with JIT code. Put them in a separate file (register_prim_ops_fulljit.cpp).
ghstack-source-id: 100807055

Test Plan: buck build //xplat/caffe2:torch

Reviewed By: dreiss

Differential Revision: D20408827

fbshipit-source-id: 9013093357cf75723ef00c34bbfdb6b7ea40a4cf
2020-03-25 14:15:34 -07:00
512bcf68be [Formatting] if ( -> if( in CMakeLists.txt (#35343)
Summary:
Same for `else`, `endif` and `elseif`.
Also prefer the lowercase forms over the uppercase ones
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35343

Test Plan: None at all

Differential Revision: D20638789

Pulled By: malfet

fbshipit-source-id: 8058075693185e66f5dda7b825b725e139d0d000
2020-03-25 13:48:42 -07:00
361eed6a6e Use JIT op registration directly for lite interpreter. (#34070)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34070

The first step to make all operators available for the lite interpreter. The original code used manual registration for lite interpreter ops with a "_" prefix, for two reasons:
1. To minimize the build size.
2. To avoid duplicate registration in OSS (mainly feature testing and unit tests).

Now that we have more and more models to support, the manual registration approach is not practical. To make this process automatic while keeping the binary size under control, we plan to:
1. Make all necessary ops callable from the lite interpreter.
2. The binary size would increase because of step 1. Use ljk53's custom build to selectively build the binary with only the ops used in specific models. The ops will be automatically collected using get_opnames (see the sketch after this list).
3. The temporary "register_mobile_ops.cpp" can be removed.
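
A minimal sketch of the op-name collection step in item 2, assuming `torch.jit.export_opnames` is the collection entry point (the exact helper name and the sample output are assumptions, not taken from this PR):

```python
import torch

class MyModel(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1

scripted = torch.jit.script(MyModel())
# Collect the root operator names this model actually calls; the list can then
# feed a selective/custom mobile build so only these ops are compiled in.
op_names = torch.jit.export_opnames(scripted)
print(op_names)  # e.g. something like ['aten::add.Scalar', 'aten::relu']
```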

Test Plan: Imported from OSS

Differential Revision: D20291596

Pulled By: iseeyuan

fbshipit-source-id: 553b4699619cd71fea20658f3bc8c2d48852ef5c
2020-03-25 07:21:51 -07:00
5b2f8cef08 [JIT] Functional Graph Pass (#33020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33020

This is a pass to create functional blocks. The other PRs in the stack help avoid some of the limitations that are often found in graphs. It's possible that this would work well with a graph that is frozen. Follow-up work items that will help this pass:

- We don't currently have any capacity in alias analysis to tell whether a Value that came from the wildcard set "re-escapes" back into the wildcard set.
- More comments on the semantics of the graph and correctness conditions
- We could consider using dynamic dag if the perf of this is a limitation.
- Potentially make Functional Graphs into Functional Blocks instead, so that we do not repeatedly copy constants and the IR reads more easily.

Test Plan: Imported from OSS

Differential Revision: D20603188

Pulled By: eellison

fbshipit-source-id: 6822a6e65f4cc2676f8f6445fe8aa1cb858ebeeb
2020-03-24 23:44:18 -07:00
a7f8655314 Revert D20624571: [pytorch][PR] [TensorExpr] Extend arithmetic simplifier to work with multi variable expressions
Test Plan: revert-hammer

Differential Revision:
D20624571

Original commit changeset: e49049377bee

fbshipit-source-id: 7d8dda0c3b44be1c3236a0313bbfa128b7015de7
2020-03-24 16:59:51 -07:00
fce67800f4 [TensorExpr] Extend arithmetic simplifier to work with multi variable expressions (#35127)
Summary:
A new version of the IR simplifier used by the jit/tensorexpr fuser. This is capable of simplifying expressions containing (shock) multiple variables, e.g.:

```(m * (1 * n_1) + (n  + 1)) - (m *  (1 * n_1) + n) => 1```

Similar to the previous IR Simplifier it uses a two stage approach:
1. Traverse the tree, combining subtrees of commutable operations into a flat structure. In this implementation we have two intermediate Exprs: Term (expressing products of sub-expressions) and Polynomial (expressing sums of sub-expressions).
2. Traverse the tree, expanding Terms and Polynomials into their component operators.

Using the example above, we execute a process like this to simplify:
```
   (m * (1 * n_1) + (n  + 1)) - (m *  (1 * n_1) + n)
# Using PolynomialTransformer:
=> Sub(Add(Mul(m, Mul(1, n_1)), Add(n, 1)), Add(Mul(m, Mul(1, n_1)), n))
=> Sub(Polynomial(Term(m, n_1), n, 1), Polynomial(Term(m, n_1), n))
=> Polynomial(Term(m, n_1), Term(-1, m, n_1), n, -n, 1)
=> Polynomial(1)
# Using TermExpander
=> 1
```

The IRSimplifier supports arithmetic simplifications of operators Add, Sub and Mul and constant folding of all binary Exprs and Intrinsics, but does not attempt expansion of multiplication of Polynomials to the canonical form since that generally leads to less efficient representations. It will do scalar factorization if it results in removal of operators, and will merge chains of multilane primitives (such as Broadcast and Ramp) down into a single operator. The ir_simplifier unit tests are a short tour of its capabilities.
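
To make the two-stage idea concrete, here is a toy Python sketch (the helper names are hypothetical; the real simplifier works on the C++ Expr IR, not on tuples). Stage 1 flattens sums of products into coefficient/variable terms and combines like terms, which is enough to cancel the example above down to the constant 1:

```python
from collections import Counter

def term(coeff, *variables):
    # A product of variables with a numeric coefficient, e.g. term(1, "m", "n_1").
    return (coeff, tuple(sorted(variables)))

def simplify(*terms):
    # Stage 1: flatten into a "Polynomial" keyed by the variable part and
    # combine like terms by summing coefficients.
    acc = Counter()
    for coeff, variables in terms:
        acc[variables] += coeff
    # Stage 2: expand back, dropping terms that cancelled to zero.
    return [(coeff, variables) for variables, coeff in acc.items() if coeff != 0]

# (m * (1 * n_1) + (n + 1)) - (m * (1 * n_1) + n)
lhs = [term(1, "m", "n_1"), term(1, "n"), term(1)]
rhs = [term(-1, "m", "n_1"), term(-1, "n")]
print(simplify(*lhs, *rhs))  # [(1, ())] -> the constant 1
```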

The existing simplifier has a bug where it will sometimes reorder operations on floating point types which are not associative. This causes (at least) the pyhpc equation_of_state benchmark to produce incorrect results. I have fixed that issue in this version and verified that that benchmark produces the same results with and without the simplifier.

Tests: all cpp & py tensorexpr tests, and the pyhpc benchmark:
```
benchmarks.equation_of_state
============================
Running on CPU

size          backend     calls     mean      stdev     min       25%       median    75%       max   Δ
------------------------------------------------------------------------------------------------------------------
   4,194,304  pytorch           10     0.246     0.002     0.243     0.245     0.246     0.248     0.250     1.000
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35127

Differential Revision: D20624571

Pulled By: nickgg

fbshipit-source-id: e49049377beee69e02dcf26eb922bef1447ae776
2020-03-24 14:16:07 -07:00
65cea95777 [TensorExpr] Rename schedule.{cpp,h} to loopnest.{cpp,h}. (#35119)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/35119

Differential Revision: D20567927

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: 1fb6d03bd4c6e66aca62140d2b537692577f261d
2020-03-20 23:37:51 -07:00
7065c46ea2 Respect dist autograd context in torch.jit._fork. (#34360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34360

The distributed autograd context sets up a thread-local context id
which is used to perform appropriate bookkeeping and autograd recording of RPC
functions in the forward pass.

However, if we use torch.jit._fork within the distributed autograd context, the
code executed within torch.jit._fork will lose this context since it is run in
a separate JIT thread and the thread local is not set in that thread.

To fix this problem, we pass in the distributed autograd context to
torch.jit._fork similar to what we did in
https://github.com/pytorch/pytorch/pull/16101.
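
A minimal sketch of the usage this enables, assuming a single-worker RPC setup (the worker name, rank, and rendezvous addresses are assumptions for illustration):

```python
import os
import torch
import torch.distributed.autograd as dist_autograd
import torch.distributed.rpc as rpc

@torch.jit.script
def double(t):
    # type: (Tensor) -> Tensor
    return t * 2

# Single-process rendezvous for illustration; address/port values are assumptions.
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
rpc.init_rpc("worker0", rank=0, world_size=1)

with dist_autograd.context() as context_id:
    # With this change, the forked task records into the same distributed
    # autograd context instead of losing it on the separate JIT thread.
    fut = torch.jit._fork(double, torch.ones(2, 2, requires_grad=True))
    out = torch.jit._wait(fut)

rpc.shutdown()
```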
ghstack-source-id: 100445465

Test Plan: waitforbuildbot

Differential Revision: D20301352

fbshipit-source-id: aa3fffe69c2b40722c66213351a4e0d77484a621
2020-03-19 14:12:28 -07:00
96860af870 Revert D20164420: [1.5 Release][Dist Autograd][Better Engineering] Notify Workers on Failure during Distributed Autograd
Test Plan: revert-hammer

Differential Revision:
D20164420

Original commit changeset: 3d4ed7423096

fbshipit-source-id: 67f0f9c11cee84df6dbe37db7821dd601227df66
2020-03-19 08:02:07 -07:00
5f67c923f1 [1.5 Release][Dist Autograd][Better Engineering] Notify Workers on Failure during Distributed Autograd (#34638)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34638

Fixes: https://github.com/pytorch/pytorch/issues/27643

This PR manages notifying workers in the event of a failure during distributed autograd. Gracefully handles propagating errors across all nodes in the backward pass and sets state in the local autograd engines accordingly.

(Note: this ignores all push blocking failures!)

Test Plan: Added 2 new tests checking errors when they are thrown in an intermediate node during distributed autograd. Ensured that all existing distributed autograd tests pass.

Differential Revision: D20164420

fbshipit-source-id: 3d4ed74230969ac70bb763f1b5b1c16d979f66a2
2020-03-18 18:56:14 -07:00
cfab65d90d Fix CMake Dev warning in caffe2/CMakeLists.txt (#34886)
Summary:
If the arguments of an `ENDIF()` block are non-empty, they should match the corresponding `IF()` block
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34886

Test Plan: CI

Differential Revision: D20494631

Pulled By: malfet

fbshipit-source-id: 5fed86239b4a0cb4b3aedd02c950c1b800199d2d
2020-03-17 12:19:42 -07:00
ea5c86c276 [TensorExpr] Add LLVM codegen. (#34228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34228

This PR adds LLVM codegen to tensor expressions. LLVM is added as an
optional build dependency specified with `USE_LLVM=<path_to_llvm>`
variable. If this variable is not set or LLVM is not found in the
specified path, the LLVM codegen is completely disabled.

Differential Revision: D20251832

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: 77e203ab4421eb03afc64f8da17e0daab277ecc2
2020-03-16 11:49:34 -07:00
35e7efeb9a [TensorExpr] Add CUDA codegen. (#34227)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34227

This PR adds a CUDA support to tensor expressions.

Differential Revision: D20251836

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: ab36a55834cceff30c8371fef6cca1054a32f017
2020-03-16 11:49:29 -07:00
42b2c8c65d [TensorExpr] Add a fuser pass based on tensor expressions. (#34226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34226

LLVM and CUDA backends are added in subsequent PRs, so at this point the fuser is pretty useless, but it can still be tested and its logic is not going to change with the addition of the codegens.

Differential Revision: D20251838

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: 82b0d221fa89904ed526689d02a6c7676a8ce8de
2020-03-16 11:49:24 -07:00
e31d462e92 [TensorExpr] Pull changes to core classes for representing expressions and statements from the side branch. (#34224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34224

Our development has been happening on a side branch `pytorch_fusion` in
`bertmaher/pytorch` fork. This PR moves changes to the core classes
representing expressions and transformations on them.

At this moment, the tensor expressions are only used in tests.
Subsequent PRs add LLVM and CUDA codegen for tensor expressions and
implement fuser on top of these.

This PR is huge as it is a squashed version of changes in the side
branch. It is not practical to pull changes one by one from the branch,
so here is the squashed version. If you're interested in seeing the
history of changes, please refer to https://github.com/bertmaher/pytorch

Differential Revision: D20251835

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: 1a871acc09cf3c6f7fb4af40d408cdbb82dc7dab
2020-03-16 11:47:47 -07:00
24c9e61e79 Enable JIT tests on Windows (#27029)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27029

Reviewed By: eellison

Differential Revision: D20458664

Pulled By: jamesr66a

fbshipit-source-id: 22be918543703869f471e89b3478423198351bf3
2020-03-16 11:26:21 -07:00
4da5569300 Pass to remove prepacking ops. (#34319)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34319

Removes prepacking ops and installs them as attributes of the top-level
module. Freezing needs to run as the first pass.

Test Plan:
python test/test_xnnpack_integration.py

Imported from OSS

Differential Revision: D20290726

fbshipit-source-id: 633ceaa867ff7d5c8e69bd814c0362018394cb3a
2020-03-14 12:53:31 -07:00
7dd5da2026 JIT pass to insert XNNPACK ops (#34048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34048

Rewrites the graph to insert xnnpack prepack and packed run ops for
conv2d and linear.

Test Plan:
python test/test_xnnpack_integration.py

Imported from OSS

Differential Revision: D20185658

fbshipit-source-id: c4c073c912ad33e822e7beb4ed86c9f895129d55
2020-03-14 12:53:27 -07:00
9e6cd98c3f Ensure torch_cuda is linked against on Windows (#34288)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31611.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34288

Differential Revision: D20314251

Pulled By: seemethere

fbshipit-source-id: 15ab2d4de665d553a1622a2d366148697deb6c02
2020-03-12 12:16:44 -07:00
a54416d208 [C++ API] Remove deprecated torch::nn::BatchNorm / FeatureDropout / modules_ordered_dict and torch::nn::init::Nonlinearity / FanMode (#34508)
Summary:
This PR is BC-breaking in the following way:
- The deprecated `torch::nn::BatchNorm` is removed in favor of `torch::nn::BatchNorm{1,2,3}d`
- The deprecated `torch::nn::FeatureDropout` is removed in favor of `torch::nn::Dropout{2,3}d`
- The deprecated `torch::nn::modules_ordered_dict` is removed. User should do `Sequential sequential({{"m1", MyModule(1)}, {"m2", MyModule(2)}})` instead.
- The deprecated `torch::nn::init::Nonlinearity` is removed, in favor of the following enums:
    - `torch::kLinear`
    - `torch::kConv1D`
    - `torch::kConv2D`
    - `torch::kConv3D`
    - `torch::kConvTranspose1D`
    - `torch::kConvTranspose2D`
    - `torch::kConvTranspose3D`
    - `torch::kSigmoid`
    - `torch::kTanh`
    - `torch::kReLU`
    - `torch::kLeakyReLU`
- The deprecated `torch::nn::init::FanMode` is removed, in favor of the following enums:
    - `torch::kFanIn`
    - `torch::kFanOut`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34508

Differential Revision: D20351601

Pulled By: yf225

fbshipit-source-id: cca0cd112f29a31bb023e348ca8f82780e42bea3
2020-03-12 10:09:58 -07:00
e95657b87e [C++ API] AdaptiveLogSoftmaxWithLoss (#29076)
Summary:
Implemented AdaptiveLogSoftmaxWithLoss and some tests for modules. Reference https://github.com/pytorch/pytorch/issues/25883
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29076

Differential Revision: D20404588

Pulled By: yf225

fbshipit-source-id: edbadf432b8173cbcc6caf83c9c03dd92dc31a37
2020-03-12 09:53:58 -07:00
965146b818 [jit] delete netdef converter (#33807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33807

afaik this is unused, so removing it from the source tree. RIP :(

Test Plan: Imported from OSS

Differential Revision: D20122118

Pulled By: suo

fbshipit-source-id: cb45943f5b9f969482301a2f9fe540326dbc78f2
2020-03-09 22:25:16 -07:00
45a504dd2d [JIT] Introduce BuiltinOpFunction and integrate into torchbind (#34098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34098

* #33900 [JIT] Move stuff out of class_type.cpp

Test Plan: Imported from OSS

Differential Revision: D20229166

Pulled By: jamesr66a

fbshipit-source-id: d658a63a5d6e372e675f35b8456adc8de82b49f3
2020-03-07 10:03:56 -08:00
60e8615a6d [JIT] Virtualize Function (#33921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33921

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.intern.facebook.com/intern/diff/D20153092/)!

Test Plan: Imported from OSS

Differential Revision: D20177227

Pulled By: jamesr66a

fbshipit-source-id: 87f3e484c4f873d60f76f50f6789c1b4a73bdfde
2020-03-07 10:03:50 -08:00
9a5e9d8cec [pytorch][mobile] change mobile build scripts to build PyTorch by default (#34203)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34203

Currently the CMake and mobile build scripts still build libcaffe2 by
default. To build PyTorch mobile, users have to set the environment variable
BUILD_PYTORCH_MOBILE=1 or the CMake option BUILD_CAFFE2_MOBILE=OFF.

PyTorch mobile has been released for a while. It's about time to change
CMake and build scripts to build libtorch by default.

Changed the caffe2 CI job to build libcaffe2 by setting the BUILD_CAFFE2_MOBILE=1
environment variable. Only found Android CI for libcaffe2 - do we ever
have iOS CI for libcaffe2?

Test Plan: Imported from OSS

Differential Revision: D20267274

Pulled By: ljk53

fbshipit-source-id: 9d997032a599c874d62fbcfc4f5d4fbf8323a12e
2020-03-05 23:40:47 -08:00
Jie
2b79bab029 [CUDA_FUSER] Fork CUDA fuser (#33527)
Summary:
Separating CUDA fuser from CPU fuser.

1. New node in IR - prim::CudaFusionGroup:
   This enables the CUDA fuser to co-exist alongside the old fuser and allows us
   to incrementally build and expand the CUDA fuser.

2. Copied the FuseGraph optimization passes to CudaFuserGraph:
   We will refactor & reuse Chunk/Concat from the old fuser logic, which is
   handled in the optimization pass at this moment. Unfortunately much of the
   code in the pass is tightly bound to the legacy fuser, which makes code
   sharing difficult.
   The CudaFusionGraph will support only a subset of operations compared to the
   legacy fuser (CUDA only). It is registered as a custom pass post fusion via
     ```torch._C._jit_register_cuda_fuser()```
   To have it take effect, you should also turn off fusion on the GPU via
     ```torch._C._jit_override_can_fuse_on_gpu(False)```
   (see the sketch after this list).

3. We don't have codegen in this PR yet (WIP). Currently we just fall back to
   the old fuser.
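
A minimal sketch combining the two knobs quoted in item 2; the scripted function and inputs are illustrative, and a CUDA-capable device is assumed:

```python
import torch

# Register the forked CUDA fuser as a custom post-fusion pass and turn off the
# legacy GPU fuser so prim::CudaFusionGroup handles fusible subgraphs instead.
torch._C._jit_register_cuda_fuser()
torch._C._jit_override_can_fuse_on_gpu(False)

@torch.jit.script
def scale_shift(x, w, b):
    return x * w + b

x = torch.randn(1024, device="cuda")
w = torch.randn(1024, device="cuda")
b = torch.randn(1024, device="cuda")
out = scale_shift(x, w, b)  # profiled and fused on subsequent runs
```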
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33527

Differential Revision: D20171598

Pulled By: ZolotukhinM

fbshipit-source-id: 9a3c0f06f46da7eaa80ae7551c04869f5b03ef71
2020-03-04 20:25:08 -08:00
f097ca503d Add and test training in lite interpreter. (#32359)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32359

Test Plan: Imported from OSS

Differential Revision: D19450614

Pulled By: iseeyuan

fbshipit-source-id: 6bafff39d7880a5b7fb9cd70c33a4e584812be12
2020-03-03 23:33:43 -08:00
7d01888a75 [JIT] Register rpc.rpc_async(..) as a JIT operator (#33329)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33329

# Use case

```
@torch.jit.script
def send_rpc_async(dst_worker_name, user_callable_qual_name, tensor):
    # type: (str, str, Tensor) -> None
    rpc._rpc_async_torchscript(
        dst_worker_name, user_callable_qual_name, args=(tensor,)
    )
```

# Problem

```
torch.jit.frontend.NotSupportedError: keyword-arg expansion is not supported:
  File "/data/users/shihaoxu/fbsource/fbcode/buck-out/dev/gen/caffe2/test/distributed/rpc/rpc_spawn#binary,link-tree/torch/distributed/rpc/api.py", line 722
    args = args if args else ()
    kwargs = kwargs if kwargs else {}
    fut = _invoke_rpc_torchscript(to, qualified_name, *args, **kwargs)
                                                               ~~~~~~ <--- HERE
    return fut
```

# Solution

Register `rpc.rpc_async(..)` as a JIT operator to handle variable-length argument list.

# Plan

This PR contains the required changes to make `rpc.rpc_async(..)` a JIT prim operator, which can dynamically handle different numbers of arguments.

- Register "prim::rpc_async" as a `Symbol` in "interned_string.h"
- Add an if branch in "python_sugared_value.cpp" `toSugarValue(py::object, ..)`, the entry utility function, to set up how the JIT frontend converts the `torch.distributed.rpc.rpc_async(..)` Python function (Python object) into a `SpecialFormValue` (IR SugaredValue).
- Add a switch case for the "prim::rpc_async" Symbol in "ir_emitter.cpp" and `emitApplySpecialForm(..)` to set up how the JIT compiler provides inputs to the "prim::rpc_async" Operator.
- Register "prim::rpc_async" as a `jit::Operator` and provide implementation in "register_distributed_ops.cpp".

Notice: since the distributed module is an optional part when building PyTorch, the code added in this PR should be wrapped within a preprocessor macro.
```
#ifdef USE_DISTRIBUTED
new code here
#endif
```

Test Plan:
Items that need to be confirmed in the test cases

https://fb.quip.com/DCvdA9ZLjeO0

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork

buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork  \
\
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par -r test_call_python_function_remotely_from_script_not_supported
```

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_spawn
```

```
buck test mode/dev-nosan //caffe2/caffe2/python/operator_test:layer_norm_op_test-2.7 -- test_layer_norm_op_jit
```

Differential Revision: D5738300

fbshipit-source-id: a4604fe762e00be062dc8232ca9790df31fb2074
2020-03-03 19:57:42 -08:00
9b39ad7f2c [jit] Fix iOS build (#34180)
Summary:
`unpickler.cpp` depends on the mobile type parser all the time, so include it regardless of whether it's a mobile build or not
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34180

Pulled By: driazati

Differential Revision: D20241881

fbshipit-source-id: a998dd2b3f1c7f58e55bb7851dc595c8ddf9eacb
2020-03-03 19:44:43 -08:00
cab8772c6c Freezing Torchscript modules (#32178)
Summary:
This patch enables folding GetAttr nodes with their corresponding
values. The _jit_pass_freeze_module API returns a new TorchScript module
where all function calls and get attributes are inlined.
Usage:

frozen_model = torch._C._freeze_module(scripted_model._c)
frozen_model.forward(...)

This API currently optimizes the forward method. We will follow up to
preserve and optimize methods and attributes that are annotated as
torch.jit.interface.

Several future improvements to JIT optimizations are required to further
clean up/de-sugar the graph and eliminate redundancies.
Ideally, we want to produce a graph that can easily be lowered to
GLOW and other low-level backends.
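
A slightly fuller sketch of the usage above; the module definition is illustrative and only the `_freeze_module` call and `forward` usage are taken from this description:

```python
import torch

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.ones(2, 2))

    def forward(self, x):
        return x + self.weight

scripted_model = torch.jit.script(MyModule())
# Freezing folds GetAttr nodes (e.g. self.weight) into the graph as constants
# and inlines function calls, returning a new frozen module.
frozen_model = torch._C._freeze_module(scripted_model._c)
out = frozen_model.forward(torch.randn(2, 2))
```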
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32178

Differential Revision: D19419640

Pulled By: bzinodev

fbshipit-source-id: 52baffaba9bca2cd60a8e747baa68d57711ad42b
2020-03-02 11:38:36 -08:00
0e52627358 Fixing pthreadpool symbol conflict issue. (#33869)
Summary:
Mainly renames C2's pthread_create, the only one referenced internally in NNPACK
that was conflicting, to pthread_create_c2.
Removed 2 other conflicting symbols that are not used internally at all.
Pointed XNNPACK to the original repo instead of the fork.

Copy-pasted the new interface and implementation to
caffe2/utils/threadpool, so that for internal builds we compile against
this.

When threadpool is unified this will be removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33869

Differential Revision: D20140580

Pulled By: kimishpatel

fbshipit-source-id: de70df0af9c7d6bc065e85ede0e1c4dd6a9e6be3
2020-02-28 21:23:18 -08:00
b678256bfb Move glu to Aten(CPU) (#33179)
Summary:
This PR moves glu to ATen (CPU).
Test script:
```
import torch
import torch.nn.functional as F
import time

torch.manual_seed(0)

def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"

#warm up
for n in [10, 100, 1000, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n // 2, device=device)
    for i in range(1000):
        output = F.glu(input)
        output.backward(grad_output)

for n in [10, 100, 1000, 10000]:
    fwd_t = 0
    bwd_t = 0
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n // 2, device=device)
    for i in range(10000):
        t1 = _time()
        output = F.glu(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 -t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
Test device: **skx-8180.**
Before:
```
input size(128, 10) forward time is 0.04 (ms); backward avg time is 0.08 (ms).
input size(128, 100) forward time is 0.06 (ms); backward avg time is 0.14 (ms).
input size(128, 1000) forward time is 0.11 (ms); backward avg time is 0.31 (ms).
input size(128, 10000) forward time is 1.52 (ms); backward avg time is 2.04 (ms).
```
After:
```
input size(128, 10) forward time is 0.02 (ms); backward avg time is 0.05 (ms).
input size(128, 100) forward time is 0.04 (ms); backward avg time is 0.09 (ms).
input size(128, 1000) forward time is 0.07 (ms); backward avg time is 0.17 (ms).
input size(128, 10000) forward time is 0.13 (ms); backward avg time is 1.03 (ms).
```
Fix https://github.com/pytorch/pytorch/issues/24707, https://github.com/pytorch/pytorch/issues/24708.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33179

Differential Revision: D19839835

Pulled By: VitalyFedyunin

fbshipit-source-id: e4d3438556a1068da2c4a7e573d6bbf8d2a6e2b9
2020-02-28 14:54:38 -08:00
dbe850af5b [jit] do the code reorg (#33851)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33851

Rationale and context described in #33828.

Script to reproduce the move:
https://gist.github.com/suo/16cbefaaeb67ca5a7c6caffd49b7f6e9
ghstack-source-id: 99079645

Test Plan: Make sure CI passes

Reviewed By: jamesr66a

Differential Revision: D20133869

fbshipit-source-id: 390e9241a9c85366d9005c492ac31f10aa96488e
2020-02-27 13:02:51 -08:00
bf00b4d305 [TensorExpr] Add a boilerplate pass for future TensorExpr fusion pass. (#33464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33464

I added a python-exposed knob to register this pass in the custom passes pipeline. If the knob is not used, the pass is not registered and thus not run at all.

Differential Revision: D19958217

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: fecdd98567fcda069fbdf8995c796899a3dbfa5c
2020-02-24 18:47:31 -08:00
4d9b649261 jit pickling rref (#32959)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32959

In the RPC TorchScript call path we need to pickle/unpickle RRefs; this diff makes the JIT pickler/unpickler able to pickle/unpickle an RRef. It is similar to what is implemented for PyRRef::pickle() and PyRRef::unpickle().
The pickling/unpickling design assumes it is always coupled with RPC calls. It is not meant for checkpointing a model with an RRef; before checkpointing the model, the user should call rref.to_here() to get the value inside the RRef.
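
A minimal sketch of the checkpointing guidance above, assuming an already-initialized two-worker RPC group (the worker names and the remote call are illustrative):

```python
import torch
import torch.distributed.rpc as rpc

# Assumes rpc.init_rpc("worker0", rank=0, world_size=2) has already been called
# and that "worker1" is a peer in the same group.
rref = rpc.remote("worker1", torch.add, args=(torch.ones(2, 2), 1))

# RRef pickling is coupled with RPC; for checkpointing, materialize the value
# locally first instead of trying to save the RRef itself.
value = rref.to_here()
torch.save(value, "checkpoint.pt")
```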

The pickling process is:
1. Push the torch.distributed.rpc.rref global string.
2. Call rref.fork() and create rrefForkData, which is a few IDs and the type str of the value held inside the rref; the IDs include the rref id, fork id, caller worker id, callee worker id, and owner worker id.
3. Push the rrefForkData.

The unpickling process is:
1. Read the torch.distributed.rpc.rref global string, and retrieve the cached global lambda function.
2. The global lambda function will get the rrefForkData.
3. If the callee is also the owner worker, then get the owner rref based on the IDs inside the rrefForkData and return the ownerRRef.
4. If the callee is not the owner worker, then create a user rref using the rrefForkData and return the userRRef.
5. Meanwhile the owner rref will be notified and will do reference counting correctly.

During unpickling, a type_resolver is needed to parse the type str. This type_resolver has a Python dependency, so we get it from the rpc_agent and pass it to the unpickler during construction. So we added a type_resolver argument to the jit unpickler constructor in this diff.
ghstack-source-id: 98814793

Test Plan: unit test

Differential Revision: D19713293

fbshipit-source-id: 4fd776cdd4ce8f457c4034d79acdfb4cd095c52e
2020-02-24 11:16:35 -08:00
bb5181b716 [TensorExpr] Add IR Printer. (#33220)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33220

Test Plan: Imported from OSS

Differential Revision: D19848379

Pulled By: ZolotukhinM

fbshipit-source-id: 1c6ab4f63080d4506dedc3c47938de92fb4bfba2
2020-02-21 13:10:26 -08:00
fc70fc3610 [TensorExpr] Add IR visitor, IR mutator, and IR evaluator. (#33219)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33219

Test Plan: Imported from OSS

Differential Revision: D19848381

Pulled By: ZolotukhinM

fbshipit-source-id: 44ca7cd99c25e290a8ffd8146785c19f9c785dfd
2020-02-21 13:10:22 -08:00
49af9425a7 [TensorExpr] Add core classes for representing expressions and statements. (#33218)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33218

Test Plan: Imported from OSS

Differential Revision: D19848378

Pulled By: ZolotukhinM

fbshipit-source-id: 48399f8651324d5ad0607e08573d5d7b2026bb23
2020-02-21 13:10:17 -08:00
1a4f997178 [TensorExpr] Add a class for representing data type. (#33217)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33217

Test Plan: Imported from OSS

Differential Revision: D19848380

Pulled By: ZolotukhinM

fbshipit-source-id: d8683f8fc4555d2456cd2a7c827d8e8231915b49
2020-02-21 13:10:12 -08:00
089d658153 [TensorExpr] Add classes for memory management in tensor expressions. (#33216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33216

All tensor expressions belong to a kernel arena and are freed when the
arena is destroyed. Until it is destroyed, all expressions stay valid.

Test Plan: Imported from OSS

Differential Revision: D19848382

Pulled By: ZolotukhinM

fbshipit-source-id: a581ea2b635b9ba2cc53949616a13d8d3a47caae
2020-02-21 13:08:50 -08:00
806e7daa1f Rename TorchScript compiler to IR emitter to better reflect its function. (#33127)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33127

Test Plan: Imported from OSS

Differential Revision: D19806503

Pulled By: ZolotukhinM

fbshipit-source-id: ab78bdbbac5f12dbcc6c2e2573f5862a16ffcf3d
2020-02-12 18:45:13 -08:00
12bcfa7c77 Remove Python dependency (toPyTuple/fromPyTuple, jitCompilationUnit, deserialize) in rref_impl.h/cpp (#32753)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32753

Functions to be bound as ATen operators cannot have a Python dependency.

This PR refactors the code to remove the Python dependency.
ghstack-source-id: 97485800

Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork -- test_script_functions_not_supported

buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork

buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_script_functions_not_supported
```

```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork

buck build mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork

buck-out/gen/caffe2/test/distributed/rpc/dist_autograd_fork\#binary.par -r test_backward_simple_script_call
```

Differential Revision: D5741675

fbshipit-source-id: 31ee60955be8d815d0773f3699e3ff2f1f9d8849
2020-01-30 17:52:48 -08:00
fb159b5236 Some work on eager op binding codegen (gen_python_functions.py) (#29986)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29986

Previously in addition to generating a python binding for each op,
we would generate an almost-trivial helper for each overload.
This PR eliminates the helpers, simplifying codegen logic a bit and
reducing the source-level indirection by a step.
Perf should be unchanged.

codegen diff: 1f2f07fb60

Note: in the interests of keeping the diff contained, there's only
some light cleanup here beyond what's necessary for the codegen changes.
Plan is to do some more substantial refactoring in followup PRs that
leave generated code unchanged.

Test Plan: Imported from OSS

Differential Revision: D18567980

Pulled By: bhosmer

fbshipit-source-id: eb9a81babb4489abd470842757af45580d4c9906
2020-01-30 00:29:53 -08:00
25d33a2ee8 [JIT] Use Type Level Granularity in Alias Analysis Wildcards (#32251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32251

Previously wildcard sets were associated by TypeKind, meaning all Lists were in one alias set, all Classes were in one alias set, etc. We can improve analysis by bucketing wildcard sets by TypePtr instead. Any two mutable types which can unify should be in the same wildcard set bucket.

This also allows us to do much simpler `mayContainAlias` analysis, and also improves `analyzeConservative` analysis because now we can recurse through all contained memory locations and mark writes, instead of recursing only one level deep in contained elements.

Test Plan: Imported from OSS

Differential Revision: D19563263

Pulled By: eellison

fbshipit-source-id: 371a37d1a8596abc6c53f41c09840b6c140ea362
2020-01-28 18:07:48 -08:00
465ebd58ba [JIT] pickle serialization for custom bound classes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/32604

Test Plan: Imported from OSS

Differential Revision: D19566633

fbshipit-source-id: 9387d3ff45cbd6ccde49ce190a52859481cc301c
2020-01-28 11:02:59 -08:00
0ac31a99be run code analysis against mobile interpreter (#32276)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32276

Include mobile interpreter in mobile code analysis pass, which has some
manually registered ops in temporary namespaces.

The mobile interpreter is still under development and these ops will be
removed in the future. This is a temporary step for internal build
experiment.

Test Plan: Imported from OSS

Differential Revision: D19426818

Pulled By: ljk53

fbshipit-source-id: 507453dc801e5f93208f1baea12400beccda9ca5
2020-01-17 17:21:28 -08:00
ab5eb65e74 gate torch_global_deps with BUILD_SHARED_LIBS flag (#32011)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32011

Ran into a build problem with Ninja + the code analysis build as follows:
```
The install of the torch_global_deps target requires changing an RPATH from
the build tree, but this is not supported with the Ninja generator unless
on an ELF-based platform.
```

Seems we don't need to build the target in static build mode?

Verified that the code analyzer works with the patch.

Test Plan: Imported from OSS

Differential Revision: D19336818

Pulled By: ljk53

fbshipit-source-id: 37f45a9392c45ce92c1df40d739b23954e50a13a
2020-01-10 11:37:24 -08:00