Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70474
Needed to compile linux wheels for CUDA 11.x since we were OOM'ing with
16GB of RAM
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: atalman
Differential Revision: D33343322
Pulled By: seemethere
fbshipit-source-id: 9f62e07ce2ca229fa25285429c01dc074d63b388
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70332
Idea to avoid recompilations: what if we introduce a new macro REGISTER_ALL_CPU_DISPATCH that registers the same kernel across all CPU arch types? We'd call this from native/Convolution*.cpp and wouldn't need to move any logic underneath the native/cpu dir. That would simplify these PRs quite a bit and would also avoid the recompilation. What do you think about this approach?
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D33301403
Pulled By: jbschlosser
fbshipit-source-id: d7cc163d4fe23c35c93e512d1f0a8af8c9897933
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70456
This job was still running on workflows despite ciflow not being enabled.
This change makes it so that test matrix generation only occurs right before tests
are actually run.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: atalman
Differential Revision: D33338946
Pulled By: seemethere
fbshipit-source-id: 4b83d5fe6572771807708764609a72c4f1c5745d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70453
Removes the current xla config; downstream `pytorch/xla` is broken for
clang compilation, so this config is temporarily removed until the xla team
can fix it in the upstream CI.
Context: https://github.com/pytorch/xla/pull/3255/files#r775980035
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: zengk95
Differential Revision: D33338463
Pulled By: seemethere
fbshipit-source-id: 1ef332c685d5e2cc7e2eb038e93bd656847fd099
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70107
Histogram observer used floor division on tensors, which is a deprecated
behavior. There was a warning printed:
```
/Users/vasiliy/pytorch/torch/ao/quantization/observer.py:905: UserWarning: __floordiv__ is deprecated, and i
ts behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' funct
ion NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use
torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='flo
or').
```
This PR fixes the warning.
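A minimal sketch of the replacement pattern (illustrative values, not the exact observer code); the tensor `//` operator is swapped for an explicit `torch.div` with a `rounding_mode`:
```
import torch

bin_values = torch.tensor([-7., -3., 0., 5.])
width = 2.0

# Deprecated: tensor floor division emits the UserWarning quoted above
# idx = bin_values // width

# Replacement: spell out the rounding mode explicitly
idx = torch.div(bin_values, width, rounding_mode='floor')
```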
Test Plan:
```
python test/test_quantization.py TestHistogramObserver
```
Reviewed By: ejguan
Differential Revision: D33187926
Pulled By: vkuzo
fbshipit-source-id: 9c37de4c6d6193bee9047b6a28ff37ee1b019753
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70106
Some of the quantization tests had log spew like
```
UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
```
This PR cleans up the root cause in the utils. Some other
tests may still hit this warning from other places.
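A minimal sketch of the pattern being cleaned up (illustrative tensor, not the actual test util code):
```
import torch

src = torch.randn(3, requires_grad=True)

# Emits the UserWarning quoted above:
# copied = torch.tensor(src)

# Recommended, warning-free equivalent:
copied = src.clone().detach()
```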
Test Plan:
```
python test/test_quantization.py TestFakeQuantizeOps
```
this particular warning no longer appears
Reviewed By: soulitzer
Differential Revision: D33187925
Pulled By: vkuzo
fbshipit-source-id: bd1acd77fd72a10dad0c254f9f9f32e513c8a89a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70336
broadcast_object_list cast the sum of all object lengths from long to int, causing overflows.
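A minimal sketch of the failure mode, assuming the accumulated object lengths end up truncated to a 32-bit value (illustrative numbers only):
```
import torch

total_len = 2**31 + 1231  # sum of object byte lengths, larger than INT_MAX
# Truncating the 64-bit length to int32 wraps around to a negative number...
as_int32 = torch.tensor([total_len], dtype=torch.int64).to(torch.int32)
# ...and allocating a buffer of that "size" then fails, e.g.:
# torch.empty(int(as_int32))  # RuntimeError: Trying to create tensor with negative dimension
```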
Test Plan:
Increased the size of the Tensor used in object transfers to have a >2GB storage requirement (in distributed_test.py).
Without the fix, the length overflows and the program requests a negative-sized Tensor:
```
RuntimeError: Trying to create tensor with negative dimension -2147482417: [-2147482417]
```
With the fix, the test passes.
Test used on server with GPUs:
buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn --local -- broadcast_object
Differential Revision: D33281300
fbshipit-source-id: 1bc83e8624edc14e747eeced7bc8a7a10e443ee4
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).
New submodule commit: 52791a2fd2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70438
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: zertosh
Differential Revision: D33331758
fbshipit-source-id: 1e811ddc30e9afa440523c6cb5c4e893eb560978
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70339
When a Python program is translated to TorchScript, the Python exception type is dropped. This makes users' lives hard when they need to categorize errors based on more than just the exception message.
Here we make the change so that when we raise a Python exception, we record the fully qualified class name for the exception. Later on, when the TorchScript is interpreted, a special exception CustomJITException is thrown. Users can get the Python class name from CustomJITException::getPythonClassName.
Note that this diff does not customize the mapping from C++ exceptions to Python exceptions. It's left to the users to do whatever mapping they want.
Code under scripts/shunting is just my own experimental code. I can split it out if requested.
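A minimal Python-side sketch of the scenario (the class-name accessor itself lives on the C++ side):
```
import torch

@torch.jit.script
def check(x: int) -> int:
    if x < 0:
        raise ValueError("x must be non-negative")
    return x

try:
    check(-1)
except Exception as e:
    # Before this change, the fact that this was a ValueError was dropped during
    # scripting; with this change, the fully qualified name ("builtins.ValueError")
    # is recorded and retrievable on the C++ side via
    # CustomJITException::getPythonClassName().
    print(type(e).__name__, e)
```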
ghstack-source-id: 146221879
Test Plan: buck test mode/opt //caffe2/test:jit
Reviewed By: gmagogsfm
Differential Revision: D33282878
fbshipit-source-id: 910f67a764519f1053a48589d1a34df69001525d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70370
Demo of Mobilenetv3 compiled with NNC in FB4A Playground app:
- Add compiled ModelConfig in FB4A app
- Enable Camera inputs for Mobilenet processor in the app and add ability to show live outputs
- Use downscaled inputs, which works for both original mobilenetv3 model and the compiled model
- Update nnc_aten_adaptive_avg_pool2d to use adaptive_avg_pool2d instead of adaptive_avg_pool2d_out as the latter is not included in the traced operators of mobilenetv3 model and hence not included in the app.
- Update app dependencies to include nnc_backend_lib and asm binary
Test Plan:
Run `arc playground pytorchscenario` from fbandroid to build and install the app on a connected device.
Live demo with compiled Mobilenetv3 model:
https://pxl.cl/1W1kb
Reviewed By: larryliu0820
Differential Revision: D33301477
fbshipit-source-id: 5d50a0e70a7f7d2157d311d6b1feef46e78e85b6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69998
Fixes: https://github.com/pytorch/pytorch/issues/69855
The check for undefined grads for forward AD was not being run because `check_undefined_grads` was only passed as True by OpInfo for backward AD. This PR updates gradcheck to interpret `check_undefined_grads` as applying to either forward or backward AD.
This PR also updates codegen to 1) not use ZeroTensor for `self` when the op is in-place, and 2) only create zeros (either through ZeroTensor or at::zeros) if the tensor itself is not undefined. Previously we would error in this case when calling `.options()` on the undefined tensor.
~TODO: undo the skips that are due to the original issue~
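A minimal sketch of exercising the new behavior through gradcheck (note the gradcheck kwarg is spelled `check_undefined_grad`):
```
import torch
from torch.autograd import gradcheck

x = torch.randn(3, dtype=torch.double, requires_grad=True)

# With this PR, check_undefined_grad also covers forward-mode AD when
# check_forward_ad is enabled, instead of only backward-mode AD.
gradcheck(torch.sin, (x,), check_forward_ad=True, check_undefined_grad=True)
```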
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D33235973
Pulled By: soulitzer
fbshipit-source-id: 5769b6d6ca123b2bed31dc2bc6bc8e4701581891
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70365
We should only mark ops as unary if they have a single fx.Node input. However, `cat` takes a sequence of `tensors` as input.
Reviewed By: alexbeloi
Differential Revision: D33299988
fbshipit-source-id: db3581eaee4ad9d2358eed01ec9027825f58f220
Summary:
The Windows 1st shard was silently failing to run (more details here: https://github.com/pytorch/pytorch/issues/70010) because the code to run it was never reached. It was failing silently because our CI still returned green for those workflow jobs, since the exit code from the batch script DID NOT PROPAGATE to the calling bash script.
The key here is that even though we have
```
if ERRORLEVEL 1 exit /b 1
```
The exit code 1 was NOT propagating back to the bash script, as the `exit /b 1` was within an `if` statement and the batch script was actually run in a cmd shell, so the bash script win-test.sh continued without erroring. Moving the `exit /b 1` to be standalone fixes it.
More details can be found in this Stack Overflow answer: https://stackoverflow.com/a/55290133
Evidence that now a failure in the .bat would fail the whole job:
https://github.com/pytorch/pytorch/runs/4621483334?check_suite_focus=true
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70011
Reviewed By: malfet
Differential Revision: D33301254
Pulled By: janeyx99
fbshipit-source-id: 6861dbf0f0a34d5baed59f928e34eab15af6f461
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70341
Per title
ghstack-source-id: 146181936
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D33290099
fbshipit-source-id: e4415a42086d9b1b78b0b5f42d4b02f275131dfa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70340
Some wrap APIs support module.wrapper_config to specify the FSDP
arguments, though this feature is currently unused in all use cases and there
is no plan to support this API. enable_wrap() and wrap(), along with FSDP
constructor wrapping, should be enough for all use cases, so this removes the
unnecessary code.
ghstack-source-id: 146181819
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D33290066
fbshipit-source-id: e7f3d8b2f2ff6bdf4a3e5021dbb53adf052ee8dc
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/64785 by introducing a `torch.LinAlgError` for reporting errors caused by bad values in linear algebra routines, which should allow users to easily catch failures caused by numerical issues.
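A minimal sketch of the intended usage, assuming the new exception ends up exposed under `torch.linalg`:
```
import torch

A = torch.zeros(3, 3)  # not positive-definite, so Cholesky must fail
try:
    torch.linalg.cholesky(A)
except torch.linalg.LinAlgError as err:
    # numerical failures can now be caught specifically, not as a bare RuntimeError
    print("factorization failed:", err)
```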
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68571
Reviewed By: malfet
Differential Revision: D33254087
Pulled By: albanD
fbshipit-source-id: 94b59000fdb6a9765e397158e526d1f815f18f0f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70210
Add a fast-path for `VarStack` nodes for when the inputs are scalars.
Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- VarStack`
Reviewed By: hlu1
Differential Revision: D33177498
fbshipit-source-id: 922ab76a6808fbfdb8eb6091163a380344e38de6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70235
Addresses comments in https://github.com/pytorch/pytorch/pull/69282:
Fixed a few corner cases for prefetching full parameters in the post-backward hook.
After benchmarking, prefetching full parameters in the pre-backward hook has the best and most stable performance, but at the cost of increased memory; prefetching full parameters in the post-backward hook did not achieve the expected performance and also failed in a few corner cases (now fixed), although it has no memory increase. The main issue is that the post-backward hook firing order is not consistent with the reverse of the forward computation order, so an incorrectly prefetched all-gather could delay the truly needed all-gather in the single NCCL stream and delay some layer's computation.
So these two algorithms are kept as two configurable experimental algorithms for now.
Prefetch full parameters in the pre-backward hook:
It is observed from past traces that all-gather ops are not triggered until the current layer's backward pass starts to compute; also, for some models, previous layers' reduce-scatter is scheduled before the next layer's all-gather ops. Since all-gather and reduce-scatter are in the same NCCL stream, this can result in a backward pass with no overlap between communication and computation.
To explicitly get the next layers' all-gather scheduled while the previous layers' backward computation is running, we can prefetch the next layers' all-gather full params. This helps because 1) both all-gather and reduce-scatter are overlapped with computation deterministically, and 2) we only prefetch one layer's all-gather full parameters, to avoid increasing memory too much.
The implementation borrows the idea from facebookresearch/fairscale#865, where the forward graph order is recorded during the forward pass.
In the backward pass, this PR prefetches the all-gather of full parameters in the current layer's pre-backward hook, instead of in the current layer's post-backward hook as in facebookresearch/fairscale#865. It also makes sure the all-gather streams are synced properly.
Experiments showed a 10% memory increase and a 20% latency speedup for a 1GB RoBERTa model in a slow network environment.
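A hedged sketch of how the two experimental algorithms might be selected, assuming the knob is exposed as a `BackwardPrefetch` enum on the FSDP constructor and that a process group is already initialized:
```
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, BackwardPrefetch

# BACKWARD_PRE: prefetch the next layer's all-gather in the pre-backward hook
#   (better overlap, ~10% more memory in the experiment above).
# BACKWARD_POST: prefetch in the post-backward hook (no memory increase).
model = FSDP(
    torch.nn.Linear(1024, 1024),
    backward_prefetch=BackwardPrefetch.BACKWARD_PRE,
)
```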
Test Plan: unit tests
Reviewed By: rohan-varma
Differential Revision: D33252795
fbshipit-source-id: 4e2f47225ba223e7429b0dcaa89df3634bb70050
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70150
This PR allows the user to specify backend_config_dict for standalone modules, in both the prepare and convert steps.
Adding this now to allow prototyping for some of our customer use cases; a test for the code path will be added in
a separate PR.
Test Plan:
regression tests
```
python test/test_quantization.py TestQuantizeFx
```
A test that specifies backend_config for some module will be added in a separate PR for the use case we have in mind,
since it requires other features.
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D33205162
fbshipit-source-id: a657cef8e49d99b6a43653141521dc87c33bfd89
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70327
After D32678163 (7ea86dfdb1), test_rpc_profiler began failing. This was surprising, because it should have been a no-op refactor. However, one change is that a Kineto profiler is no longer also an autograd profiler; the RPC framework was assuming a legacy profiler but when a kineto profiler was active things still kind of worked due to that implementation detail. (But crashed after the class split.)
This diff tidies up a couple of things:
1) Move `getProfilerConfig` into `api.cpp`, since it is no longer correct to static_cast a `KinetoThreadLocalState` to a `ProfilerLegacyThreadLocalState`. (And really the class we want is `ProfilerThreadLocalStateBase` anyway.)
2) Add a mechanism for callers to check if the active profiler is a legacy or kineto profiler. (So callers like RPC can adjust or provide a nice error message.)
3) Fix the RPC test to create a legacy profiler.
Test Plan: `caffe2/torch/fb/training_toolkit/backend/tests:test_rpc_profiler` now passes, and before the fix to `test_rpc_profiler.py`, I verified that the test failed with the error message added to `utils.cpp` rather than just crashing.
Reviewed By: suphoff
Differential Revision: D33283314
fbshipit-source-id: e4fc5b5cfc9ca3b91b8f5e09adea36f38611f90d
Summary:
Github's checkout action sometimes leaves untracked files in the repo
Remedy it by running `git clean -fxd`, which should nuke them all
Tentative fix for https://github.com/pytorch/pytorch/issues/70097
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70337
Reviewed By: suo
Differential Revision: D33289189
Pulled By: malfet
fbshipit-source-id: 16e3ebe7a61fda1648189c78bdf1b1185247037a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69774
We recently ran into a nasty bug caused by incorrect schema annotations on an `aten::split` overload. `verify_and_correct_memory_overlap` is supposed to prevent crashes in this scenario, but it didn't because it did not handle `Tensor[]` outputs.
This change extends the memory correction mechanism to handle tensor lists.
ghstack-source-id: 146152478
Test Plan: `buck test caffe2/benchmarks/static_runtime/...`
Reviewed By: hlu1
Differential Revision: D33022494
fbshipit-source-id: 8d1d41ca1d4fd5dfb7c8a66028c391ba63551eb0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70066
This commit upstreams utils to convert at::Tensors into LazyTensors and
vice versa.
Test Plan:
Covered by test_ptltc on the lazy_tensor_staging branch since TorchScript
Backend hasn't merged yet.
Reviewed By: desertfire
Differential Revision: D33171590
Pulled By: alanwaketan
fbshipit-source-id: b297ff5fc8ca1a02d30e16ad2249985310e836a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68948
The case where both the negative and conjugate bits are set
isn't tested currently despite being handled explicitly by `copy`.
In theory this shouldn't matter because neg_bit is only used for real
values, but it does mean the code in copy is untested. So, this just
runs it with a single sample as a sanity check.
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D33064371
Pulled By: anjali411
fbshipit-source-id: e90c65e311507c4fc618ff74fecc4929599c4fa3
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/70271
Test Plan:
Rebase on top of D32407544 and
buck run mode/opt -c fbcode.enable_gpu_sections=true pytext/fb/tools:benchmark_masked_softmax -- masked-softmax --batch-size=10
to see correct perf data ( PT time = ~2.5x PT native time )
Reviewed By: ngimel
Differential Revision: D33268055
fbshipit-source-id: f48b17852c19c2bc646f9ed8d9d5aac85caa8a05
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70306
USE_XNNPACK is the right flag to enable lowering to prepacked XNNPACK-based ops.
Test Plan: CI
Reviewed By: ZolotukhinM, priyaramani
Differential Revision: D33279375
fbshipit-source-id: d19ded5643f487f7b58c54a860ad39c8d484ed05
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66725
This removes the ci_flow_should_run job and puts it in the build stage for the different job templates.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70204
Reviewed By: malfet
Differential Revision: D33282338
Pulled By: zengk95
fbshipit-source-id: 327ff2bca9720d2a69083594ada5c7788b65adbd
Summary:
Changes made to line 1073: the denominator of the formula was written as EXP(SUM(x)) and has been corrected to SUM(EXP(x)).
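A quick numerical check of the corrected formula:
```
import torch

x = torch.tensor([1.0, 2.0, 3.0])
# Correct softmax denominator is SUM(EXP(x)), not EXP(SUM(x))
manual = torch.exp(x) / torch.exp(x).sum()
assert torch.allclose(manual, torch.softmax(x, dim=0))
```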
Fixes #ISSUE_NUMBER
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70220
Reviewed By: davidberard98
Differential Revision: D33279050
Pulled By: jbschlosser
fbshipit-source-id: 3e13aff5879240770e0cf2e047e7ef077784eb9c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70294
In order to infer the shape for permute, the node target needs to be converted from torch.permute to acc_ops.permute.
Reviewed By: jfix71
Differential Revision: D33267469
fbshipit-source-id: b77eff1892211eac4a798a2f3e624140e287f4a2
Summary:
`linalg.inv` and `inverse` are aliases according to the documentation, yet their implementations have somewhat diverged. This makes `inverse` call into `linalg_inv`.
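A quick sanity check that the two entry points agree (random, comfortably invertible input):
```
import torch

A = torch.randn(4, 4) + 4 * torch.eye(4)
assert torch.allclose(torch.inverse(A), torch.linalg.inv(A))
```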
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70276
Reviewed By: malfet
Differential Revision: D33271847
Pulled By: ngimel
fbshipit-source-id: cf018ddd2c1cee29026dd5f546f03f3a1d3cf362
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70198
This PR fixes composite compliance problems with:
- binary_cross_entropy's backward formula
- binary_cross_entropy_with_logits's backward formula
- binary_cross_entropy's double backward formula
It does so by adding checks for areAnyTensorSubclassLike.
Test Plan:
- I tested everything with functorch.
- We are going to do https://github.com/pytorch/pytorch/issues/69530 in
the future so we have a way of testing this in core. I need the
binary_cross_entropy ones for something right now and didn't want to
wait until we come up with a solution for #69530.
Reviewed By: Chillee
Differential Revision: D33246995
Pulled By: zou3519
fbshipit-source-id: 310ed3196b937d01b189870b86a6c5f77f9258b4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70249
IMO, the `unbatch_level` argument is not needed here since users can simply call `.unbatch` before calling `.groupby` if needed. One small step closer to a unified API with other libraries.
Note that we may rename the functional name from `.groupby` to `.group` in the future. TBD.
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D33259104
Pulled By: NivekT
fbshipit-source-id: 490e3b6f5927f9ebe8772d5a5e4fbabe9665dfdf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70062
This commit upstreams LTCTensorImpl from the lazy_tensor_staging branch.
It inherits from c10::TensorImpl and thus manages the lifetime/storage
of LazyTensor.
Test Plan: ./build/bin/test_lazy --gtest_filter=LazyTensorImplTest.*
Reviewed By: desertfire
Differential Revision: D33171186
Pulled By: alanwaketan
fbshipit-source-id: 6af9f91cc7c7e997f120cb89a7bcd6785c03ace0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69479
This diff adds support for out-variant optimization for `TensorExprDynamicGroup` op, which will be used for TensorExpr based fusion in Static Runtime.
ghstack-source-id: 146107008
Test Plan:
```
buck run mode/opt //caffe2/caffe2/fb/predictor:pytorch_predictor_test
```
Completed accuracy test on inline_cvr model 294738512 v0. Results:
```
get 1012 prediction values
get 1012 prediction values
pyper_inference_e2e_local_replayer_test.out.132ea03c2 pyper_inference_e2e_local_replayer_test.out.1858bbeb0
max_error: 0 % total: 0
```
Reviewed By: d1jang, mikeiovine
Differential Revision: D32768463
fbshipit-source-id: a3e6c1ea9ff5f3b57eb89095aa79a6d426fbb52a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69478
This diff handles the case when output tensors are being passed in as
inputs to TensorExprDynamicGroup op.
This is in preparation to support out-variant optimizations in Static Runtime.
ghstack-source-id: 146107007
Test Plan: buck test mode/dev-nosan //caffe2/test/cpp/jit:jit
Reviewed By: eellison
Differential Revision: D32823889
fbshipit-source-id: ff18e17fcd09953e55c8da6b892e60756521c2fc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69477
This diff adds a new run method to `TensorExprKernel` which takes in
output tensors as inputs and stores the output in those given tensors.
ghstack-source-id: 146107009
Test Plan: buck test mode/dev-nosan //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - Kernel.RunWithAllocatedOutputs'
Reviewed By: ZolotukhinM
Differential Revision: D32823890
fbshipit-source-id: edc1f4839785124048b034060feb71cb8c1be34f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69476
This diff adds a new op, `TensorExprDynamicGroup`, that composes all the logic behind running a dynamic shaped fused node. This includes a guard instruction that checks for conditions, a conditional that calls the fused node or the fallback graph depending on the guard.
ghstack-source-id: 146107006
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/cpp/jit:jit
```
Reviewed By: eellison
Differential Revision: D32320082
fbshipit-source-id: 2bd1a43391ca559837d78ddb892d931abe9ebb73
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70225
Thanks to zhxchen17's suggestion. This PR moves the operator initialization logic to `upgrader_mobile.cpp`, so that we can leverage a static variable to ensure the operator initialization only happens once.
ghstack-source-id: 146103229
Test Plan:
```
buck test mode/opt //papaya/integration/service/test/analytics/histogram:generic_histogram_system_test -- --exact 'papaya/integration/service/test/analytics/histogram:generic_histogram_system_test - SumHistogramSystemTest.test' --run-disabled
buck test mode/opt //caffe2/test/cpp/jit:jit
buck test mode/dev //papaya/integration/service/test/mnist:mnist_system_test -- --exact 'papaya/integration/service/test/mnist:mnist_system_test - MnistFederatedSystemTest.test'
```
Reviewed By: zhxchen17
Differential Revision: D33247543
fbshipit-source-id: 6c3a87fe909a1be01452fa79649065845b26d805
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67045
To run: `python benchmarks/functional_autograd_benchmark/functional_autograd_benchmark.py --gpu -1 --model-filter=ppl_robust_reg --num-iter 100`
```
Results for model ppl_robust_reg on task vjp: 0.0012262486852705479s (var: 2.2107682351446556e-10)
Results for model ppl_robust_reg on task vhp: 0.002099371049553156s (var: 6.906406557760647e-10)
Results for model ppl_robust_reg on task jvp: 0.001860950025729835s (var: 1.1251884146634694e-10)
Results for model ppl_robust_reg on task hvp: 0.003481731517240405s (var: 2.2713633751614282e-10)
Results for model ppl_robust_reg on task jacobian: 0.0012128615053370595s (var: 1.3687526667638394e-09)
Results for model ppl_robust_reg on task hessian: 0.009885427542030811s (var: 9.366265096844018e-09)
Results for model ppl_robust_reg on task hessian_fwdrev: 0.005268776323646307s (var: 2.4293791422991262e-09)
Results for model ppl_robust_reg on task hessian_revrev: 0.002561321249231696s (var: 7.557877101938004e-10)
Results for model ppl_robust_reg on task jacfwd: 0.002619938924908638s (var: 5.109343503839625e-10)
Results for model ppl_robust_reg on task jacrev: 0.0013469004770740867s (var: 3.1857563254078514e-09)
```
Notes:
- We go through batched fallback for both
- ppl_robust_reg takes 3 tensor inputs and returns a single scalar output
- this means that jacobian is equivalent to doing vjp and vmap would not help us
- we expect jacfwd to be slower than jacrev
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D33265947
Pulled By: soulitzer
fbshipit-source-id: 14f537a1376dea7e5afbe0c8e97f94731479b018
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70228
Fix the named_params_with_sharded_tensor impl: `named_parameters` already loops over the submodules recursively, so we shouldn't put it inside the submodule loop.
ghstack-source-id: 146076471
Test Plan: Added more complicated test cases (that involve multiple submodules) to capture this issue.
Reviewed By: pritamdamania87
Differential Revision: D33251428
fbshipit-source-id: cf24ca7fbe4a5e485fedd2614d00cdea2898239e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70145
Added support for torch.equal to ShardedTensor. This is really
helpful in terms of comparing two ShardedTensors.
ghstack-source-id: 146066939
Test Plan: waitforbuildbot
Reviewed By: wanchaol
Differential Revision: D33201714
fbshipit-source-id: 56adfc36e345d512c9901c56c07759bf658c745b
Summary:
1. Split the test `test_save_load.py` into two files, basically moving the operator-versioning-related changes to `test_save_load_for_op_versions.py`.
2. Add a mobile-module-related test to `test_save_load_for_op_versions.py`.
How to run:
```
buck test mode/opt //caffe2/test:jit
or
python test/test_jit.py TestSaveLoadForOpVersion
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70090
ghstack-source-id: 146103547
Test Plan:
```
buck test mode/opt //caffe2/test:jit
python test/test_jit.py TestSaveLoadForOpVersion
```
Reviewed By: tugsbayasgalan
Differential Revision: D33180767
fbshipit-source-id: dd31e313c81e90b598ea9dd5ad04a853c017f994
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69475
This diff adds TensorExpr fusion with dynamic shapes in SR. This includes tracing the input graph with sample inputs, and then performing fusion with generalization to get fused graphs with dynamic shapes.
ghstack-source-id: 146059043
Test Plan:
```
buck run mode/opt //caffe2/caffe2/fb/predictor:pytorch_predictor_test
```
Reviewed By: d1jang
Differential Revision: D32320088
fbshipit-source-id: 397f498878ddfcee9dad7a839652f79f034fefe3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69474
This diff adds support for dynamic shape fusion in JIT. This is done
by performing fusion with the static shapes observed on the first run,
generalizing the fused subgraphs and generating code for the generalized fused
subgraphs with dynamic shapes.
ghstack-source-id: 146059044
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/cpp/jit:jit
```
Reviewed By: eellison
Differential Revision: D32781307
fbshipit-source-id: f821d9f8c271bcb78babcb4783d66f2f0020b0ea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69473
This diff refactors StaticModule and its uses to pass in sample inputs. These inputs need to be passed into the constructor because they are needed to perform TensorExpr fusion before other optimizations are performed on the input graph.
ghstack-source-id: 146059041
Test Plan: buck run mode/opt //caffe2/caffe2/fb/predictor:pytorch_predictor_test
Reviewed By: donaldong
Differential Revision: D32320084
fbshipit-source-id: b8bd46d442be4cc90ca60f521e0416fdb88eea60
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70165
Implements activation offload support in the checkpoint_wrapper API via save_on_cpu hooks. We avoid modifying the torch.utils.checkpoint implementation and instead compose offload + checkpoint by using the save_on_cpu hook for the offload part.
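A minimal sketch of the composition idea (hypothetical wrapper name, not the actual checkpoint_wrapper code; assumes CUDA is available for the pinned host buffers):
```
import torch
from torch.utils.checkpoint import checkpoint

class OffloadedCheckpointWrapper(torch.nn.Module):
    """Checkpoints the wrapped module and offloads its saved tensors to CPU."""
    def __init__(self, module: torch.nn.Module):
        super().__init__()
        self.module = module

    def forward(self, *args):
        # save_on_cpu moves tensors saved for backward into (pinned) host memory;
        # checkpoint() drops intermediate activations and recomputes them in backward.
        with torch.autograd.graph.save_on_cpu(pin_memory=True):
            return checkpoint(self.module, *args)
```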
ghstack-source-id: 146078900
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D33228820
fbshipit-source-id: 98b4da0828462c41c381689ee07360ad014e808a
Summary:
All four builds of the Android binaries (arm32/64 and x86_32/64) are now migrated to GHA, away from CircleCI. Since this part of the workflow creates the final binary with all architectures in it, it was not possible to do the migration step by step.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68843
Reviewed By: malfet
Differential Revision: D33257480
Pulled By: b0noI
fbshipit-source-id: dd280c8268bdd31763754c36f38e4ea12b23cd2e
Summary:
Fixes https://github.com/pytorch/pytorch/issues/70032
Windows build of PyTorch doesn't produce the `c10::OperatorHandle::~OperatorHandle(void)` symbol in any of its `*.lib` files. This fix is to explicitly define it in Dispatcher.cpp, so downstream consumers wanting to dllimport can find it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70033
Reviewed By: jbschlosser
Differential Revision: D33240599
Pulled By: bdhirsh
fbshipit-source-id: 56cc5963043bd5caac30e42c3501a4f48d086b36
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70128
Previous code disabled torch_function when dequantizing arguments
to an unquantizeable function. This PR blocklists the dequantize
method from the dequantize hook instead, so we can remove
the previous hack.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```
Reviewed By: ejguan
Differential Revision: D33194396
Pulled By: vkuzo
fbshipit-source-id: 6175c2da637c1d0c93b3fea0ef8218eaee6a2872
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70115
This PR turns off DBR quant __torch_function__ overrides on
tensor attribute getters such as `x.dtype`. This should help
with making the debug logs more readable, and reduce framework
overhead.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```
Reviewed By: ejguan
Differential Revision: D33189544
Pulled By: vkuzo
fbshipit-source-id: e0d664bb6b76ca9e71c8a439ae985a0849312862
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70114
This PR makes the debug logging for DBR quant be more useful
and easier to read.
New format looks like
```
DEBUG:auto_trace: fqn: _tf_ <function tanhshrink at 0x7fa4d02d4790> out torch.float32 end
```
This will be useful to speed up further work.
Test Plan:
```
// run this with logging enabled, logs easier to read
python test/test_quantization.py TestQuantizeDBR
```
Reviewed By: jerryzh168
Differential Revision: D33189545
Pulled By: vkuzo
fbshipit-source-id: 20af7e066e710beac5a3871a9d6259ee5518f97d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70109
Adds a test case for DBR quant + qconfig_dict specifying methods
by object_type. Fixes a bug in the FX rewriter for scripting
to make the test pass.
Full coverage of methods will come in future PRs, this PR is
just to verify qconfig_dict is hooked up correctly.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_qconfig_dict_object_type_method
```
Reviewed By: jerryzh168
Differential Revision: D33188160
Pulled By: vkuzo
fbshipit-source-id: 47ab9dbca8cdb1cf22d6d673d9c15b3bc0d1ec81
Summary:
Just updated a few examples that were either failing or raising deprecation warnings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69816
Reviewed By: bdhirsh
Differential Revision: D33217585
Pulled By: albanD
fbshipit-source-id: c6804909be74585c8471b8166b69e6693ad62ca7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70022
Add support for fusing ConvTranspose{1,2,3}d with BatchNorm{1,2,3}d. This re-uses the existing fusion logic but adds a "transpose" flag to the fusing function which, when enabled, uses the appropriate reshape for ConvTranspose's transposed weights.
Test Plan: `buck test mode/dev //caffe2/test:quantization -- -r quantization.eager.test_fusion.TestFusion`
Reviewed By: jerryzh168
Differential Revision: D33074405
fbshipit-source-id: 5e9eff1a06d8f98d117e7d18e80da8e842e973b7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69999
This adds support for the split_with_sizes operator in Static Runtime by adding native operators. Those operators have less overhead compared to their JIT fallbacks (no dispatching, no stack construction at runtime).
split_with_sizes can be called directly from the C++ API, or via `torch.split` when `split_sizes` is a list. This diff adds support for both use cases.
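For reference, the Python-level pattern that routes to `aten::split_with_sizes`:
```
import torch

x = torch.arange(10)
# A list of split sizes makes torch.split dispatch to aten::split_with_sizes
a, b, c = torch.split(x, [2, 3, 5])
# The operator can also be called directly as a Tensor method
a2, b2, c2 = x.split_with_sizes([2, 3, 5])
```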
Test Plan:
- Added unit tests. Made sure the operators are used
- Benchmark
```
./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench \
--scripted_model=/data/users/dxd/305797439_0.predictor.precompute.remote_request_only \
--method_name=user.forward --pt_cleanup_activations=1 \
--pt_enable_out_variant=1 --pt_optimize_memory=1 --iters=1000 --warmup_iters=500 \
--num_threads=1 --pt_enable_static_runtime=1 --set_compatibility=1 \
--input_type="recordio" --pt_inputs=/data/users/dxd/305797439_0_user.inputs.recordio \
--recordio_use_ivalue_format=1 --do_profile=1 --do_benchmark=1
```
#### Before
```
Static runtime ms per iter: 3.62073. Iters per second: 276.187
0.0471904 ms. 1.31501%. aten::split_with_sizes (5 nodes)
```
#### After
```
Static runtime ms per iter: 3.44374. Iters per second: 290.382
0.0432057 ms. 1.34276%. aten::split_with_sizes (5 nodes, native)
```
Reviewed By: swolchok
Differential Revision: D33141006
fbshipit-source-id: feae34c4c873fc22d48a8ff3bf4d71c0e00bb365
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70207
In the corner case when min == max, the adjust_hist_to_include_zero() function used in the L2 search will produce additional_nbins = -2147483648 and initialize bins_f with a negative size.
Test Plan:
Before fix:
f315187213
After fix:
f315471862
Reviewed By: jspark1105
Differential Revision: D33227717
fbshipit-source-id: 7e8a455e51a0703a3a9c5eb7595d9b4d43966001
Summary:
Reduces the binary size of DistributionBernoulli.cu from 12282600 to 3946792 bytes.
Tensor-tensor bernoulli kernels are rarely used, so we limit dispatches to a double probability type for a double `self` tensor, and a `float` probability type for everything else. This would be a minor perf hit if the probability tensor is of a different dtype, but given how rarely these kernels are used (and how rarely the probability tensor is not float), this is not a problem.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70169
Reviewed By: jbschlosser
Differential Revision: D33237890
Pulled By: ngimel
fbshipit-source-id: 185c4b97aba0fb6ae159d572dd5bbb13cf676bb4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70164
Implement Alban's suggestion to make checkpoint_wrapper an nn.Module
instead of patching the forward pass, which is too hacky.
ghstack-source-id: 146011215
Test Plan: IC
Reviewed By: mrshenli
Differential Revision: D33214696
fbshipit-source-id: dc4b3e928d66fbde828ab60d90b314a8048ff7a2
Summary:
Try using Rockset as backend for data instead of RDS
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70136
Reviewed By: suo
Differential Revision: D33242148
Pulled By: janeyx99
fbshipit-source-id: 8935ceb43717fff4922b634165030cca7e934968
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69727
Still need to test the backward ones. We would need to update gradgradcheck to check forward over backward.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D33031728
Pulled By: soulitzer
fbshipit-source-id: 86c59df5d2196b5c8dbbb1efed9321e02ab46d30
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68750
There was some room for optimization in static runtime's `prim::VarStack`:
* Avoid refcount bumps - constructing the `std::vector<at::Tensor>` can be avoided by writing a custom version of `stack_out` that takes a `std::vector<at::Tensor*>`
* Skip the memory overlap check
* Avoid device dispatcher overhead in a few places (e.g. `tensor.unsqueeze -> at::native::unsqueeze`)
Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- Stack`
Reviewed By: swolchok
Differential Revision: D32596934
fbshipit-source-id: e8f0ccea37c48924cb4fccbfdac4e1e11da95ee0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70142
Create a lowering code example in OSS, and run the benchmark against resnet101.
Test Plan: CI
Reviewed By: 842974287
Differential Revision: D33117440
fbshipit-source-id: 359d0c9e65899ab94c8f3eb112db70db5d938504
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70072
Like sparse COO tensors, sparse CSR tensors don't really have an actual storage() that can be accessed, so sparsetensor->storage() should throw.
cc nikitaved pearu cpuhrsch
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D33181309
Pulled By: davidberard98
fbshipit-source-id: 8f1dc4da03073d807e5acee2ac47caeffb94b16c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70167
1. Change unit test dependency to open source base class, so that this unit test can run on git oss CI
2. Remove usage of typing.Protocol, so that lower can run with Python 3.6
Test Plan:
oss CI
passed with change included in commit:
https://github.com/pytorch/pytorch/actions/runs/1597530689
see test(fx2trt)
Reviewed By: yinghai
Differential Revision: D33228894
fbshipit-source-id: ffe3d40a02a642b3b857a0605101797037a580bb
Summary:
The upgrader should only be initialized once, when the runtime loads the first module. It does not need to be initialized again afterwards.
Previously, instead of using an atomic variable, the upgrader was initialized depending on whether byteCodeFunctionWithOperator.function.get_code().operators_ is empty. If it's empty, it means the operators from the upgrader are not initialized yet. However, this is not thread safe: when multiple threads load modules together, it's possible that they all consider theirs to be the first module. Use an atomic variable here to make sure initialization is thread safe.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70161
ghstack-source-id: 146012642
Test Plan:
```
buck test mode/opt //papaya/integration/service/test/analytics/histogram:generic_histogram_system_test -- --exact 'papaya/integration/service/test/analytics/histogram:generic_histogram_system_test - SumHistogramSystemTest.test' --run-disabled
buck test mode/opt //caffe2/test/cpp/jit:jit
```
Reviewed By: iseeyuan
Differential Revision: D33220320
fbshipit-source-id: 10f2397c3b358d5a1d39a2ce25457e3fdb640d2c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69459
This change breaks the dependency between the kineto and legacy profiler; instead of `profiler_kineto.h` including `profiler_legacy.h`, they both include `profiler/api.h`. As part of this refactor, I injected some intermediate classes to keep legacy behavior from leaking into the kineto profiler:
1) ProfilerThreadLocalState has become ProfilerThreadLocalStateBase which just handles config and callback handle. Legacy and Kineto profilers inherit this and implement their own very disjoint set of logic.
2) CUDAStubs is a pure virtual class to make the interface more readable, and the "always fail" behavior has been moved to a `DefaultCUDAStubs` class in `api.cpp`.
Test Plan: Ran the overhead ubenchmark.
Reviewed By: aaronenyeshi
Differential Revision: D32678163
fbshipit-source-id: 9b733283e4eae2614db68147de81b72f6094ce6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69406
Most files that include `interned_strings.h` don't actually depend on anything generated from `FORALL_NS_SYMBOLS`, yet because it all lives in a single file, they need to be recompiled whenever a new symbol is added. Here I move the class definition into a separate file so this doesn't happen.
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D32923637
Pulled By: albanD
fbshipit-source-id: 6e488cbfcfe2c041a99d9ff22e167dbddf3f46d7
Summary:
This adds support for bfloat16 and fp16 types in the jiterator by adding at::Half and at::BFloat16 classes to the jiterator code template. The only methods defined in those classes are construction from float and implicit conversion to float. Mathematical operations on them never need to be defined, because the jiterator implicitly upcasts the inputs to the functor, so all math is performed in float only. For example, the compute part of the kernel would always be written as
```
out[j] = i0<float>(arg0[j]);
```
It also adds support for casting to complex outputs, by adding a similar templated class c10::complex&lt;T&gt;. Originally I planned to only support float -> complex conversion for it, but to compile the fetch_and_cast function we also need complex -> float conversion. We could avoid it by compiling fetch_and_cast for a different subset of types, but I'm not doing that in this PR. Thus, technically, we could compile a kernel that would accept complex inputs and produce wrong results, but we guard against it by statically asserting that none of the functor datatypes are complex, and by runtime-checking that none of the inputs are complex.
Adding bfloat16, half and complex support allows us to remove special handling for type promotion tests for gcd.
i0 (that supports half and bfloat16 inputs) is moved to use jiterator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70157
Reviewed By: mruberry
Differential Revision: D33221645
Pulled By: ngimel
fbshipit-source-id: 9cfe8aba3498a0604c4ea62c217292ea06c826b1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/69846
Test Plan:
In the pytorch main dir, execute the relevant command to run the added test.
Reviewed By: jbschlosser
Differential Revision: D33152672
Pulled By: dzdang
fbshipit-source-id: 89951fcd23e7061d6c51e9422540b5f584f893aa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69806
Minor modifications were made to support the 4-bit quantized embedding module in the eager mode quantization flow, and to allow for testing of the changes.
Test Plan:
In pytorch main dir, execute
```
python test_quantization.py TestPostTrainingStatic.test_quantized_embedding
```
to run the series of tests, including the newly added test_embedding_4bit
function
Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D33152675
fbshipit-source-id: 5cdaac5aee9b8850e61c99e74033889bcfec5d9f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69769
Added 4-bit support and the corresponding test in the module API. Restructured test_quantized_module for both 4- and 8-bit support.
Test Plan:
In pytorch main dir, execute
```
python test/test_quantization.py TestStaticQuantizedModule.test_embedding_api
```
Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D33152674
fbshipit-source-id: 73e63383cf60994ab34cc7b4eedd8f32a806cf7f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69768
Support for the 4-bit embedding operator has been added. The support is analogous to the preexisting support for byte/8-bit embedding. A corresponding test case was added to test_quantized_embedding_op.py
Test Plan:
In pytorch main dir, execute
```
python test/test_quantization.py TestStaticQuantizedModule.test_embedding_api
```
to run the series of tests, including the newly added test_embedding_4bit
function
Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D33152673
fbshipit-source-id: bdcc2eb2e37de38fda3461ff3ebf1d2fb5e58071
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69878
But we'll still verify that model.training is True when the user calls the prepare_qat API.
Relaxing this condition might also mean that we change the API for methods in fuser_method_mapping,
with an additional flag for QAT (currently we just have different fusions for training/eval). I don't think
this is P0; we could revisit if there is a need in the future.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
```
Imported from OSS
Reviewed By: supriyar
Differential Revision: D33080988
fbshipit-source-id: b13715b91f10454948199323c5d81ef88bb3517f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69864
As titled; a follow-up PR will remove QConfigDynamic from the API.
Test Plan:
regression tests
```
python test/test_quantization.py TestPostTrainingStatic
python test/test_quantization.py TestPostTrainingDynamic
python test/test_quantization.py TestQuantizeFx
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D33073235
fbshipit-source-id: 6c1a1647032453803c55cdad7c04154502f085db
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70144
It can be an integer and in this case we need to extend it.
Test Plan:
Added a unit test.
```
RemoteExecution session id: reSessionID-d97b46e3-20d1-4f5c-a166-4efcf1579352-tpx
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8162774391775638
✓ ListingSuccess: caffe2/test/fx2trt/converters:test_adaptive_avgpool - main (9.454)
✓ Pass: caffe2/test/fx2trt/converters:test_adaptive_avgpool - test_adaptive_avgpool_with_dynamic_shape (caffe2.test.fx2trt.converters.acc_op.test_adaptive_avgpool.TestAdaptiveAvgPoolConverter) (16.083)
✓ Pass: caffe2/test/fx2trt/converters:test_adaptive_avgpool - test_adaptive_avgpool_1 (caffe2.test.fx2trt.converters.acc_op.test_adaptive_avgpool.TestAdaptiveAvgPoolConverter) (16.349)
✓ Pass: caffe2/test/fx2trt/converters:test_adaptive_avgpool - test_adaptive_avgpool_2 (caffe2.test.fx2trt.converters.acc_op.test_adaptive_avgpool.TestAdaptiveAvgPoolConverter) (16.543)
✓ Pass: caffe2/test/fx2trt/converters:test_adaptive_avgpool - test_adaptive_avgpool_0 (caffe2.test.fx2trt.converters.acc_op.test_adaptive_avgpool.TestAdaptiveAvgPoolConverter) (16.651)
Summary
Pass: 4
ListingSuccess: 1
```
Reviewed By: wushirong
Differential Revision: D33200773
fbshipit-source-id: 8c10d644982a4723a78f8615d8bcdbc3968790db
Summary:
Fixes a couple of bugs that surfaced during integration of graph opts into `AcceleratedGraphModule` (D31484770).
2. Fix bug in `graph_opt.transpose_to_reshape` implementation that causes it to incorrectly apply opt for `permute` op acting on shape `(B, N, N)` with `N > 1` and permutation `(0, 2, 1)`. Fixed the bug and added test case to cover this case.
3. Revert part of D31671833 (0e371e413d), where I made `acc_out_ty` into a required argument
4. Align `graph_opt.transpose_to_reshape` and `graph_opt.optimize_quantization` to not set `acc_out_ty` when adding a new node to graph and instead rely on tensor metadata
5. Run `acc_utils.copy_acc_out_ty_from_meta_to_acc_ops_kwargs()` in `GraphOptsTest.verify_numerics` before running graph on sample inputs.
Test Plan:
```
buck test mode/opt glow/fb/fx/graph_opts:
```
```
...
Summary
Pass: 85
ListingSuccess: 4
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/562950163929022
```
Reviewed By: jfix71
Differential Revision: D31851549
fbshipit-source-id: 602affe2a2a0831d2f17b87025107ca87ecb0e59
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70052
As the title. Also refactored a bit to separate out the common part of adding a reduce operator.
This would make mnasnet lowerable without splitter.
Test Plan: Added unit tests.
Reviewed By: wushirong
Differential Revision: D33163950
fbshipit-source-id: 7eb8f8a852cd8e8d9937029c4b4602b036502b3a
Summary:
Removes the internal typeshed for PyTorch and replaces it with PyTorch's own type annotations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69926
Generated files are in P471601595, P471601643, P471601662
Based on an example in D26410012
Test Plan: Sandcastle
Reviewed By: malfet, pradeep90
Differential Revision: D32292834
fbshipit-source-id: 5223f514cbdccd02c08ef0a027a48d92cdebed2c
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35316
On master, bazel cuda build is disabled due to lack of a proper `cu_library` rule. This PR:
- Add `rules_cuda` to the WORKSPACE and forward `cu_library` to `rules_cuda`.
- Use a simple local cuda and cudnn repositories (adopted from TRTorch) for cuda 11.3.
- Fix current broken cuda build.
- Enable cuda build in CI, not just for `:torch` target but all the test binaries to catch undefined symbols.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66241
Reviewed By: ejguan
Differential Revision: D31544091
Pulled By: malfet
fbshipit-source-id: fd3c34d0e8f80fee06f015694a4c13a8e9e12206
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70071
This commit adds tanh_backward to aten_interned_strings.h as an AT symbol.
Test Plan: CI.
Reviewed By: mruberry
Differential Revision: D33173370
Pulled By: alanwaketan
fbshipit-source-id: e20ed2a807156ce772b7c1e3f434fa895116f4c3
Summary:
For a PyTorch source build using the Ninja generator, CMake >= 3.13 is required. PyTorch always checks for cmake3 >= 3.10 first, so when 3.13 > cmake3 >= 3.10, PyTorch will use cmake3 and report the error ```Using the Ninja generator requires CMake version 3.13 or greater``` even though a CMake >= 3.13 is available.
For example, on my CentOS machine the system cmake3 is ```3.12``` and my conda env's cmake is ```3.19.6```; the build errors out because PyTorch chooses cmake3. I can update cmake3 or create an alias or a symlink to solve this problem, but the more reasonable way is for ```_get_cmake_command``` to always return the newest CMake executable (unless explicitly overridden with the CMAKE_PATH environment variable).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69355
Reviewed By: jbschlosser
Differential Revision: D33062274
Pulled By: malfet
fbshipit-source-id: c6c77ce1374e6090a498be227032af1e1a82d418
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68710
This PR adds support for block sparse (BSR) matrices for functions that
use Inspector-Executor MKL Sparse API. At the moment of this PR it's:
* torch.addmm
* torch.addmv
* torch.triangular_solve (once https://github.com/pytorch/pytorch/pull/62180 is merged)
cc nikitaved pearu cpuhrsch IvanYashchuk
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D33179486
Pulled By: cpuhrsch
fbshipit-source-id: e1dec0dccdbfed8b280be16b8c11fc9e770d50ae
Summary:
Currently, `cartesian_prod` calls `meshgrid` without passing an indexing parameter. This causes a warning to be shown when running the `cartesian_prod` example from the docs. This PR simply passes the default value for this indexing parameter instead.
Fixes https://github.com/pytorch/pytorch/issues/68741
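For reference, the warning-free call pattern (the indexing default that `cartesian_prod` now passes explicitly):
```
import torch

a = torch.tensor([1, 2])
b = torch.tensor([3, 4])
# Passing indexing='ij' explicitly avoids the meshgrid deprecation warning
grid_a, grid_b = torch.meshgrid(a, b, indexing='ij')
pairs = torch.cartesian_prod(a, b)  # no longer warns after this PR
```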
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68753
Reviewed By: kimishpatel
Differential Revision: D33173011
Pulled By: mruberry
fbshipit-source-id: 667185ec85bd62bda177bc5768d36f56cfc8b9ab
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68692
ADInplaceOrViewType is a sharded file, so by only including specific
operator headers, we ensure that changing one (non-method) operator
only needs one shard to be re-compiled.
This also ports the generated code over to the `at::_ops` interface,
and the code generator itself to using `write_sharded` instead of
re-implementing its own version of sharding.
Test Plan: Imported from OSS
Reviewed By: jbschlosser, malfet
Differential Revision: D32596274
Pulled By: albanD
fbshipit-source-id: 400cad0237829720f94d60f9db7acd0e918e202e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68691
TraceType is a sharded file, so by only including specific operator
headers, we ensure that changing one (non-method) operator only needs
one shard to be re-compiled.
This also changes all the included autograd and jit headers from
including `ATen/ATen.h` to just including `ATen/core/Tensor.h`.
Test Plan: Imported from OSS
Reviewed By: jbschlosser, malfet
Differential Revision: D32596264
Pulled By: albanD
fbshipit-source-id: 2f28b62d7b9932f30fad7daacd8ac5bb7f63c621
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68476
We implemented all of the following `dict` methods for `ParameterDict`
- `get `
- `setdefault`
- `popitem`
- `fromkeys`
- `copy`
- `__or__`
- `__ior__`
- `__reversed__`
- `__ror__`
The behavior of these new methods matches the expected behavior of python `dict` as defined by the language itself: https://docs.python.org/3/library/stdtypes.html#typesmapping
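A short sketch of the new dict-style surface (after this PR):
```
import torch
from torch import nn

pd = nn.ParameterDict({'weight': nn.Parameter(torch.randn(3))})

w = pd.get('weight')                                      # lookup with default
b = pd.setdefault('bias', nn.Parameter(torch.zeros(3)))   # insert if missing
other = nn.ParameterDict({'scale': nn.Parameter(torch.ones(1))})
merged = pd | other                                        # __or__, like dict | dict
```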
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69403
Reviewed By: albanD
Differential Revision: D33187111
Pulled By: jbschlosser
fbshipit-source-id: ecaa493837dbc9d8566ddbb113b898997e2debcb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69976
These are sample functions that already use generators internally; this just moves the `yield` into the sample function itself.
Re-submit of #69257
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D33172953
Pulled By: mruberry
fbshipit-source-id: 7b8bae72df6a225df88a158b7ffa82a71d3c061b
Summary:
Use `c10::printQuotedString` to escape any characters that might cause the
string to be interpreted as more than one argument by a shell script.
Please note that this codepath is deprecated and is not accessible
from typical PyTorch usage workflows.
This issue was discovered by Daniel Lawrence of the Amazon Alexa team.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70070
Reviewed By: suo
Differential Revision: D33172721
Pulled By: malfet
fbshipit-source-id: 9dbd17f6eb775aaa1a545da42cbc95864c1189ee
Summary:
Many users actually send things like `Fixes #{69696}` which then fails to properly close the corresponding issue.
Fixes #{issue number}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70105
Reviewed By: ejguan
Differential Revision: D33187501
Pulled By: albanD
fbshipit-source-id: 2080ee42c30b9db45177f049627118a6c3b544b7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69766
Follow-up on the previous PR, removes the requirement to have a parent
qconfig in order for the object type qconfig to be applied for a function.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```
Reviewed By: jerryzh168
Differential Revision: D33020218
Pulled By: vkuzo
fbshipit-source-id: fa0e10f05ca5f88b48ef74b9d2043ea763506742
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69758
Extends DBR quant `qconfig_dict['object_type']` support to function types,
with the restriction that a parent module must have a qconfig.
A future PR will remove the restriction above (it is due to some technical
debt), to keep PR sizes small.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```
Reviewed By: jerryzh168
Differential Revision: D33020217
Pulled By: vkuzo
fbshipit-source-id: ce8a8185f9c87d437e1319ff6f19e8f6adf41e02
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69726
This is a cleanup; this variable was previously optional,
but it always exists, because the only way an op hook
can run is if there is a parent module with an `AutoQuantizationState`
object.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```
Reviewed By: albanD
Differential Revision: D33003472
Pulled By: vkuzo
fbshipit-source-id: de5769194808d42b025b848667815b4e3d73b6c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69720
This function is also useful for DBR quant, moving it from FX utils
to common utils.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeDBR
```
Reviewed By: jerryzh168
Differential Revision: D33003473
Pulled By: vkuzo
fbshipit-source-id: 20360682c69d614a645c14fc29d3ee023d6b2623
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69719
This PR changes the API signature of DBR quant to use `qconfig_dict`,
similar to FX graph mode quantization. In this first PR, only basic
functionality is implemented:
* qconfig=None or static quantization with quint8 only is tested
* non-default qconfig for modules only is tested
* targeting ops by order is not implemented
Expanding this support will be done in future PRs.
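For illustration, a hedged sketch of what a module-targeted qconfig_dict in the FX-graph-mode style looks like (the DBR prepare entry point is omitted because it is not stated in this summary):
```python
import torch
from torch.ao.quantization import default_qconfig

qconfig_dict = {
    "": None,  # global qconfig; None leaves ops unquantized by default
    "object_type": [
        # non-default qconfig targeted at a specific module type
        (torch.nn.Conv2d, default_qconfig),
    ],
}
```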
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```
Reviewed By: jerryzh168
Differential Revision: D33003475
Pulled By: vkuzo
fbshipit-source-id: f5af81e29c34ea57c2e23333650e44e1758102e4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69636
Moves some of the qconfig_dict utilities away from the FX subdirectory
into the quantization subdirectory. These utilities can be reused with
other workflows.
A future PR will start using these utilities in DBR quant.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
```
Reviewed By: albanD
Differential Revision: D33003474
Pulled By: vkuzo
fbshipit-source-id: 34417b198681279469e6d7c43ea311180086d883
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69880
Making the test cases more standardized. In general we would like to have
```
TestQuantizeEager,
TestQuantizeEagerOps,
TestQuantizeEagerModels,
```
but currently, since we have separate PTQ static, PTQ dynamic, and QAT static APIs, we have only partially cleaned
up the test cases; we can merge all of them later when we merge all the APIs.
Test Plan:
python test/test_quantization.py
Imported from OSS
Reviewed By: supriyar
Differential Revision: D33081418
fbshipit-source-id: fcb96559b76bbc51eb1b0625e0d4b193dbb37532
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69946
This PR removes the implicit set_device for the NCCL process group, per the proposal in https://github.com/pytorch/pytorch/issues/69731
ghstack-source-id: 145847504
Test Plan: wait for ci
Reviewed By: pritamdamania87
Differential Revision: D33099095
fbshipit-source-id: 3fe9f6a0facf5ea513c267e9f32c6a7fd56cc8a2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70007
This PR extends fusion pattern support from a simple sequence of ops to a simple
subgraph like conv - add
```
x - conv ---\
y ---------add ---- output
```
where input x, y and output are observed/quantized
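For reference, a minimal eager module exhibiting this pattern (this only shows the shape of the pattern the fusion targets, not the TRT lowering itself):
```python
import torch
from torch import nn

class ConvAdd(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, x, y):
        # x - conv --\
        # y ---------add --- output
        return self.conv(x) + y

m = ConvAdd()
out = m(torch.randn(1, 3, 8, 8), torch.randn(1, 3, 8, 8))
```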
Test Plan:
```
python test/fx2trt/test_quant_trt.py TestQuantizeFxTRTOps.test_conv_add
```
Imported from OSS
Reviewed By: supriyar
Differential Revision: D33144605
fbshipit-source-id: 331fda77bdc431a8cd9abe1caea8347a71776ec2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70006
Reland: fixes some mypy errors that were missed before.
This PR enables a fuse handler for sequences of three ops, and merges all fuse handlers into one.
TODO: we can also move this to the backend_config_dict folder.
Test Plan:
regression fusion test
```
python test/test_quantization.py TestFuseFx
```
Imported from OSS
Reviewed By: supriyar
Differential Revision: D33144606
fbshipit-source-id: ca34f282018a0fb4d04c7e35119eaf2d64258e78
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68037
Right now mobile::Code doesn't outlive its enclosing Function, and all accesses to Code happen inside the interpreter loop, which doesn't outlive the module, so we don't need to use std::shared_ptr here. This should also save us 1-2 KB of binary size, because shared_ptr seems to bloat binaries on arm64 Android.
ghstack-source-id: 145818696
Test Plan: eyes.
Reviewed By: qihqi, tugsbayasgalan
Differential Revision: D32264616
fbshipit-source-id: d83f538d6604cf75fd7728a25127b4849ce7ab2a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68036
For Edge use cases we want to include class_type.h separately, because in the future we want to stop depending on the rest of the JIT types declared inside jit_type.h
ghstack-source-id: 145818699
Test Plan: no behavior change.
Reviewed By: qihqi, gmagogsfm
Differential Revision: D32264618
fbshipit-source-id: 53dc187772e3dde88ff978b87252c31f3641860b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68709
This PR adds support for triangular solver with a block CSR matrix.
cc nikitaved pearu cpuhrsch IvanYashchuk ngimel
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D33066067
Pulled By: cpuhrsch
fbshipit-source-id: 9eaf1839071e9526be8d8c6d47732b24200f3557
Summary:
- ~optimizer isn't required for `SequentialLR` since it's already present in the schedulers. Trying to match the signature of it with `ChainedScheduler`.~
- ~`verbose` isn't really used anywhere so removed it.~
Updated missing docs and added a small check.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69817
Reviewed By: ngimel
Differential Revision: D33069589
Pulled By: albanD
fbshipit-source-id: f015105a35a2ca39fe94c70acdfd55cdf5601419
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69874
We have a handful of ops supported for ShardedTensor via
``__torch_function__`` dispatch. However, we currently can't cover all torch
operators, and having a way for users to extend this functionality will make
it much more general.
In this PR, I've introduced a custom_sharded_op decorator which can be used to
register a custom sharded op implementation.
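As a purely hypothetical sketch of what such a registration could look like (the import path, decorator usage, and handler signature below are assumptions inferred from this summary and the existing ``__torch_function__`` dispatch, not the verified API):
```python
# Hypothetical sketch only: the import path and handler signature are assumptions.
import torch
from torch.distributed._sharded_tensor import custom_sharded_op  # assumed location

@custom_sharded_op(torch.nn.functional.gelu)
def sharded_gelu(types, args, kwargs, process_group):
    sharded_tensor = args[0]
    # Apply the op shard-by-shard; a real implementation would rebuild a
    # ShardedTensor from these local results.
    return [torch.nn.functional.gelu(shard.tensor)
            for shard in sharded_tensor.local_shards()]
```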
ghstack-source-id: 145841141
Test Plan: waitforbuildbot
Reviewed By: wanchaol
Differential Revision: D33078587
fbshipit-source-id: 5936b7ac25582e613653c19afa559219719ee54b
Summary:
I've noticed that the `HANDLE_TH_ERRORS` macros are actually very expensive in terms of compile time. Moving the bulk of the catch statements out of line using a lippincott function significantly improves compile times and object file binary sizes. For just the generated autograd bindings, this halves serial build time from 8 minutes to 4 and binary size is more than halved for most files with the biggest difference being `python_variable_methods.cpp` which went from 126 MB to 43 MB.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69974
Reviewed By: mruberry
Differential Revision: D33160899
Pulled By: albanD
fbshipit-source-id: fc35fa86f69ffe5a0752557be30b438c8564e998
Summary:
Move TH<C>GenerateByteType includes into torch/csrc (the only place they are used), and we can remove the TH folder altogether!
The only things left in THC are includes kept for BC compatibility.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69929
Reviewed By: mruberry
Differential Revision: D33133013
Pulled By: ngimel
fbshipit-source-id: 78c87cf93d2d641631b0f71051ace318bf4ec3c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69255
One thing that I've found as I optimize the profiler is that there's a lot of intermingled code, where the kineto profiler relies on the legacy (autograd) profiler for generic operations. This made optimization hard because I had to manage too many complex dependencies. (Exacerbated by the USE_KINETO #ifdefs sprinkled around.) This PR is the first of several to restructure the profiler(s) so the later optimizations go in more easily.
Test Plan: Unit tests
Reviewed By: aaronenyeshi
Differential Revision: D32671972
fbshipit-source-id: efa83b40dde4216f368f2a5fa707360031a85707
Summary:
From the operator version map and the upgrader TorchScript, generate the upgrader_mobile.cpp file. It also includes a unit test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69194
ghstack-source-id: 145819351
Test Plan:
```
buck test mode/opt //caffe2/test:upgrader_codegen
```
```
buck run mode/opt //caffe2/torch/fb/mobile/upgrader_codegen:upgrader_codegen
```
```
python /Users/chenlai/pytorch/tools/codegen/operator_versions/gen_mobile_upgraders.py
```
Reviewed By: iseeyuan
Differential Revision: D32748985
fbshipit-source-id: f8437766edaba459bfc5e7fc7a3ca0520c4edb9a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69955
Implements a checkpoint_wrapper function, which wraps an nn.Module with checkpointing so users won't have to call checkpoint() every time they want to checkpoint the module.
Currently only support for reentrant-based checkpointing is added, and it is only tested with FSDP to unblock a use case.
Future work is to add support for the new checkpointing API, add more tests, and upstream to torch.utils.checkpoint.
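A sketch of the intended usage (the import path below is an assumption, since this summary does not state where checkpoint_wrapper lives):
```python
import torch
from torch import nn
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import checkpoint_wrapper

block = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024))
block = checkpoint_wrapper(block)  # no explicit checkpoint() call per forward

x = torch.randn(8, 1024, requires_grad=True)
block(x).sum().backward()          # activations are recomputed during backward
```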
ghstack-source-id: 145811242
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D33107276
fbshipit-source-id: c4a1c68d71d65713a929994940a8750f73fbdbdb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68858
When executing with ir_eval, check for index out of bounds.
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D32657881
Pulled By: davidberard98
fbshipit-source-id: 62dd0f85bb182b34e9c9f795ff761081290f6922
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69964
Things added in this PR that require review:
1. cuLaunchCooperativeKernel driver API added
aten/src/ATen/cuda/detail/LazyNVRTC.cpp
aten/src/ATen/cuda/nvrtc_stub/ATenNVRTC.h
nvfuser code update:
1. perf tuning on the codegen scheduler that improves performance.
2. permutation support has been extended beyond contiguous/channels-last. (The improvements could be observed on PW benchmark)
Things reverted from local changes:
1. aten::gelu with approximation
2. local changes that are upstreamed in PR https://github.com/pytorch/pytorch/issues/68804
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69428
Reviewed By: ngimel
Differential Revision: D33073817
Pulled By: wconstab
fbshipit-source-id: e77d32e81d037d7370822b040456fd4c3bd68edb
Summary:
There was a declaration of the function at::Tensor::print() in TensorBody.h, left there during the refactoring of Tensor and TensorBase (d701357d921ef167d42c125e65b6f7da6be3ad0f). Removing it from TensorBody.h resolves the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69615
Test Plan:
The code below now compiles and works fine (prints `[CPUFloatType [3, 4, 5, 5, 5]]`)
```
#include <torch/torch.h>
int main()
{
  torch::Tensor tensor = torch::randn({3, 4, 5, 5, 5});
  tensor.print();
}
```
Fixes https://github.com/pytorch/pytorch/issues/69515
Reviewed By: ngimel
Differential Revision: D33020361
Pulled By: albanD
fbshipit-source-id: 190f253fb4101a4205aede3574b6e8acd19e54a1
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68261
This PR changes the number of test shards from 2 to 3 for all ASAN tests, aiming to improve the run time of the ASAN tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69843
Reviewed By: janeyx99
Differential Revision: D33160771
Pulled By: xidachen
fbshipit-source-id: dba1d318cc49b923e18704839471d8753cc00eca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69923
Original commit changeset: fbaf2cc06ad4
Original Phabricator Diff: D32606547 (e61fc1c03b)
This is the same thing as the original diff, but using a normal std::mutex instead of std::shared_timed_mutex, which is not available on OSX 10.11. The performance difference should be negligible, and it is easy to change down the line if it does become a bottleneck.
Old failing build: https://github.com/pytorch/pytorch/runs/4495465412?check_suite_focus=true
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68783
Test Plan:
buck test //caffe2/test/cpp/monitor:monitor
will add ciflow tags to ensure mac builds are fine
Reviewed By: aivanou
Differential Revision: D33102715
fbshipit-source-id: 3816ff01c578d8e844d303d881a63cf5c3817bdb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69569
Since ShardedOptimizer was added in https://github.com/pytorch/pytorch/pull/68607, we now integrate it in our unit test for sharded Linear.
ghstack-source-id: 145773749
Test Plan: CI + Unit test
Reviewed By: wanchaol
Differential Revision: D32777020
fbshipit-source-id: eb6b1bb0f6234976f024273833154cab274fed25
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69725
We added a `no_grad` context manager in the tensor sharding to ensure that the local_shard is a root node. But it turns out that for embedding and embedding_bag, when `max_norm` is specified, it complains for row-wise sharding. We use the original `max_norm` of the operators.
Error traces:
```
File "/data/sandcastle/boxes/fbsource/fbcode/buck-out/dev/gen/caffe2/test/distributed/_sharded_tensor/sharded_embedding#binary,link-tree/torch/overrides.py", line 1389, in handle_torch_function
result = torch_func_method(public_api, types, args, kwargs)
File "/data/sandcastle/boxes/fbsource/fbcode/buck-out/dev/gen/caffe2/test/distributed/_sharded_tensor/sharded_embedding#binary,link-tree/torch/distributed/_sharded_tensor/api.py", line 554, in __torch_function__
return sharded_embedding(types, args, kwargs, self._process_group)
File "/data/sandcastle/boxes/fbsource/fbcode/buck-out/dev/gen/caffe2/test/distributed/_sharded_tensor/sharded_embedding#binary,link-tree/torch/distributed/_sharded_tensor/ops/embedding.py", line 115, in sharded_embedding
return _handle_row_wise_sharding(
File "/data/sandcastle/boxes/fbsource/fbcode/buck-out/dev/gen/caffe2/test/distributed/_sharded_tensor/sharded_embedding#binary,link-tree/torch/distributed/_sharded_tensor/ops/embedding.py", line 309, in _handle_row_wise_sharding
gathered_input_embeddings = torch.nn.functional.embedding(
File "/data/sandcastle/boxes/fbsource/fbcode/buck-out/dev/gen/caffe2/test/distributed/_sharded_tensor/sharded_embedding#binary,link-tree/torch/nn/functional.py", line 2153, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: A view was created in no_grad mode and its base or another view of its base has been modified inplace with grad mode enabled. Given that this use case is ambiguous and error-prone, it is forbidden. You can clarify your code by moving both the view and the inplace either both inside the no_grad block (if you don't want the inplace to be tracked) or both outside (if you want the inplace to be tracked).
exiting process 2 with exit code: 10
```
As a fix, we clone and detach the local shard from the narrow result without using the context manager.
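An illustrative sketch of the fix (shapes and offsets below are placeholders, not the actual sharding code):
```python
import torch

full_weight = torch.randn(10, 4, requires_grad=True)
# Clone and detach the narrowed slice so the local shard is an autograd root,
# without wrapping the narrow in a no_grad block.
local_shard = full_weight.narrow(0, 2, 5).clone().detach()
assert local_shard.is_leaf and local_shard.grad_fn is None
```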
ghstack-source-id: 145773748
Test Plan: CI + Unit test.
Reviewed By: pritamdamania87, wanchaol
Differential Revision: D33000927
fbshipit-source-id: 4d5a93120675e90d4d6d6225a51c4a481d18d159
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69895
sparse.Linear has an error message that doesn't tell the user how to resolve the issue. This adds more info.
ghstack-source-id: 145603212
Test Plan: Not needed -- string change only
Reviewed By: jerryzh168
Differential Revision: D33039278
fbshipit-source-id: b5f7f5d257142eb3e7ad73f7c005755253a329d7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70002
Callbacks are limited to 4; there is no reason for this to be a `std::vector`.
Test Plan: CI
Reviewed By: aaronenyeshi
Differential Revision: D32611294
fbshipit-source-id: 21823248abe40d461579b9b68d53c8c0de2a133d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70001
Multiply by the inverse of `kLowProb` instead of dividing, which uses the less expensive `mul` instead of `div`.
Test Plan:
Before
{F682076291}
After
{F682076323}
Reviewed By: robieta
Differential Revision: D32608440
fbshipit-source-id: 7851317a0f7e33813f2bd7a152e5e7f4b5c361b4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69935
Didn't realize that `AT_DISPATCH_ALL_TYPES` should really be called `AT_DISPATCH_MOST_TYPES`.
ghstack-source-id: 145661358
Test Plan:
Added test for dtype bool.
Ran CMF local_ro net:
before:
```
I1215 12:33:49.300174 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.966491. Iters per second: 1034.67
I1215 12:33:49.825570 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.94867. Iters per second: 1054.11
I1215 12:33:50.349246 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.947926. Iters per second: 1054.93
I1215 12:33:50.870433 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.943779. Iters per second: 1059.57
I1215 12:33:51.393702 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.947185. Iters per second: 1055.76
I1215 12:33:51.915666 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.945672. Iters per second: 1057.45
I1215 12:33:52.438475 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.948407. Iters per second: 1054.4
I1215 12:33:52.965337 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.95472. Iters per second: 1047.43
I1215 12:33:53.494563 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.967083. Iters per second: 1034.04
I1215 12:33:54.017879 1606538 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.948945. Iters per second: 1053.8
I1215 12:33:54.017930 1606538 PyTorchPredictorBenchLib.cpp:290] Mean milliseconds per iter: 0.951888, standard deviation: 0.0083367
```
after:
```
I1215 12:32:35.820874 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.999845. Iters per second: 1000.15
I1215 12:32:36.343147 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.944363. Iters per second: 1058.91
I1215 12:32:36.863806 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.942542. Iters per second: 1060.96
I1215 12:32:37.385459 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.944677. Iters per second: 1058.56
I1215 12:32:37.905436 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.941135. Iters per second: 1062.55
I1215 12:32:38.424907 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.939748. Iters per second: 1064.11
I1215 12:32:38.944643 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.941764. Iters per second: 1061.84
I1215 12:32:39.463791 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.938946. Iters per second: 1065.02
I1215 12:32:39.987567 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.95437. Iters per second: 1047.81
I1215 12:32:40.511204 1594955 PyTorchPredictorBenchLib.cpp:279] PyTorch run finished. Milliseconds per iter: 0.959139. Iters per second: 1042.6
I1215 12:32:40.511242 1594955 PyTorchPredictorBenchLib.cpp:290] Mean milliseconds per iter: 0.950653, standard deviation: 0.0184761
```
Reviewed By: hlu1
Differential Revision: D33106675
fbshipit-source-id: 5bb581f8d0ed22ef08df1936dc8d67045e44e862
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68697
Currently, if you include `Tensor.h` but not `TensorOperators.h` then
using overloaded operators will compile but fail at link time.
Instead, this defines the member functions in `TensorBody.h` and
leaves `TensorOperators.h` as only the free functions.
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D32596269
Pulled By: albanD
fbshipit-source-id: 5ce39334dc3d505865268f5049b1e25bb90af44a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68690
RegisterFunctionalization.cpp is a shared file, so only including the
required operators means a single operator change only requires 1
shard to be rebuilt instead of all of them.
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D32596275
Pulled By: albanD
fbshipit-source-id: 8b56f48872156b96fbc0a16b542b8bab76b73fd4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68689
Currently Register{DispatchKey}.cpp includes all of
`NativeFunctions.h`, so any operator signature change requires all
backend registration to be recompiled. However, most backends only
have registrations for a small fraction of operators so it makes sense
to only include the specific functions required.
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D32596273
Pulled By: albanD
fbshipit-source-id: 11d511f47937fbd5ff9f677c9914277b5d015c25
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68714
This splits the static dispatch headers (e.g. `CPUFunctions.h`)
into per-operator headers (e.g. `ops/empty_cpu_dispatch.h`), which is
needed when `Tensor.h` is compiled with static dispatch enabled.
There are also several places in ATen where the static dispatch
headers are used as an optimization even in dynamic dispatch builds.
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D32596265
Pulled By: albanD
fbshipit-source-id: 287783ef4e35c7601e9d2714ddbc8d4a5b1fb9e5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68688
This adds a new macro `TORCH_ASSERT_ONLY_METHOD_OPERATORS` which
allows `Tensor.h` to be included, but not headers which pull in all
other operators. So, a file that defines this macro needs to use the
fine-grained headers to include only the operators being used.
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D32596267
Pulled By: albanD
fbshipit-source-id: 6fc2ce3d2b0f52ac6d81b3f063193ce26e0d75a3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68687
This adds `NativeFunction.root_name` which is the canonical name
for the operator group. i.e. the BaseOperatorName without inplace or
double-underscores. In the previous PR I referred to this as
`base_name` but confusingly `BaseOperatorName` does potentially
include inplace or double-underscores.
I also add the property to `NativeFunctionsGroup` so that grouped
functions with type `Union[NativeFunction, NativeFunctionsGroup]`
can have the property queried without needing `isinstance` checks.
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D32596271
Pulled By: albanD
fbshipit-source-id: 8b6dad806ec8d796dcd70fc664604670d668cae7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69734
Added support for `torch.equal` to ShardedTensor. This is really
helpful in terms of comparing two ShardedTensors.
Will implement `allclose` in a follow-up PR.
ghstack-source-id: 145301451
Test Plan: waitforbuildbot
Reviewed By: fduwjj, wanchaol
Differential Revision: D33004315
fbshipit-source-id: 786fe26baf82e1bb4fecfdbfc9ad4b64e704877f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69421
I've hit a lot of build issues in D32671972, and I've come to realize that a lot of it boils down to header hygiene. `function.h` includes `profiler.h` *solely* to transitively include `record_function.h`, which winds up leaking the profiler symbols. Moreover, several files rely on transitive includes to get access to `getTime`. As long as I have to touch all the places that use `getTime`, I may as well also move them to the new namespace.
Test Plan: Unit tests and CI.
Reviewed By: aaronenyeshi, albanD
Differential Revision: D32865907
fbshipit-source-id: f87d6fd5afb784dca2146436e72c69e34623020e
Summary:
`assertSignatureIsCorrect` is instantiated at minimum once per unique operator signature yet its core logic is independent of the type. So, it makes sense to have a light-weight template that does nothing but call into the non-templated function with the correct `CppSignature` object.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67986
Reviewed By: jbschlosser
Differential Revision: D33108600
Pulled By: swolchok
fbshipit-source-id: 7594524d3156ff2422e6edcdffcb263dc67ea346
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68483
Doesn't need to be in the header.
ghstack-source-id: 145668417
Test Plan: CI
Reviewed By: chaekit
Differential Revision: D32477113
fbshipit-source-id: 30e7796413e3220e4051544559f9110ab745022d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69087
This diff includes a variety of improvements to `set_inputs` to unify behavior with `torch::jit::Module`:
1. Eliminate code duplication between rvalue/lvalue overloads
2. Add type checks
3. Make input length check a `TORCH_CHECK` instead of a debug check - we have to fail when the wrong number of inputs are passed.
4. `schema` now always includes `self`, even if we release `module_`. This is consistent with `torch::jit::Module`.
ghstack-source-id: 145599837
Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: hlu1
Differential Revision: D32711705
fbshipit-source-id: fe97c10b4f03801ba59868b452e7d02b26b3106b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68412
These lists have the same size as CallbackHandles, so they should be the same container type.
ghstack-source-id: 145668416
Test Plan:
Run same command as previous diff.
Before: see previous diff, average about 0.46us
After: P467928077, average about 0.43us
Reviewed By: chaekit
Differential Revision: D32454856
fbshipit-source-id: 3a3ff4d381d99f51ef868d4dec4db7c411b5ea56
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69860
Previously I made a mistake and checked in aten::full.names for the upgrader of aten::full, so I changed it back to just aten::full.
Test Plan: None
Reviewed By: gmagogsfm
Differential Revision: D33066985
fbshipit-source-id: a5598d60d1bff9b4455f807361388fac0689ba14
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69412
TypePrinter does not need to take ownership of the Type.
This helps unblock the following diff to stop refcounting Type singletons.
ghstack-source-id: 145671619
Test Plan: CI
Reviewed By: suo
Differential Revision: D32858525
fbshipit-source-id: df58676938fd20c7bae4a366d70b2067a852282d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69778
This PR extends fusion pattern support from a simple sequence of ops to a simple
subgraph like conv - add
```
x - conv ---\
y ---------add ---- output
```
where input x, y and output are observed/quantized
Test Plan:
```
python test/fx2trt/test_quant_trt.py TestQuantizeFxTRTOps.test_conv_add
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D33024528
fbshipit-source-id: 5c770c82c8f693fabdac5c69343942a9dfda84ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69658
This PR enables a fuse handler for sequences of three ops, and merges all fuse handlers into one.
TODO: we can also move this to the backend_config_dict folder.
Test Plan:
regression fusion test
```
python test/test_quantization.py TestFuseFx
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D32974907
fbshipit-source-id: ba205e74b566814145f776257c5f5bb3b24547c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69614
Previously sparse COO tensors were ignored during freezing, because
`tryInsertConstant` would fail during `freeze_module.cpp`, and because
hashes weren't implemented for COO tensor IValues.
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D32954620
Pulled By: davidberard98
fbshipit-source-id: a91f97fdfc2152b417f43a6948100c94970c0831
Summary:
Refactor torch.profiler.profile by separating it into one low-level class and one high-level wrapper.
The PR includes the following changes:
1. Separate the class torch.profiler.profile into two classes: kineto_profiler and torch.profiler.profile.
2. The former class has the low-level functionality exposed at the C++ level, like prepare_profiler, start_profiler, stop_profiler.
3. The original logic in torch.profiler.profile, including export_chrome_trace, export_stacks, key_averages, events, and add_metadata, is all moved into kineto_profiler, since it is all exposed by torch.autograd.profiler.
4. The new torch.profiler.profile is fully backward-compatible with the original class since it inherits from torch.profiler.kineto_profiler. Its only responsibility in the new implementation is maintaining the finite state machine of ProfilerAction.
With the refactoring, the responsibility boundary is clear and the new logic is simple to understand.
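For context on the state machine mentioned in point 4, a standard usage whose schedule drives the ProfilerAction transitions looks like this (unchanged by the refactoring):
```python
import torch
from torch.profiler import ProfilerActivity, profile, schedule

with profile(
    activities=[ProfilerActivity.CPU],
    schedule=schedule(wait=1, warmup=1, active=2),
) as prof:
    for _ in range(4):
        torch.randn(128, 128) @ torch.randn(128, 128)
        prof.step()  # advances the ProfilerAction state machine: wait -> warmup -> active
print(prof.key_averages().table(sort_by="cpu_time_total"))
```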
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63302
Reviewed By: albanD
Differential Revision: D33006442
Pulled By: robieta
fbshipit-source-id: 30d7c9f5c101638703f1243fb2fcc6ced47fb690
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69381
Open-source the lowering workflow, related tools, and tests.
Test Plan: CI
Reviewed By: 842974287
Differential Revision: D32815136
fbshipit-source-id: 3ace30833a2bc52e9b02513c5e223cb339fb74a3
Summary:
- PyTorch and ONNX have supported BFloat16; add this to unblock some mixed-precision training models.
- Support the PyTorch TNLG model using BFloat16 tensors for the inputs/outputs of the layers that run on the NPU.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66788
Reviewed By: jansel
Differential Revision: D32283510
Pulled By: malfet
fbshipit-source-id: 150d69b1465b2b917dd6554505eca58042c1262a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68607
This PR adds ShardedOptimizer and an API to get module parameters along with ShardedTensor params; it allows users to use this optimizer wrapper to construct an optimizer that involves ShardedTensor.
The state_dict support will be a follow-up diff.
ghstack-source-id: 145532834
Test Plan: python test_sharded_optim.py
Reviewed By: pritamdamania87
Differential Revision: D32539994
fbshipit-source-id: a3313c6870d1f1817fc3e08dc2fc27dc43bef743
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65993
This PR attempts to port `index_add` to structured kernels, but does more than that:
* Adds an `out=` variant to `index_add`
* Revises `native_functions.yaml` registrations to not have multiple entries and instead pass a default value for `alpha`.
* Changes the `derivatives.yaml` file for autograd support
* Revises error messages, please see: https://github.com/pytorch/pytorch/pull/65993#issuecomment-945441615
Follow-up PRs in near future will attempt to refactor the OpInfo test, and will give another look at tests in `test/test_torch.py` for this function. (hence the use of ghstack for this)
~This is WIP because there are tests failing for `Dimname` variant on mobile/android builds, and I'm working on fixing them.~
Issue tracker: https://github.com/pytorch/pytorch/issues/55070
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D32646426
fbshipit-source-id: b035ecf843a9a27d4d1e18b202b035adc2a49ab5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68947
`_test_math_view` currently calls the operator with different values
than those specified in the `SampleInput`. This is undesirable as it
could break mathematical properties required by the operator. Instead,
this calls `math_op_view(math_op_physical(sample.input))` to get a
view that represents the same value as the original input.
`test_neg_view` already did this by returning `torch._neg_view(-x)`
from `math_op_view` but this moves the handling into `_test_math_view`
to make it apply to all view op tests.
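A concrete instance of this composition for the neg-view case (a small sketch, not the test code itself):
```python
import torch

x = torch.randn(3)
# math_op_view(math_op_physical(x)): a view that represents the same value as x.
same_value_view = torch._neg_view(-x)
assert torch.equal(same_value_view, x)
```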
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D33064327
Pulled By: anjali411
fbshipit-source-id: 4d87e0c04fc39b95f8dc30dcabda0d554d16a1d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69272
In the transformer encoder and MHA, masked_softmax's mask is a 2D tensor (B, D), while the input is a 4D tensor (B, H, D, D).
This mask could simply be broadcast to (B, H, D, D) like the input and then a regular masked_softmax applied; however, that brings the problem of a non-contiguous mask and consumes more memory.
In this diff, we keep the mask's shape unchanged, and compute the corresponding mask for the input in each CUDA thread.
This new layout is not supported on CPU yet.
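A reference sketch, in eager ops, of the semantics this layout is assumed to implement (the assumption that the (B, D) mask applies along the key, i.e. last, dimension mirrors MHA key-padding masks; shapes are illustrative, and the fused kernel avoids materializing the broadcasted mask):
```python
import torch

B, H, D = 2, 4, 8
scores = torch.randn(B, H, D, D)
mask = torch.zeros(B, D, dtype=torch.bool)  # True means "masked out"
mask[:, -1] = True
ref = torch.softmax(
    scores.masked_fill(mask[:, None, None, :], float("-inf")), dim=-1
)
```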
Test Plan: buck build mode/opt -c fbcode.enable_gpu_sections=true caffe2/test:nn && buck-out/gen/caffe2/test/nn\#binary.par -r test_masked_softmax
Reviewed By: ngimel
Differential Revision: D32605557
fbshipit-source-id: ef37f86981fdb2fb264d776f0e581841de5d68d2
Summary:
`torch.movedim` now directly handles the case of a scalar (0-dim) input tensor as a no-op by returning a view of the input tensor (after all the usual checks for the other parameters)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69537
Test Plan:
This code now works fine and res1 is a view of tensor
```
import torch
tensor = torch.rand(torch.Size([]))
res1 = torch.movedim(tensor, 0, 0)
```
Fixes https://github.com/pytorch/pytorch/issues/69432
Reviewed By: jbschlosser
Differential Revision: D33020014
Pulled By: albanD
fbshipit-source-id: b3b2d380d70158bd3b3d6b40c073377104e09007
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69819
We should skip ReplaceWithCopy if the inputs to the operator can be updated during inference. For a set of tensors that share data, ReplaceWithCopy should not happen to any of them if there exist updates to any of them.
Currently, the check in place misses some cases (suppose there exist updates, and uses <= 1). This diff addresses the missing cases by querying the AliasDB.
Test Plan:
- Added test cases, including one that was problematic before this diff
- CI
Reviewed By: mikeiovine
Differential Revision: D33052562
fbshipit-source-id: 61f87e471805f41d071a28212f2f457e8c6785e7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68247
This splits `Functions.h`, `Operators.h`, `NativeFunctions.h` and
`NativeMetaFunctions.h` into separate headers per operator base name.
With `at::sum` as an example, we can include:
```cpp
<ATen/core/sum.h> // Like Functions.h
<ATen/core/sum_ops.h> // Like Operators.h
<ATen/core/sum_native.h> // Like NativeFunctions.h
<ATen/core/sum_meta.h> // Like NativeMetaFunctions.h
```
The umbrella headers are still being generated, but all they do is
include from the `ATen/ops` folder.
Further, `TensorBody.h` now only includes the operators that have
method variants. Which means files that only include `Tensor.h` don't
need to be rebuilt when you modify function-only operators. Currently
there are about 680 operators that don't have method variants, so this
is potentially a significant win for incremental builds.
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D32596272
Pulled By: albanD
fbshipit-source-id: 447671b2b6adc1364f66ed9717c896dae25fa272
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68631
This PR:
- Adds the check that the storage numel of the base and tangent tensors are the same. This is to support the case when as_strided reveals elements that aren't indexable by the input tensor.
- Skips the check when batched tensors are involved, because using as_strided to reveal elements that are not indexable by the input tensor is already not allowed in vmap.
- Adds tests for the above two cases, as well as an edge case regarding the conj bit (what about the neg bit?)
For functorch:
- we need to copy the batching rule implemented here
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D32899678
Pulled By: soulitzer
fbshipit-source-id: 54db9550dd2c93bc66b8fb2d36ce40799ebba794
Summary:
Unfortunately there are two versions of the removeProfilingNodes function, and one of them does not clean up profile_ivalue nodes properly. This leads to a dangling profile_ivalue node, which ends up being profiled multiple times and could give us false assert failures.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68804
Reviewed By: mrshenli
Differential Revision: D32980157
Pulled By: Krovatkin
fbshipit-source-id: cd57c58a941d10ccd01a6cd37aac5c16256aaea6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69487
Writing a customized plugin for TRT requires extending IPluginV2IOExt. This diff extracts functions that should share a common implementation between plugins from IPluginV2IOExt into plugin_base, making it easier for OSS users to write customized plugins.
This diff also fixes the double-creator issue; the root cause is that get_trt_plugin in converters.py looks for the plugin by name matching. Switching to the util function from converters_utils.py resolves the issue.
Test Plan: CI
Reviewed By: 842974287
Differential Revision: D32747052
fbshipit-source-id: 7f2e8811c158230f66a0c389af4b84deaf7e2d1f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69789
Add details on how to save and load quantized models without hitting errors
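A hedged sketch of the save/load pattern such docs typically describe: save the quantized state_dict, then rebuild (prepare/convert) the same model structure before loading (the toy model and calibration data below are placeholders, not the documented example):
```python
import torch
from torch import nn
from torch.ao.quantization import DeQuantStub, QuantStub, convert, get_default_qconfig, prepare

def build_quantized():
    m = nn.Sequential(QuantStub(), nn.Linear(4, 4), DeQuantStub()).eval()
    m.qconfig = get_default_qconfig("fbgemm")
    prepare(m, inplace=True)
    m(torch.randn(1, 4))   # calibrate observers
    convert(m, inplace=True)
    return m

torch.save(build_quantized().state_dict(), "quantized_model.pt")

loaded = build_quantized()  # re-create the quantized structure before loading
loaded.load_state_dict(torch.load("quantized_model.pt"))
```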
Test Plan:
CI autogenerated docs
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D33030991
fbshipit-source-id: 8ec4610ae6d5bcbdd3c5e3bb725f2b06af960d52
Summary:
Fixes https://github.com/pytorch/pytorch/issues/68805
The bug is described in the linked issue. This PR is an attempt to make the functions `_recurse_update_dict` and `_recurse_update_module` more efficient in how they iterate over the submodules. The previous implementation was suboptimal, as it recursively called the update method on the submodules returned by `module.named_modules()`, while `module.named_modules()` already returned all submodules including nested ones.
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68806
Reviewed By: pritamdamania87
Differential Revision: D33053940
Pulled By: wanchaol
fbshipit-source-id: 3e72822f65a641939fec40daef29c806af725df6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69041
`TH_CONCAT_{N}` is still being used by THP, so I've moved that into
its own header, but all the compiled code is gone.
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision: D32872477
Pulled By: ngimel
fbshipit-source-id: 06c82d8f96dbcee0715be407c61dfc7d7e8be47a
Summary:
- Remove all hardcoded AMD gfx targets.
- The PyTorch build and Magma build will use rocm_agent_enumerator as a backup if the PYTORCH_ROCM_ARCH env var is not defined.
- PyTorch extensions will use the same gfx targets as the PyTorch build, unless the PYTORCH_ROCM_ARCH env var is defined.
- torch.cuda.get_arch_list() now works for ROCm builds.
- PyTorch CI dockers will continue to be built for gfx900 and gfx906 for now.
- The PYTORCH_ROCM_ARCH env var can be a space- or semicolon-separated list of gfx archs, e.g. "gfx900 gfx906" or "gfx900;gfx906".
cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61706
Reviewed By: seemethere
Differential Revision: D32735862
Pulled By: malfet
fbshipit-source-id: 3170e445e738e3ce373203e1e4ae99c84e645d7d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69755
Per swolchok's suggestion on D32609915 (1c43b1602c). Hide the value offset indices behind accessors to provide more flexibility if we ever decide to change the layout of the values array.
Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: hlu1
Differential Revision: D32838145
fbshipit-source-id: cf805c077672de4c2fded9b41da01eca6d84b388
Summary:
Solves the next most important use case in https://github.com/pytorch/pytorch/issues/68052.
I have kept the style as close to that in SGD as seemed reasonable, given the slight differences in their internal implementations.
All feedback welcome!
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68164
Reviewed By: VitalyFedyunin
Differential Revision: D32994129
Pulled By: albanD
fbshipit-source-id: 65c57c3f3dbbd3e3e5338d51def54482503e8850
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69809
SR options are only printed out once per model per net. Logging them is actually pretty helpful for debugging.
Test Plan: CI
Reviewed By: donaldong
Differential Revision: D33046814
fbshipit-source-id: 536b34e00fbc8a273c5eb4d8ae5caca0dc1f4c24
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69592
Currently, the forward AD function for `copy_` (in `VariableTypeManual`) does not handle the broadcasting case. ~EDIT: but that is not a design decision, not a bug. In this PR, we make that clear as a comment.~
Note: `broadcast_to` does not have a batching rule in core, so the ops that rely on `copy_` to broadcast will still fail batched forward grad computation.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D33020603
Pulled By: soulitzer
fbshipit-source-id: 09cb702bffc74061964a9c05cfef5121f8164814
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69558
Currently we skip batched forward grad checks completely for certain views that also have inplace variants. This PR allows us to decouple the checks.
Alternative: just skip the batched forward checks for inplace ops entirely. I'm okay with this because it was surprising to me these checks are being run in the first place.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D33020599
Pulled By: soulitzer
fbshipit-source-id: f8012aadc0e775f80da0ab62b2c11f6645bb1f51
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69644
This PR cleans up the init of ModuleReLUFuseHandler and moves all `module - relu`
fusion patterns to use this handler.
It also temporarily disables the additional_fuser_method argument; it will be enabled again
after we bring back the simple pattern format.
Test Plan:
```
python test/test_quantize_fx.py TestFuseFx
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D32974906
fbshipit-source-id: 23483ea4293d569cb3cec6dadfefd4d9f30921a7
Summary:
This adds a C++ event handler corresponding to the Python one mentioned in the RFC.
This changes the counters a bit so they are all push-driven instead of being polled. The two window types are "fixed count" and "interval": one is based on the number of logged events, and the other is based on time windows. There's currently no active ticker for interval, so it needs a regular stream of events to ensure events are produced. A follow-up diff can add support for things like an HHWheel / simple ticker.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68783
Test Plan: buck test //caffe2/test/cpp/monitor:monitor
Reviewed By: kiukchung
Differential Revision: D32606547
fbshipit-source-id: a00d0364092d7d8a98e0b18e503c0ca8ede2bead
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68485
In OSS, the only change is that we make the predict_net field of PredictorExporterMeta nullable.
Test Plan: sandcastle, let CI run
Reviewed By: boryiingsu
Differential Revision: D32467138
fbshipit-source-id: 81bd5fca695462f6a186bcfa927073874cc9c26a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69737
We can use stack allocation instead.
ghstack-source-id: 145312454
Test Plan: Ran internal framework overhead benchmark with --stressTestKinto --kinetoAddFlops, but difference was minimal. Still good to fix.
Reviewed By: chowarfb
Differential Revision: D33007329
fbshipit-source-id: e096312fef5b729cf12580be152c9418683745b8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69710
Namely no range-loop-analysis (which detects when a loop variable cannot be a const reference).
Test Plan: Imported from OSS
Reviewed By: r-barnes
Differential Revision: D32997003
Pulled By: malfet
fbshipit-source-id: dba0e7875e5b667e2cc394c70dd75e2403265918
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69110
I pasted the current LLVM code, reapplied the modifications listed in the code comments, caught a few more in the diff/build process. The trivially copyable detection is different now; if gcc builds fail, will try reverting to C10_IS_TRIVIALLY_COPYABLE or copying what LLVM is doing.
The motivation for this change is that, as noted in an existing comment, C10_IS_TRIVIALLY_COPYABLE did the wrong thing for std::unique_ptr, which caused problems with D32454856 / #68412.
ghstack-source-id: 145327773
Test Plan: CI
Reviewed By: bhosmer, mruberry
Differential Revision: D32733017
fbshipit-source-id: 9452ab90328e3fdf457aad23a26f2f6835b0bd3d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66695
More extra reference counting in this path.
ghstack-source-id: 145125484
Test Plan: CI
Reviewed By: suo
Differential Revision: D31692197
fbshipit-source-id: 126b6c72efbef9410d4c2e61179b6b67459afc23
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69493
When added `with_comms` decorator with arguments, we added an `with_comms_decorator` inner function, `with_comms()` will refer to a function object, the added parentheses was necessary to use in test cases.
This PR fixes the `with_comms` wrapper behavior, to allow we both specify with/without arguments in test cases:
```
@with_comms
def test_case:
...
```
or
```
@with_comms(backend="gloo")
def test_case:
...
```
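A generic sketch (not the actual implementation) of the standard pattern that lets a decorator like with_comms be applied both bare and with arguments:
```python
import functools

def with_comms(func=None, *, backend="nccl"):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(self, *args, **kwargs):
            # set up the process group with `backend`, run the test, tear down
            return fn(self, *args, **kwargs)
        return wrapper
    # bare @with_comms passes the function directly; @with_comms(...) returns the decorator
    return decorator if func is None else decorator(func)
```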
ghstack-source-id: 145327066
Test Plan: test_sharded_tensor
Reviewed By: pritamdamania87
Differential Revision: D32897555
fbshipit-source-id: 2f3504630df4f6ad1ea73b8084fb781f21604110
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69096
Instead of storing profiling data in a map and then merging at
the end, perform merging directly during profiling.
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D32772626
Pulled By: davidberard98
fbshipit-source-id: 22622c916a61908b478dd09433815685ce43682a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69707
`const` modifier for `__m512` return value doesn't make much sense
Test Plan: Imported from OSS
Reviewed By: r-barnes
Differential Revision: D32997008
Pulled By: malfet
fbshipit-source-id: fb98659713fe2a23cc702252c0655106687f0dbf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69169
I checked `derivatives.yaml`, and it doesn't look like `logical_not/and/xor` are meant to work with autograd. Those 3 ops are currently set as `CompositeImplicitAutograd` though, implying that they do work with autograd. Updating them to be CompositeExplicitAutograd instead.
This came up because I'm trying to improve the error checking in external backend codegen, and these ops being improperly labeled incorrectly triggers my new error checks for XLA (see https://github.com/pytorch/pytorch/pull/67090)
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D32739976
Pulled By: bdhirsh
fbshipit-source-id: a756dd9e0b87276368063c8f4934be59dca371d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66746
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for(TYPE var=x0;var<x_max;x++)`
to the format
`for(const auto var: irange(xmax))`
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.
Test Plan: Sandcastle
Reviewed By: malfet
Differential Revision: D31705361
fbshipit-source-id: 33fd22eb03086d114e2c98e56703e8ec84460268
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69268
This diff enables native masked softmax on CUDA and also expands our current warp_softmax to accept masking.
The mask in this masked softmax has to be the same shape as the input, and has to be contiguous.
In a following diff, I will include the encoder mask layout, where the input is BHDD and the mask is BD.
Test Plan: buck build mode/opt -c fbcode.enable_gpu_sections=true caffe2/test:nn && buck-out/gen/caffe2/test/nn\#binary.par -r test_masked_softmax
Reviewed By: ngimel
Differential Revision: D32338419
fbshipit-source-id: 48c3fde793ad4535725d9dae712db42e2bdb8a49
Summary:
Follow up to https://github.com/pytorch/pytorch/issues/68095
This also changes the files from the ATen folder to include c10's `Export.h` instead since they can't ever be exporting `TORCH_PYTHON_API`.
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69585
Reviewed By: mrshenli
Differential Revision: D32958594
Pulled By: albanD
fbshipit-source-id: 1ec7ef63764573fa2b486928955e3a1172150061
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69568
Non-empty vectors should never be passed to `assignStorageToManagedTensors` and `assignStorageToManagedOutputTensors`. Presumably, this out-variant convention was adopted to avoid move-assigning the corresponding attributes in `MemoryPlanner`. But the cost of a vector move-assign is not high, and this function type signature is safer.
Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: donaldong
Differential Revision: D32729289
fbshipit-source-id: 88f19de8eb89d8a4f1dd8bbd4d9e7f686e41888b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69595
This change encapsulates the `function` object in `ProcessedFunction` objects instead of exposing it unnecessarily just for executing it.
Test Plan: Existing tests
Reviewed By: mikeiovine
Differential Revision: D32908341
fbshipit-source-id: 5ff4951cbe276c5c6292227124d9eec1dd16e364
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69534
Something is TensorSubclassLike if it is a Tensor subclass or if it has
the same problems as Tensor subclasses. Today that just includes Tensor
Subclasses and meta tensors but may include other things in the future.
Some of our backwards formulas are incompatible with TensorSubclassLike
objects. For example, calling .data_ptr() is a problem because many
TensorSubclassLike objects don't have storage. Another problem is
in-place operations: performing `regular_tensor.inplace_(tensor_subclass)`
is a problem.
This PR adds special cases to the backward formulas for torch.max and
torch.clamp to handle this. The backward formulas for torch.max and
torch.clamp are not dispatcher operations so they cannot be overridden
and we hesitate to make them dispatcher operations for FC/BC concerns
and performance overhead concerns.
Furthermore, the old concept of "is this inplace operation vmap
compatible?" can be subsumed by the general "is this inplace operation
tensor-subclass compatible" question, so I replaced all instances of
isInplaceVmapCompatible and replaced it with the isTensorSubclassLike
checks.
Test Plan
- I tested the changes using functorch.
- It's possible to write a test for these in core (one has to make
a custom tensor subclass and then send it through the operation and then
invoke autograd), but I wanted to push the work to doing some
generic testing for backward formulas
(https://github.com/pytorch/pytorch/issues/69530) instead of doing some
one-off things now.
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D32967727
Pulled By: zou3519
fbshipit-source-id: 30fda1a7581da4c55179b7a3ca05069150bbe2dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69718
canary is now pushing to fbsync so we should change our workflows to
reflect that.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: malfet, janeyx99
Differential Revision: D32999967
Pulled By: seemethere
fbshipit-source-id: bc4bc9afd2d73c53f91d3af3b81aca1b31f665a4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69559
We have a lot of special cases. Document them so they're easy to learn about.
ghstack-source-id: 145226542
Test Plan: Spell check? :)
Reviewed By: d1jang
Differential Revision: D32929416
fbshipit-source-id: 2362410f25a27cdb74a4939903446192cef61978
Summary:
This PR upgrades oneDNN to [v2.3.3](https://github.com/oneapi-src/oneDNN/releases/tag/v2.3.3) and includes [Graph API preview release](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.2) in one package.
- oneDNN will be located at `pytorch/third_party/ideep/mkl-dnn/third_party/oneDNN`
- The version of oneDNN will be [v2.3.3](https://github.com/oneapi-src/oneDNN/releases/tag/v2.3.3)
The main changes on CPU:
- v2.3
- Extended primitive cache to improve primitive descriptor creation performance.
- Improved primitive cache performance in multithreaded configurations.
- Introduced initial optimizations for bfloat16 compute functionality for future Intel Xeon Scalable processor (code name Sapphire Rapids).
- Improved performance of binary primitive and binary post-op for cases with broadcast and mixed source and destination formats.
- Improved performance of reduction primitive
- Improved performance of depthwise convolution primitive with NHWC activations for training cases
- v2.3.1
- Improved int8 GEMM performance for processors with Intel AVX2 and Intel DL Boost support
- Fixed integer overflow for inner product implementation on CPUs
- Fixed out of bounds access in GEMM implementation for Intel SSE 4.1
- v2.3.2
- Fixed performance regression in fp32 inner product primitive for processors with Intel AVX512 support
- v2.3.3
- Reverted check for memory descriptor stride validity for unit dimensions
- Fixed memory leak in CPU GEMM implementation
More changes can be found in https://github.com/oneapi-src/oneDNN/releases.
- The Graph API provides flexible API for aggressive fusion, and the preview2 supports fusion for FP32 inference. See the [Graph API release branch](https://github.com/oneapi-src/oneDNN/tree/dev-graph-preview2) and [spec](https://spec.oneapi.io/onednn-graph/latest/introduction.html) for more details. A separate PR will be submitted to integrate the oneDNN Graph API to Torchscript graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63748
Reviewed By: albanD
Differential Revision: D32153889
Pulled By: malfet
fbshipit-source-id: 536071168ffe312d452f75d54f34c336ca3778c1
Summary:
This fixes the case when `torch.inference_mode` is called with `mode=False` (disabled). When used as a decorator, it ignored the argument and enabled inference mode anyway.
`_DecoratorContextManager` is changed so that a new instance is a copy instead of a new instance with default parameters.
I also added more tests to cover this case.
Current behaviour:
```python
>>> import torch
>>> x = torch.ones(1, 2, 3, requires_grad=True)
>>> @torch.inference_mode(mode=False)
... def func(x):
... return x * x
...
>>> out = func(x)
>>> out.requires_grad
False
```
New behaviour (fixed):
```python
>>> import torch
>>> x = torch.ones(1, 2, 3, requires_grad=True)
>>> @torch.inference_mode(mode=False)
... def func(x):
... return x * x
...
>>> out = func(x)
>>> out.requires_grad
True
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68617
Reviewed By: mrshenli
Differential Revision: D32958434
Pulled By: albanD
fbshipit-source-id: 133c69970ef8bffb9fc9ab5142dedcffc4c32945
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69394
Modified loops in files under fbsource/fbcode/caffe2/ from the format
```
for(TYPE var=x0;var<x_max;x++)
```
to the format
```
for(const auto var: irange(xmax))
```
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.
Test Plan: Sandcastle
Reviewed By: malfet
Differential Revision: D32837991
fbshipit-source-id: fc7c4f76d2f32a17a0faf329294b3fe7cb81df32
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69257
These are sample functions that already use generators internally; this change just moves the `yield` into the sample function itself.
Diff is best viewed ignoring whitespace changes https://github.com/pytorch/pytorch/pull/69257/files?diff=unified&w=1
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D32942007
Pulled By: mruberry
fbshipit-source-id: bb5b253d6d87b3495b7059924bed35b09d2768a2
Summary:
This fixes the following error:
```python
Traceback (most recent call last):
File "/home/gaoxiang/pytorch-ucc2/test/distributed/test_distributed_spawn.py", line 40, in <module>
run_tests()
File "/home/gaoxiang/.local/lib/python3.9/site-packages/torch/testing/_internal/common_utils.py", line 618, in run_tests
['--import-slow-tests'] if IMPORT_SLOW_TESTS else List[str]([]))
File "/usr/lib/python3.9/typing.py", line 680, in __call__
raise TypeError(f"Type {self._name} cannot be instantiated; "
TypeError: Type List cannot be instantiated; use list() instead
Traceback (most recent call last):
File "/home/gaoxiang/pytorch-ucc2/test/run_test.py", line 1058, in <module>
main()
File "/home/gaoxiang/pytorch-ucc2/test/run_test.py", line 1036, in main
raise RuntimeError(err_message)
RuntimeError: distributed/test_distributed_spawn failed!
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69578
Reviewed By: mrshenli
Differential Revision: D32963113
Pulled By: malfet
fbshipit-source-id: b064e230c5e572e890b4ac66ebdda2707b8c12d7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66778
This removes the hack of the context manager that would communicate the zeros block shape to the quantization convert.
The conversion will assume that the converted modules have `sparse_params` (which is added by the sparsifier).
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D31835721
Pulled By: z-a-f
fbshipit-source-id: c5fd2da3b09a728a2296765c00ca69275dbca3b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69361
This PR introduces the new issue forms that replace issue templates.
(This is exactly the same as https://github.com/pytorch/pytorch/pull/65917 which was reverted due to an issue during the import)
This is similar to what was done in torchvision https://github.com/pytorch/vision/pull/4299 and torchaudio, you can see the end result here: https://github.com/pytorch/vision/issues/new/choose (click e.g. on the [bug report](https://github.com/pytorch/vision/issues/new?assignees=&labels=&template=bug-report.yml))
The main new thing is that we can enforce some of the fields to be filled, especially for bug reports. It's also a much cleaner GUI for users IMHO, and we can provide better examples and instructions.
There is still a "blank" template available.
I removed the "Questions" form: we say we close these issues anyway. I replaced it with a direct link to https://discuss.pytorch.org. Since we still have a "blank" template, I think this covers all previous use-cases properly.
Test Plan: Imported from OSS
Reviewed By: albanD, mrshenli
Differential Revision: D32947189
Pulled By: NicolasHug
fbshipit-source-id: f19abe3e7c9c479b0b227969a207916db5bdb6e3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66777
Sometimes one might need to keep the sparsity parameters after the sparsifier is detached.
This saves the parameters in the `sparse_params`.
There are two ways of keeping the sparsifier params:
1. Tuple[str, ...]: A tuple of all the parameters that need to be stored.
2. Dict[str, Tuple[str, ...]]: A dict of layer keys and parameters. In this case only the specified layers will have the parameters attached to them.
For example:
```
>>> # This will keep params in every module
>>> sparsifier.squash_mask(keep_sparse_params=('sparse_block_shape',))
>>> print(model.submodule.linear1.sparse_params)
{'sparse_block_shape': (1, 4)}
>>> print(model.submodule.linear2.sparse_params)
{'sparse_block_shape': (1, 4)}
```
```
>>> # This will keep params only in specific modules
>>> sparsifier.squash_mask(keep_sparse_params={'submodule.linear1': ('sparse_block_shape',)})
>>> print(model.submodule.linear1.sparse_params)
{'sparse_block_shape': (1, 4)}
>>> print(model.submodule.linear2.sparse_params)
AttributeError: 'Linear' object has no attribute 'sparse_params'
```
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D31835722
Pulled By: z-a-f
fbshipit-source-id: 20c2d80207eb7ce7291e7f5f655d3fb2a627190f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67730
This PR implements the register function for the upgrader so it can be used at the loading stage
ghstack-source-id: 145170986
Test Plan:
```
buck test //caffe2/test/cpp/jit:jit
```
Reviewed By: iseeyuan
Differential Revision: D32092518
fbshipit-source-id: 779b51eb12b8cb162a93a55c1e66fe0becc4cb36
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69415
Adding the imports inside the torch/ao/__init__.py has a high chance of causing circular dependencies, especially if sparsity and quantization use each other's resources.
To avoid the dependency issues, we can just keep the __init__ empty.
Notes:
- This means that the user will have to explicitly import `torch.ao.quantization` or `torch.ao.sparsity` instead of `from torch import ao; ao.quantization.???` (see the short sketch after these notes).
- The issue of circular dependencies that are caused by the imports with binding submodules is [fixed in Python 3.7](https://docs.python.org/3/whatsnew/3.7.html#other-language-changes), which means this solution will become obsolete at the [3.6's EoL](https://www.python.org/dev/peps/pep-0494/#and-beyond-schedule), which comes [12/23/2022](https://devguide.python.org/#status-of-python-branches).
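For illustration, the practical effect of keeping `torch/ao/__init__.py` empty (a small usage sketch, not new API):
```python
# Explicit submodule imports keep working:
import torch.ao.quantization as tq

# whereas attribute access through the bare package no longer binds the
# submodule automatically:
# from torch import ao
# ao.quantization  # AttributeError unless torch.ao.quantization was imported elsewhere
print(tq.__name__)
```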
Future options to resolve the circular dependencies (subject to discussion):
1. Use interfaces for binding submodules. For example, have a torch/ao/_nn with all the source code, and an interface torch/ao/nn with only the __init__.py file. The __init__ files inside the torch/ao/_nn will be empty
2. Completely isolate the common code into a separate submodule, s.a. torch/ao/common. The other submodules will not be referencing each other.
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D32860168
Pulled By: z-a-f
fbshipit-source-id: e3fe77e285992d34c87d8742e1a5e449ce417c36
Summary:
Also fixes the documentation failing to appear and adds a test to validate that the op works properly with multiple devices.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69640
Reviewed By: ngimel
Differential Revision: D32965391
Pulled By: mruberry
fbshipit-source-id: 4fe502809b353464da8edf62d92ca9863804f08e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69576
Vulkan backend for OSS is also thread-safe by default:
* Removed `MAKE_VULKAN_THREADSAFE` preprocessor and if-conditions
Test Plan:
Test build on Android:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_perf_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_perf_test
adb shell "/data/local/tmp/vulkan_perf_test"
```
Test build on MacOS:
```
cd ~/fbsource
buck build //xplat/caffe2:pt_vulkan_perf_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAppleMac\#macosx-x86_64
```
Test result on Google Pixel 5:
```
//xplat/caffe2:pt_vulkan_perf_test_binAndroid#android-arm64 buck-out/gen/fe3a39b8/xplat/caffe2/pt_vulkan_perf_test_binAndroid#android-arm64
buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAndroid#android-arm64: 1 file pushed, 0 skipped. 145.4 MB/s (826929592 bytes in 5.426s)
Running /data/local/tmp/vulkan_perf_test
Run on (8 X 1804.8 MHz CPU s)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1 39.3 ms 10.1 ms 1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1 27.1 ms 5.86 ms 1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1 58.5 ms 11.8 ms 1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1 5.98 ms 0.803 ms 5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1 9.14 ms 0.857 ms 5000
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:3 32.1 ms 31.3 ms 3000
```
Test result on MacOS:
```
Running ./buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAppleMac#macosx-x86_64
Run on (16 X 2400 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x8)
L1 Instruction 32 KiB (x8)
L2 Unified 256 KiB (x8)
L3 Unified 16384 KiB (x1)
Load Average: 18.89, 29.61, 24.95
***WARNING*** Library was built as DEBUG. Timings may be affected.
-------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1 53.3 ms 39.6 ms 1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1 28.0 ms 20.7 ms 1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1 51.8 ms 38.7 ms 1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1 2.76 ms 1.31 ms 5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1 2.29 ms 1.11 ms 5000
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:3 49.2 ms 41.8 ms 3000
```
Reviewed By: SS-JIA
Differential Revision: D32933891
fbshipit-source-id: d8ebd5394771e1d79230c1f3aa8fbec4472b3197
Summary:
This PR does several things
1) eliminates `where` instantiations for the deprecated `byte` condition dtype, and casts `condition` to `bool` in this case (see the short sketch after this list). This is a perf penalty for people using deprecated calls
2) Makes `clamp_{min/max}.Tensor` overload reuse `clamp_{min/max}.Scalar` kernels if limit argument is cpu scalar, instead of instantiating `gpu_kernel_with_scalars`
3) Unifies all clamp_scalar kernels to use a single kernel with lambda picking the correct operation. I've verified that it doesn't degrade kernel performance.
4) Eliminates redundant TensorIterator construction that `clamp` structured kernel was doing when only `min` or `max` was specified
This reduces the cubin size for TensorCompare.cu on V100 from 15751920 bytes to 7691120 bytes, with corresponding reduction in compile time.
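For reference, a small illustration of item 1) (my own sketch of the user-visible semantics, not code from this PR):
```python
import torch

a = torch.randn(4)
b = torch.zeros(4)
cond = a > 0  # bool condition: the non-deprecated path

# With this change a deprecated uint8 condition no longer gets its own kernel
# instantiation; it is cast to bool first, so it behaves roughly like:
out = torch.where(cond.to(torch.uint8).bool(), a, b)
assert torch.equal(out, torch.where(cond, a, b))
```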
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68835
Reviewed By: mruberry
Differential Revision: D32839241
Pulled By: ngimel
fbshipit-source-id: 0acde5af10a767264afbdb24684b137c5544b8d9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69551
Implemented `clone` operator in the Vulkan backend:
* Supports only <= 4D tensors.
* Internal name is `aten::clone`.
* Vulkan `clone` operator accepts only `c10::MemoryFormat::Preserve` and `c10::MemoryFormat::Contiguous` for the argument `c10::optional<c10::MemoryFormat> optional_memory_format`.
* Throws an exception if the `optional_memory_format` argument is neither `MemoryFormat::Preserve` nor `MemoryFormat::Contiguous`
* CPU implementation: [/aten/src/ATen/native/TensorFactories.cpp::clone()](3e45739543/aten/src/ATen/native/TensorFactories.cpp (L1415))
* MKL-DNN implementation: [/aten/src/ATen/native/mkldnn/TensorShape.cpp::mkldnn_clone()](3e45739543/aten/src/ATen/native/mkldnn/TensorShape.cpp (L58))
* `self.copy_(src)` calls `copy_()` for Vulkan to Vulkan copy operation
```
vTensor::copy_()
vTensor::copy_() X -> Vulkan
vTensor::copy_() CPU -> Vulkan
vTensor::clone()
vTensor::clone() -> MemoryFormat::Preserve
vTensor::clone() -> MemoryFormat::Preserve -> self = at::empty_like(src)
vTensor::clone() self.copy_(src); -> BEFORE
vTensor::copy_()
vTensor::copy_() X -> Vulkan
vTensor::copy_() Vulkan -> Vulkan
vTensor::clone() self.copy_(src); -> AFTER
vTensor::copy_()
vTensor::copy_() Vulkan -> X
vTensor::copy_() Vulkan -> CPU
```
* References:
* Function `torch.clone` in PyTorch documentation: https://pytorch.org/docs/stable/generated/torch.clone.html
* Pytorch preferred way to copy a tensor: https://stackoverflow.com/questions/55266154/pytorch-preferred-way-to-copy-a-tensor
* `torch.memory_format`: https://pytorch.org/docs/stable/tensor_attributes.html?highlight=memory_format#torch.torch.memory_format
* `c10::MemoryFormat` definition in [/c10/core/MemoryFormat.h](3e45739543/c10/core/MemoryFormat.h (L28))
Test Plan:
Build & test on Android:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```
Build & test on MacOS:
```
cd ~/fbsource
buck build //xplat/caffe2:pt_vulkan_api_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAppleMac\#macosx-x86_64
```
Test result on Android (Google Pixel 5):
```
[ RUN ] VulkanAPITest.clone_success
[ OK ] VulkanAPITest.clone_success (5 ms)
[ RUN ] VulkanAPITest.clone_invalidinputs_exceptions
[ OK ] VulkanAPITest.clone_invalidinputs_exceptions (1 ms)
```
Test result on MacOS:
```
[ RUN ] VulkanAPITest.clone_success
[ OK ] VulkanAPITest.clone_success (19 ms)
[ RUN ] VulkanAPITest.clone_invalidinputs_exceptions
[ OK ] VulkanAPITest.clone_invalidinputs_exceptions (2 ms)
```
Reviewed By: SS-JIA
Differential Revision: D32923535
fbshipit-source-id: ea29792e1b0080cbbc1c8c7e8bf2beffad9b5c0d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69626
Sparse tensors are only supported by the TensorPipe RPC backend. As a
result, this moves test_embedding_bag_with_no_grad_tensors to be a
TensorPipe-specific test.
ghstack-source-id: 145134888
Test Plan: waitforbuildbot
Reviewed By: rohan-varma
Differential Revision: D32959952
fbshipit-source-id: d65f2edbb6dad7705475690a8c6293a322299dde
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68630
Constraints:
1) (functorch) if all the inputs to an op have requires_grad=False and don't have tangents, then their VariableType
kernel should be a no-op i.e., behave like a redispatch. This is due to functorch's DynamicLayerStack
having the autograd key by default (which is so that transformations like vmap still work with autograd)
2) (inference mode) inference tensors in inference mode will call straight into the kernel, we should still do something sensible
inside even if we normally wouldn't redispatch into it.
3) ~Should support potential application of interposition below autograd: `nn.Parameter` is an example of subclassing where the subclass
is not preserved when an operation is performed. There is an exception though: we want calling `make_dual` on a
`nn.Parameter` to preserve its parameterness.~
4) Should avoid calls to shallow_copy_and_detach to avoid spurious calls into `__python_dispatch__`.
This PR:
- does not redispatch to `make_dual` from its `ADInplaceOrView` kernel to satisfy (1)
- calls into `alias` from the kernel in the native namespace so that behavior is consistent with other views in inference mode to satisfy (2)
- discussion of (3). We still wouldn't be able to directly override `make_dual` below autograd. In this PR, instead of not redispatching at all, we choose to redispatch into `at::alias` so that one can override `make_dual`. The side effect is that one would not be able to distinguish calls between the two, which can be problematic (though a straightforward but hacky solution would be to create a new `at::alias_for_make_dual` that would allow users to distinguish the two). This isn't ideal but seems to be the simplest way to satisfy (3). We don't pursue that hacky solution here.
- (4) is satisfied because we remove calls to `shallow_copy_and_detach`
<details>
<summary> A potentially less hacky but more involved solution? (WIP) </summary>
Realizing that make_dual is more like requires_grad, perhaps it shouldn't be autograd explicit? Make make_dual a composite or python-only construct. i.e., it would be a view on the primal followed by something to the effect of primal.set_fw_grad(tangent).
Additional constraints:
5) make_dual needs to be backward-differentiable (I can't think of any applications yet because
technically as a high-order function, jvp's input is the tangent only, "detach" is not applied on
the tangent, so one would still be able to propagate gradients through it).
6) set_fw_grad needs to raise an error if there is a layout mismatch and base is a forward-differentiable view
Possible plan
- (6) implies that a plain view would not suffice. We need a `detach`-like operation to ensure that set_fw_grad
knows the view is not forward differentiable.
- (5) implies that this (new) `detach` would need to be backward differentiable (API TBD).
- (3) is no longer relevant because make_dual is no longer autograd explicit, but perhaps this new detach should behave like the current one? There is a lot of logic to replicate for detach, so this may be hard.
- (1) is satisfied if we use the current detach logic, and (4) is trivial.
I'm not convinced that this is the right solution either, because in the end does (3) still work?
</details>
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D32899679
Pulled By: soulitzer
fbshipit-source-id: 98e13ae954e14e1e68dbd03eb5ab3300d5ed2c5e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68795
This change improves static runtime exception safety. Added a scope exit guard that invokes `MemoryPlanner::deallocate` in its destructor.
Caveat: we have to be really careful with the exception behavior of `MemoryPlanner::deallocate` and `MemoryPlanner`'s constructor, because they're now both potentially called in the destructor of the scope exit guard. Letting exceptions potentially escape destructors is playing with fire since 1) the destructor of `Deallocator` is (implicitly) `noexcept`, 2) even if it wasn't, `std::terminate` will be called if an exception escapes and the stack is already unwinding. To get around this, we wrap the deallocation stuff in a try/catch. If deallocation throws, then we simply reset all of the memory planner stuff and carry on.
There's a catch: the code path that we take when handling the deallocation exception can't throw. However, this code path is much simpler than memory planner construction/deallocation, so it's much easier to manually audit the correctness here.
Test Plan:
**New unit tests**
`buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: hlu1
Differential Revision: D32609915
fbshipit-source-id: 71fbe6994fd573ca6b7dd859b2e6fbd7eeabcd9e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67404
Port smooth_l1_loss to structured kernels.
Brian Hirsh authored the part of adding build_borrowing_binary_op_coerce_to_scalar to TensorIterator.
Test Plan: This commit shouldn't change the behavior. So, CI.
Reviewed By: bdhirsh, ngimel
Differential Revision: D31981147
Pulled By: alanwaketan
fbshipit-source-id: a779bb76c848eed8b725dc0e1d56b97a3bd9c158
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67256
To change what tests can be run in various cases, the check logic should be moved to functions and variables that can be changed.
One challenge here is that decorators don't have dynamic functionality. If something is read in when imported and then changed afterwards, it will not actually change. This means we need to separate out the variables that need to be changed for our use case.
Those are put into common_distributed.py and can be changed before importing the distributed_test.py code.
The use case is to add new backends to the tests and split it into tests that can be run on demand as a separate instance. To do so, you would change DistTestSkipCases after importing it into a launcher or a setup script and then load distributed_test.
Test Plan: Check the signals
Reviewed By: mrshenli
Differential Revision: D31906947
fbshipit-source-id: 45e3258c55f4dc34e12a468bed65280f4c25748f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67729
1. The operator version is needed to decide whether to apply the upgrader or not. This PR makes it available at the loading stage.
2. Swap the order of parsing instructions and operators, because an instruction needs to know the operator first to decide whether to apply the upgrader or not (i.e., whether to change `OP` to `CALL`).
ghstack-source-id: 145082390
Test Plan:
```
buck test //caffe2/test/cpp/jit:jit
```
Reviewed By: iseeyuan
Differential Revision: D32092516
fbshipit-source-id: 853a68effaf95dca86ae46b7f7f4ee0d8e8767da
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69104
Add nvidia-smi memory and utilization as native Python API
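Hypothetical usage sketch (the exact function names are my assumption of the API added here; both rely on pynvml being available):
```python
import torch

if torch.cuda.is_available():
    # Percent GPU (SM) utilization and percent of time device memory was being
    # read or written over the last sample period, as reported by NVML.
    print("GPU utilization (%):", torch.cuda.utilization())
    print("GPU memory activity (%):", torch.cuda.memory_usage())
```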
Test Plan:
testing the function returns the appropriate value.
Unit tests to come.
Reviewed By: malfet
Differential Revision: D32711562
fbshipit-source-id: 01e676203299f8fde4f3ed4065f68b497e62a789
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69496
tostring is expensive, and this is equivalent and faster
Test Plan: covered by lazy tensor unit tests
Reviewed By: desertfire, alanwaketan
Differential Revision: D32901050
fbshipit-source-id: 34080f415db5fd5d3817f7f2533f062a6ec07d21
Summary:
Earlier, we were only testing inputs with the shape `(5,)` for `nn.functional.dropout`, but since it's used a lot, I feel it's a good idea to test a few more shapes, including scalars. This PR:
1. Revises sample inputs for `nn.functional.dropout`
2. Adds an OpInfo for `nn.functional.dropout2d`.
A note regarding the documentation:
Looks like `nn.functional.dropout2d` also supports inputs of shape `(H, W)` apart from `(N, C, H, W) / (C, H, W)` but the [documentation](https://pytorch.org/docs/stable/generated/torch.nn.Dropout2d.html#torch.nn.Dropout2d) doesn't mention that (`H, W` case). Should that be revised or am I missing anything here? (Filed an issue here: https://github.com/pytorch/pytorch/issues/67892)
```python
# A 2D tensor is a valid input for Dropout2d
In [11]: tensor = torch.randn((3, 4), device='cpu', dtype=torch.float32)
In [12]: dropout2d = torch.nn.Dropout2d(p=0.5)
In [13]: dropout2d(tensor)
Out[13]:
tensor([[-0.1026, -0.0000, -0.0000, -0.0000],
[-1.5647, 0.0000, -0.0000, -0.5820],
[-0.0000, -3.2080, 0.1164, -3.6780]])
```
Issue Tracker: https://github.com/pytorch/pytorch/issues/54261
cc: mruberry zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67891
Reviewed By: mrshenli
Differential Revision: D32628527
Pulled By: mruberry
fbshipit-source-id: 4c9b89550f1d49526e294378ce107eba9f29cabb
Summary:
The error message was changed following a PR comment. And since the test doesn't run on CI, I forgot to update the test to catch the new error message.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69565
Reviewed By: mrshenli
Differential Revision: D32932982
Pulled By: albanD
fbshipit-source-id: a1da72b0ca735e72b481bc944039233094f1c422
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69489
This change avoids pulling `Node*` out of `ProcessedNode*` to evaluate expressions related to `Node*` at op execution time.
The perf gain is expected but not measurable; the purpose of this change is to make SR's code more self-contained (calling more code from SR, not JIT) during execution time.
Test Plan: Existing tests
Reviewed By: mikeiovine
Differential Revision: D32893265
fbshipit-source-id: f0f397666b3556f985d45112af8fe0b08de22139
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69334
Original PR #68121 broke with an incompatible qengine on Mac OS; this PR re-introduces the changes with a fix.
Add FX support for the QAT EmbeddingBag operator; previously there was only eager mode support.
Test Plan:
pytest test/quantization/fx/test_quantize_fx.py -v -k "test_qat_embeddingbag_linear"
Imported from OSS
Reviewed By: jingsh
Differential Revision: D32815153
fbshipit-source-id: 33654ce29de6e81920bf3277a75027fe403a1eb2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69333
The original PR was reverted due to breakage with an incompatible qengine on Mac OS; this diff fixes that.
Support the QAT workflow by using the torch.fx QAT API, e.g. `prepare_qat_fx` and `convert_fx`.
Test Plan:
`pytest test/quantization/fx/test_quantize_fx.py -v -k "test_qat_embedding_linear"`
Imported from OSS
Reviewed By: jingsh
Differential Revision: D32814827
fbshipit-source-id: f7a69d2b596f1276dc5860b397c5d5d07e5b9e16
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68520
Ref #56794
This changes the code from allocating 1 tensor per thread inside the
parallel region, to allocating one larger tensor outside the parallel
region and manually viewing each thread's slice of the histogram.
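A rough Python analogue of the allocation strategy (illustrative only; the real change is in the C++ histogram kernel):
```python
import torch

def parallel_histc_sketch(x, nbins, num_workers=4, lo=0.0, hi=1.0):
    chunks = x.chunk(num_workers)
    # One allocation outside the "parallel region" instead of one per worker.
    partial = torch.zeros(num_workers, nbins)
    for i, chunk in enumerate(chunks):  # stand-in for the parallel loop
        partial[i] = torch.histc(chunk, bins=nbins, min=lo, max=hi)  # writes into a row view
    return partial.sum(dim=0)  # reduce the per-worker slices

x = torch.rand(10_000)
assert torch.equal(parallel_histc_sketch(x, 10), torch.histc(x, bins=10, min=0.0, max=1.0))
```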
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D32929365
Pulled By: ngimel
fbshipit-source-id: e28da2736e849a0282b70f34d11526d3355d5bd5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69590
The variable `callbackRegisteredData_` was written to without
synchronization.
ghstack-source-id: 145066862
Test Plan: waitforbuildbot
Reviewed By: rohan-varma
Differential Revision: D32938979
fbshipit-source-id: bc9a11a70680db45ece95880ae19ce2026e8a88e
Summary:
As per title.
While working on this I discovered several issues with these methods related to grad instabilities. I will file them and link here later. It was quite painful to get these to pass all the tests given the discovered issues, sorry for the delay, mruberry!
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69107
Reviewed By: zou3519
Differential Revision: D32920341
Pulled By: mruberry
fbshipit-source-id: 15b33e2b46acdcbff8a37d8e43e381eb55d1a296
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69335
This PR added support for configuring fusion with:
"pattern", "fuser_method"
This only works for a simple sequence of 2-op patterns currently; we will extend this in future PRs
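A rough sketch of what such a config entry might look like (the key names come from the summary above; the fuser method and the pattern-ordering convention are assumptions for illustration):
```python
import torch.nn as nn

def fuse_linear_relu(linear, relu):
    # Stand-in fuser method for this sketch; a real fuser would return a
    # proper fused module rather than a plain Sequential.
    return nn.Sequential(linear, relu)

# A 2-op pattern plus the function used to fuse it. FX quantization configs of
# this era write patterns outermost-op-first, so (nn.ReLU, nn.Linear) describes
# Linear followed by ReLU (treat the exact convention as an assumption here).
fusion_pattern_config = {
    "pattern": (nn.ReLU, nn.Linear),
    "fuser_method": fuse_linear_relu,
}
```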
Test Plan:
regresion test on linear-relu fusion:
```
python test/fx2trt/test_quant_trt.py TestQuantizeFxTRTOps
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D32816164
fbshipit-source-id: f300b7b96b36908cb94a50a8a17e0e15032509eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69533
Modified loops in files under fbsource/fbcode/caffe2/ from the format
```
for(TYPE var=x0;var<x_max;x++)
```
to the format
```
for(const auto var: irange(xmax))
```
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.
Test Plan: Sandcastle
Reviewed By: malfet
Differential Revision: D32837942
fbshipit-source-id: 8663037a38ade8f81bd5e983a614d197ea11f0d1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69508
Original Phabricator Diff: D32704467 (e032dae329)
Reland, fix is to not test traditional checkpoint when input does not require grad as that is unsupported as documented.
Original PR body:
Resubmission of https://github.com/pytorch/pytorch/pull/62964 with the
suggestions and tests discussed in
https://github.com/pytorch/pytorch/issues/65537.
Adds a `use_reentrant=False` flag to the `checkpoint` function. When
`use_reentrant=False` is specified, a checkpointing implementation that uses
SavedVariableHooks instead of re-entrant autograd is used. This makes it more
composable with things such as `autograd.grad` as well as DDP (still need to
add thorough distributed testing).
As discussed in https://github.com/pytorch/pytorch/issues/65537, the tests that we need to add are:
- [x] Gradient hooks are called once
- [x] works when the input does not require grad but Tensors that require grad are captured (like the first layer in a NN)
- [x] works for functions with arbitrary input/output objects
- [x] distributed tests (next PR)
Note that this is only for `torch.utils.checkpoint`; if this approach overall looks good, we will do something similar for `checkpoint_sequential`.
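A brief usage sketch of the new flag (the toy model is mine; `torch.utils.checkpoint.checkpoint` is the real API):
```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

layer = nn.Linear(8, 8)
x = torch.randn(2, 8, requires_grad=True)

# Non-reentrant checkpointing backed by saved-variable hooks; unlike the
# default re-entrant implementation, this composes with torch.autograd.grad.
out = checkpoint(layer, x, use_reentrant=False)
(grad,) = torch.autograd.grad(out.sum(), x)
print(grad.shape)
```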
ghstack-source-id: 144948501
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D32902634
fbshipit-source-id: 2ee87006e5045e5471ff80c36a07fbecc2bea3fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69332
---
## Context
The `build_android.sh` script currently does not forward Vulkan configuration options, which makes it impossible to control them when running `build_pytorch_android.sh`.
## Changes
Slightly change the script to allow Vulkan configuration options to propagate from `build_pytorch_android.sh` to `build_android.sh`
Test Plan: Imported from OSS
Reviewed By: beback4u
Differential Revision: D32840908
Pulled By: SS-JIA
fbshipit-source-id: e55d89c93c996b92b743cf047f5a285bb516bbc4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69331
---
## Context
When the optimization flag is turned on, some SPIR-V modules produced from the Vulkan compute shaders were invalid. The Vulkan Validation layer raises the following error for these modules:
```
[ UNASSIGNED-CoreValidation-Shader-InconsistentSpirv ] Object: VK_NULL_HANDLE (Type = 0) | SPIR-V module not valid: Header block 52[%52] is contained in the loop construct headed by 44[%44], but it's merge block 47[%47] is not
%52 = OpLabel
```
Turning off the optimization flag, the SPIR-V modules produced no longer reports these errors in the Validation layer.
## Changes
Turns off optimization when generating SPIR-V modules to ensure correctness of the modules.
**Note that disabling SPIR-V optimization did not regress inference latency for the several models I tested**.
Test Plan: Imported from OSS
Reviewed By: beback4u
Differential Revision: D32840910
Pulled By: SS-JIA
fbshipit-source-id: 7ccb5691fd0e2d11b9c8c28ad7b83906e8163699
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68770
The previous fusion only works for a sequence of ops, which is not general enough for fusion patterns
that are defined by a subgraph; this PR refactors it to make it more general
Test Plan:
```
python test/test_quantization.py TestFuseFx
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D32602637
fbshipit-source-id: a7897c62081b9d71c67fb56e78484cf68deaacf6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66742
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for(TYPE var=x0;var<x_max;x++)`
to the format
`for(const auto var: irange(xmax))`
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.
Test Plan: Sandcastle
Reviewed By: malfet
Differential Revision: D31705366
fbshipit-source-id: be58222426c192406a7f93c21582c3f6f2082401
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68246
Currently the codegen produces a list of output files at CMake
configuration time and the build system has no way of knowing if the
outputs change. So if that happens, you basically need to delete the
build folder and re-run from scratch.
Instead, this generates the output list every time the code generation
is run and changes the output to be a `.cmake` file that gets included
in the main cmake configuration step. That means the build system
knows to re-run cmake automatically if a new output is added. So, for
example you could change the number of shards that `Operators.cpp` is
split into and it all just works transparently to the user.
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D32596268
Pulled By: albanD
fbshipit-source-id: 15e0896aeaead90aed64b9c8fda70cf28fef13a2
Summary:
This renames `WindowsTorchApiMacro.h` to `Export.h` to mirror the c10 header `c10/macros/Export.h` and also updates it to use `C10_EXPORT`/`C10_IMPORT`. This also removes the `THP_API` macro from `THP_export.h` which appears to serve the same purpose.
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68095
Reviewed By: jbschlosser
Differential Revision: D32810881
Pulled By: albanD
fbshipit-source-id: d6949ccd0d80d6c3e5ec1264207611fcfe2503e3
Summary:
ORT tensors are similar to XLA tensors in that they don't have storage, so extend the condition to ORT tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68705
Reviewed By: zou3519
Differential Revision: D32921378
Pulled By: albanD
fbshipit-source-id: 3bda9bba2ddd95cb561a4d1cff463de652256708
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68302
Implement the new memory re-use algorithm. It’s roughly based on the c2 one, but after going through many iterations it may not be a 1:1 port anymore. Also deleted the old liveness analysis.
Test Plan:
## **Re-use metrics**
`inline_cvr` (294738512_58)
**Before**
* `local`
```
Total number of managed tensors: 2660
Total number of managed output tensors: 0
Total number of unmanaged values: 3041
Total memory managed: 4601984 bytes
Total number of reused tensors: 1183
```
* `local_ro`
```
Total number of managed tensors: 1412
Total number of managed output tensors: 0
Total number of unmanaged values: 2677
Total memory managed: 29696 bytes
Total number of reused tensors: 959
```
**After**
* `local`
```
Total number of managed tensors: 2660
Total number of managed output tensors: 0
Total number of unmanaged values: 3041
Total memory managed: 4520000 bytes
Total number of reused tensors: 1198
```
* `local_ro`
```
Total number of managed tensors: 1412
Total number of managed output tensors: 0
Total number of unmanaged values: 2677
Total memory managed: 29120 bytes
Total number of reused tensors: 963
```
Reviewed By: hlu1
Differential Revision: D32370424
fbshipit-source-id: 06a8e0a295ed7a2b4d14071349c1f1e975f746bf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68668
This updates run_frozen_optimizations so that it will run on additional methods other than forward
ghstack-source-id: 143871758
Test Plan:
Added test in test_freezing.py
```
python3 test/test_jit.py -- test_conv_bn_folding_not_forward
```
Reviewed By: eellison
Differential Revision: D32567857
fbshipit-source-id: 75e56efad576404dc8d6897861d249573f5ccd7a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68788
In debug mode, this should throw errors for ops where the wrong number of outputs is returned (i.e. the number of values left on the stack is different from the number shown in the schema)
Test Plan:
Run this in debug mode and verify that it doesn't throw an assert
```
import torch
class Thing(torch.nn.Module):
    @torch.jit.export
    def en(self, x: torch.Tensor):
        return torch.add(x, 2.0)

    def forward(self, x: torch.Tensor, y: torch.Tensor):
        a = torch.mm(x, y)
        b = torch.nn.functional.gelu(a)
        c = self.en(b)
        return c.std_mean()


if __name__ == '__main__':
    unsc = Thing()
    thing = torch.jit.script(unsc)
    x = torch.randn(4, 4)
    y = torch.randn(4, 4)
    std, mean = thing.forward(x, y)
    print(std, mean)
    print(str(thing.forward.graph))
```
Reviewed By: gchanan
Differential Revision: D32625256
Pulled By: davidberard98
fbshipit-source-id: 61d5ec0c5a9f8b43706257119f4f524bb9dbe6f5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69391
As part of the efforts to unify the APIs across different data backends (e.g. TorchData, TorchArrow), we are making changes to different DataPipes' APIs. In this PR, we are removing the input argument `nesting_level` from `FilterIterDataPipe`.
cc VitalyFedyunin ejguan NivekT
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D32849462
Pulled By: NivekT
fbshipit-source-id: 91cf1dc03dd3d3cbd7a9c6ccbd791ade91355f30
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69390
As part of the efforts to unify the APIs across different data backends (e.g. TorchData, TorchArrow), we are making changes to different DataPipes' APIs. In this PR, we are removing the input argument `nesting_level` from `MapperIterDataPipe`.
cc VitalyFedyunin ejguan NivekT
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D32849465
Pulled By: NivekT
fbshipit-source-id: 963ce70b84a7658331d126e5ed9fdb12273c8e1f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68769
As titled; we want to use this type in fuser_method_mapping in later PRs
Test Plan:
no change to logic, just regression test on ci
```
python test/test_quantization.py
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D32602636
fbshipit-source-id: 15b95241431dfca9b1088d0920bf75705b37aa9a
Summary:
Removed JSON uploading to S3 for Mac GHA workflows as the AWS credentials were not working.
This PR tries uploading them to GitHub instead, which works https://github.com/pytorch/pytorch/runs/4413940318?check_suite_focus=true
They should show up on the HUD page: hud.pytorch.org/pr/69387 with the name test-jsons after the CI is completed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69387
Reviewed By: seemethere
Differential Revision: D32885204
Pulled By: janeyx99
fbshipit-source-id: 3d25ead6d464144a228fdf8ead5172de3ed8430e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69400
Hopefully this makes naming more consistent. Without this change, some tests will fail for plugins since values can be set to upper case in some cases. This should prevent that and make lookup and comparison consistent.
Test Plan: Check the signals. There is no specific test for this, but all tests should pass.
Reviewed By: mrshenli
Differential Revision: D32836529
fbshipit-source-id: 1b7d2b64e04fe0391b710aa6ed6d1e47df9027a3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69368
Before this PR, copying a node would lose the stack trace. This PR
ensures that the stack trace is preserved across copies.
This is useful because quantization passes would like to start
allowing the user to preserve stack traces, and we use the copy
behavior.
Test Plan:
```
python test/test_fx.py TestFX.test_stack_traces
```
Imported from OSS
Reviewed By: jamesr66a
Differential Revision: D32835248
fbshipit-source-id: 91610fd8d05f5683cfa5e11fb6f9f3feacb8e241
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69249
This PR added default_replay_qconfig and default_replay_observer, which are used
when we want to configure an operator to reuse the observer from its input. If the input
Tensor for the operator is not observed, we will not observe the output of this operator either;
if the input Tensor is observed, we will observe the output of the operator with the same observer.
e.g.
```
x1 = x0.reshape()
```
if reshape is configured with default_replay_qconfig:
1. if x0 is observed with observer_0, we'll observe x1 with the same observer instance
2. if x0 is not observed, we won't observe x1 either
Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_replay_qconfig
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D32774723
fbshipit-source-id: 26862b2bc181d0433e2243daeb3b8f7ec3dd33b2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68149
JIT optimization passes are part of the CPU-only build (i.e. necessary GPU flags are not passed in). This separates the implementation of frozen_conv_add_relu_fusion so that the GPU-enabled implementation is registered at runtime (if it is available)
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D32773666
Pulled By: davidberard98
fbshipit-source-id: c83dbb88804bdef23dc60a6299acbfa76d5c1495
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68229
This PR makes BinaryOpQuantizeHandler always produce reference patterns, and we rely on the
subgraph_rewriter to rewrite the reference quantized patterns to quantized ops
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D32537714
fbshipit-source-id: 456086b308c4446840d8d37997daa6f8f8068479
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69495
As the title. Separated from D30589161.
Test Plan: Tested in D30589161.
Reviewed By: maratsubkhankulov, wushirong
Differential Revision: D32898927
fbshipit-source-id: 89e18d2eb19b43fbab92b4988d0a21d21cff2d1f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69486
As the title. Migrate from sign plugin to native trt layers. All the layers are fused into one single PWN kernel in TRT.
```
[TensorRT] VERBOSE: Engine Layer Information:
Layer(PointWiseV2): PWN(sign_1_sign_rhs + sign_1_sign_rhs_broadcast, PWN(PWN(sign_1_floor_div*2_rhs + sign_1_floor_div*2_rhs_broadcast, PWN(PWN(PWN([UNARY]-[acc_ops.sign]-[sign_1_prod_abs], [UNARY]-[acc_ops.sign]-[sign_1_prod_abs_exp]), PWN([UNARY]-[acc_ops.sign]-[sign_1_prod_exp], [ELEMENTWISE]-[acc_ops.sign]-[sign_1_exp_floor_div])), [ELEMENTWISE]-[acc_ops.sign]-[sign_1_floor_div*2])), [ELEMENTWISE]-[acc_ops.sign]-[sign_1_sign])), Tactic: 0, x[Float(2,2,3)] -> output0[Float(2,2,3)]
```
Test Plan: CI
Reviewed By: wushirong
Differential Revision: D32887537
fbshipit-source-id: ac250b5197e340319de29653a27f879a0e1ea9cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69458
1. Added type hints to acc ops converters.
2. Moved some of the classes/logic in fx2trt.py into separate files (input_tensor_spec.py, trt_module.py, converter_registry.py).
3. Added imports in `__init__.py` so that users can just call `from torch.fx.experimental.fx2trt import xxx` instead of `experimental.fx2trt.fx2trt`.
Test Plan: CI
Reviewed By: wushirong
Differential Revision: D32884637
fbshipit-source-id: e3e1e597edb9a08b47b4595bd371f570f2f3c9b6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69405
Add a helper function that will generate input tensor specs with dynamic batch size.
Note that the constraint currently on this function is that the batch dimension of all these tensors should be the first dimension.
Also add more doc strings.
Test Plan:
Added unit tests.
```
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/7881299413036896
✓ ListingSuccess: caffe2/test/fx2trt/core:test_input_tensor_spec - main (7.455)
✓ Pass: caffe2/test/fx2trt/core:test_input_tensor_spec - test_from_tensor (caffe2.test.fx2trt.core.test_input_tensor_spec.TestTRTModule) (7.047)
✓ Pass: caffe2/test/fx2trt/core:test_input_tensor_spec - test_from_tensors_with_dynamic_batch_size (caffe2.test.fx2trt.core.test_input_tensor_spec.TestTRTModule) (7.066)
✓ Pass: caffe2/test/fx2trt/core:test_input_tensor_spec - test_from_tensors (caffe2.test.fx2trt.core.test_input_tensor_spec.TestTRTModule) (7.181)
Summary
Pass: 3
ListingSuccess: 1
```
Wait for CI to verify if this unit test can run without RE.
Reviewed By: yinghai, kflu
Differential Revision: D32853947
fbshipit-source-id: 19713e8ad5478c945385c7013f7a1b9894151fea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69492
We already add `empty`, and this is another weird variation that sometimes pops up. How it gets triggered is unclear, so just adding it for now.
Test Plan: ran tracer
Differential Revision: D32896522
fbshipit-source-id: 38627d8efc48ef240100ccdbd94c0e7208b0b466
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68096
We replace all c10d APIs with the autograd-enabled collectives in the sharded linear op, so that we can enable backward propagation (grad calculation for the sharded linear).
ghstack-source-id: 144882914
Test Plan: Unit test + CI
Reviewed By: pritamdamania87
Differential Revision: D32177341
fbshipit-source-id: 1919e8ca877bdc79f4cdb0dc2a82ddaf6881b9f1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68786
To enable autograd for the sharded linear, we found we need to make some changes to the current nn functional API (the c10d API with autograd enabled). So we made the following changes:
1. Add a new API `reduce_scatter`, since we need it in the rowwise sharding.
2. Modify the `all_to_all` API to make sure it is consistent with the ones in distributed_c10d.py.
3. Found the C++ input params of `reduce_scatter` were missing an input param; added more unit tests to cover these cases.
4. Sync the NN tests from gloo to nccl.
ghstack-source-id: 144860208
Test Plan: CI + Unit Test
Reviewed By: pritamdamania87
Differential Revision: D32569674
fbshipit-source-id: 9bd613f91bbf7a39eede0af32a5a5db0f2ade43b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69027
Resubmission of https://github.com/pytorch/pytorch/pull/62964 with the
suggestions and tests discussed in
https://github.com/pytorch/pytorch/issues/65537.
Adds a `use_reentrant=False` flag to the `checkpoint` function. When
`use_reentrant=False` is specified, a checkpointing implementation that uses
SavedVariableHooks instead of re-entrant autograd is used. This makes it more
composable with things such as `autograd.grad` as well as DDP (still need to
add thorough distributed testing).
As discussed in https://github.com/pytorch/pytorch/issues/65537, we have added
the following tests:
-[ ] Gradient hooks are called once
ghstack-source-id: 144644859
Test Plan: CI
Reviewed By: pbelevich
Differential Revision: D32704467
fbshipit-source-id: 6eea1cce6b935ef5a0f90b769e395120900e4412
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69472
Neglected the fact that the actual push for these variables is happening
inside of a docker container, this should help resolve that issue
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D32889583
Pulled By: seemethere
fbshipit-source-id: d0ef213787694ab1a7e9fb508c58d2f53ff218c3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69358
Enforces this and raises an error earlier if wrapper_cls is not provided as an
arg to the enable_wrap() function. Also improves the documentation.
ghstack-source-id: 144807950
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D32826963
fbshipit-source-id: d1b98df021e86d3d87a626e82facf6230b571a55
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69357
Since we only want to support enable_wrap() and wrap() manual wrapping
APIs without them accepting auto_wrap_policy, remove all this unneeded code.
ghstack-source-id: 144807951
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D32826318
fbshipit-source-id: 6526e700ebdf132cbb10439698f5c97ce083cd3d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69356
Per title
ghstack-source-id: 144807949
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D32816150
fbshipit-source-id: 6b4eacc63edd267bc1eb8a1c1d6c753bc581d63a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68776
Makes these APIs independent of ConfigAutoWrap so that they can be
used by FSDP ctor without it knowing about ConfigAutoWrap.
Also gets us one step closer to killing ConfigAutoWrap.recursive_wrap and
auto_wrap(), as we will only support enable_wrap() and wrap() moving forward.
Will test via unittests and FSDP benchmarks to ensure the wrapping still works.
ghstack-source-id: 144807948
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D32604021
fbshipit-source-id: 54defc0cd90b16b5185a8c1294b39f75c06ffd21
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69340
- An FX pass to fuse ops resulting from addmm(a, b.t()) (see the short sketch after this list)
- Used to enable structured sparsity using TRT
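For context, a minimal illustration of the pattern this pass targets (my own sketch; the equivalence below is standard PyTorch, not code from this PR):
```python
import torch

# nn.Linear traced through FX typically lowers to addmm(bias, input, weight.t());
# the pass fuses the transpose + addmm back into a single linear-style op that
# TRT can run with structured sparsity.
bias = torch.zeros(4)
x = torch.randn(2, 3)
weight = torch.randn(4, 3)

y_addmm = torch.addmm(bias, x, weight.t())
y_linear = torch.nn.functional.linear(x, weight, bias)
assert torch.allclose(y_addmm, y_linear)
```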
Reviewed By: 842974287
Differential Revision: D32456684
fbshipit-source-id: 601826af216cea314ee85ed522d5c54a5151d720
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69382
Implemented `slice` operator on the Vulkan backend:
* Supports only <= 4D tensors.
* `aten::slice.Tensor` will be executed internally by indexing Tensor.
* Slicing means selecting the elements present in the tensor by using the `:` slice operator. We can slice the elements by using the index of that particular element.
* Indexing starts with 0. `end` is exclusive. In this example, we will be getting the elements from the very start up to index 4 (exclusive) of the tensor.
```
tensor = torch.tensor([2, 4, 1, 7, 0, 9])
print(tensor[ : 4])
# Outputs- tensor([2, 4, 1, 7])
```
* Generalized input tensors to 4D ones to simplify input/output texture handling. For example, {2, 3} is treated as {1,1,2,3} internally.
* Negative `start` and `end` inputs are allowed.
* CPU implementation: [/aten/src/ATen/native/TensorShape.cpp::slice()](3e45739543/aten/src/ATen/native/TensorShape.cpp (L1262))
* For **width** dimension, use `vkCmdCopyImage` API,
* input texture size = `{x,y,z}`
* if `step` is 1, copy a region from the input texture to the output texture once where
* source offset = `{start,0,0}`
* destination offset = `{0,0,0}`
* copy extents = `{end-start,y,z}`
* call `vkCmdCopyImage` API
* if `step` is not 1, do for-loop from x=`start` to `end-1` by `step` (also from x_new=`0` to `end-start-1`) where
* x_max = x
* copy extents = `{1,y,z}`
* if (x >= x_max) continue; // out of range
* source offset = `{x,0,0}`
* destination offset = `{x_new,0,0}`
* call `vkCmdCopyImage` API
* For **height** dimension, use `vkCmdCopyImage` API,
* input texture size = `{x,y,z}`
* if `step` is 1, copy a region from the input texture to the output texture once where
* source offset = `{0,start,0}`
* destination offset = `{0,0,0}`
* copy extents = `{x,end-start,z}`
* call `vkCmdCopyImage` API
* if `step` is not 1, do for-loop from y=`start` to `end-1` by `step` (also from y_new=`0` to `end-start-1`) where
* y_max = y
* copy extents = `{x,1,z}`
* if (y >= y_max) continue; // out of range
* source offset = `{0,y,0}`
* destination offset = `{0,y_new,0}`
* call `vkCmdCopyImage` API
* For **batch** and **feature**(channel) dimensions, we build up shader operations from the output texture point of view to avoid the nondeterministic order of GPU shader operations between texels. See [incoherent memory access](https://www.khronos.org/opengl/wiki/Memory_Model#Incoherent_memory_access)
* `b,c,h,w` = input tensor dims (NCHW)
* `b1,c1,h1,w1` = output tensor dims (NCHW)
* `posIn` = position (x,y,z) for input texture
* `posOut` = position (x,y,z) for output texture
* `inval` = input texel value
* `outval` = output texel value
* `max_dst_index` = batch size of output tensor * channel size of output tensor
* `n` = end - start
* `i` = index of input texel (0...3) and `j` = index of output texel (0..3)
* Pseudo code:
```
for (uint j = 0; j < 4; ++j) {
dst_index = posOut.z * 4 + j;
if (dst_index >= max_dst_index) {
save outval to output texture at posOut
break; // out of range
}
b1 = int(dst_index / channel size of output tensor);
c1 = dst_index % channel size of output tensor;
h1 = posOut.y;
w1 = posOut.x;
b=b1
c=c1
h=h1
w=w1
if (dim==0) { // batch
b=start+step*b1;
} else { // feature(channel)
c=start+step*c1
}
src_index = b * channel size of input tensor + c;
posIn.x = int(w);
posIn.y = int(h);
posIn.z = int(src_index / 4);
i = (src_index % 4);
read inval from input texture at posIn
outval[j] = inval[i]
if (j == 3) {
save outval to output texture at posOut
}
}
```
* Error/edge cases:
* Vulkan backend doesn't support zero-sized slice. It throws an exception when allocating a Vulkan buffer if any dim size is zero.
* The slice step should be positive.
* Generalized test cases with different dim size tensors for batch, feature, height and width. For example, a 4D tensor slicing by dim=width:
```
tensor {2, 3, 40, 50} slicing with dim=3, start=10, end=30, step=1 <-> tensor indexing by [:,:,:,10:30:1]
tensor {2, 3, 40, 50} slicing with dim=3, start=10, end=30, step=7 <-> tensor indexing by [:,:,:,10:30:7]
tensor {2, 3, 40, 50} slicing with dim=3, start=10, end=50, step=2 <-> tensor indexing by [:,:,:,10:50:2] with end=out of range
tensor {2, 3, 40, 50} slicing with dim=3, start=-60, end=60, step=2 <-> tensor indexing by [:,:,:,-60:60:2] with start/end=out of range
tensor {2, 3, 40, 50} slicing with dim=3, start=-30, end=-10, step=2 <-> tensor indexing by [:,:,:,-30:-10:1] with negative start/end
tensor {2, 3, 40, 50} slicing with dim=3, start=0, end=INT64_MAX, step=2 <-> tensor indexing by [:,:,:,0:9223372036854775807:1] with end=INT64_MAX
tensor {2, 3, 40, 50} slicing with dim=3, start=-10, end=INT64_MAX, step=2 <-> tensor indexing by [:,:,:,-10:9223372036854775807:1] with negative start and end=INT64_MAX
tensor {2, 3, 40, 50} slicing with dim=3, start=INT64_MIN, end=INT64_MAX, step=2 <-> tensor indexing by [:,:,:,-9223372036854775808:9223372036854775807:1] with start=INT64_MIN and end=INT64_MAX
tensor {2, 3, 40, 50} slicing with dim=3, start=empty, end=empty, step=2 <-> tensor indexing by [:,:,:,::1] with empty start/end
```
* References:
* [Slicing PyTorch Datasets](https://lewtun.github.io/blog/til/nlp/pytorch/2021/01/24/til-slicing-torch-datasets.html)
* [How to Slice a 3D Tensor in Pytorch?](https://www.geeksforgeeks.org/how-to-slice-a-3d-tensor-in-pytorch/)
* [PyTorch Tensor Indexing API](https://pytorch.org/cppdocs/notes/tensor_indexing.html#translating-between-python-c-index-types)
* [PyTorch Tensor Indexing](https://deeplearninguniversity.com/pytorch/pytorch-tensor-indexing/)
* [Slicing and Striding](https://mlverse.github.io/torch/articles/indexing.html#slicing-and-striding)
* Vulkan `slice` operator tensor conversion:
{F684363708}
Test Plan:
Build & test on Android:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```
Build & test on MacOS:
```
cd ~/fbsource
buck build //xplat/caffe2:pt_vulkan_api_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAppleMac\#macosx-x86_64
```
Test result on Android (Google Pixel 5):
```
[ RUN ] VulkanAPITest.slice_width_success
[ OK ] VulkanAPITest.slice_width_success (17 ms)
[ RUN ] VulkanAPITest.slice_height_success
[ OK ] VulkanAPITest.slice_height_success (13 ms)
[ RUN ] VulkanAPITest.slice_feature_success
[ OK ] VulkanAPITest.slice_feature_success (20 ms)
[ RUN ] VulkanAPITest.slice_batch_success
[ OK ] VulkanAPITest.slice_batch_success (9 ms)
[ RUN ] VulkanAPITest.slice_invalidinputs_exceptions
[ OK ] VulkanAPITest.slice_invalidinputs_exceptions (0 ms)
```
Test result on MacOS:
```
[ RUN ] VulkanAPITest.slice_width_success
[ OK ] VulkanAPITest.slice_width_success (81 ms)
[ RUN ] VulkanAPITest.slice_height_success
[ OK ] VulkanAPITest.slice_height_success (56 ms)
[ RUN ] VulkanAPITest.slice_feature_success
[ OK ] VulkanAPITest.slice_feature_success (132 ms)
[ RUN ] VulkanAPITest.slice_batch_success
[ OK ] VulkanAPITest.slice_batch_success (33 ms)
[ RUN ] VulkanAPITest.slice_invalidinputs_exceptions
[ OK ] VulkanAPITest.slice_invalidinputs_exceptions (1 ms)
```
Reviewed By: SS-JIA
Differential Revision: D32482638
fbshipit-source-id: 65841fb2d3489ee407f2b4f38619b700787d41b0
Summary:
Cleans up the CODEOWNERS file to reflect current team
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69395
Test Plan: yeah_sandcastle
Reviewed By: anjali411
Differential Revision: D32885237
Pulled By: seemethere
fbshipit-source-id: a465f2cd0e27d5e53f5af5769d1cad47ec5348e7
Summary:
ROCm and CUDA type promotion are slightly divergent and need to be updated.
cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69456
Reviewed By: anjali411, janeyx99
Differential Revision: D32883895
Pulled By: mruberry
fbshipit-source-id: 3b0ba8a9d092c2d7ff20d78da42d4a147b1db12d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69374
Enables existing native_dropout operator for use with lazy tensors. Also adds aten interned strings so lazy tensor codegen can refer to the symbols in generated IR classes.
Test Plan: CI for regressions of existing use cases, and manual tests of new Lazy Tensor functionality
Reviewed By: ngimel
Differential Revision: D32837301
fbshipit-source-id: a372a24ec65367fb84ad2e97c7e38cae4ec703a6
Summary:
This PR:
- creates the "jiterator" pattern, allowing elementwise unary and binary kernels that don't accept scalars to be jit compiled when called
- ports the gcd and i1 CUDA kernels to use the jiterator
- extends elementwise binary systemic testing to be comparable to elementwise unary systemic testing
- separates one test case from test_out in test_ops.py
- updates more OpInfos to use expected failures instead of skips
The jiterator currently does not support half, bfloat16 or complex dtypes. It also (as mentioned above) doesn't support scalar inputs. In the future we expect to add support for those datatypes and scalars.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69439
Reviewed By: ngimel
Differential Revision: D32874968
Pulled By: mruberry
fbshipit-source-id: d44bb9cde4f602703e75400ec5a0b209f085e9b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69438
We don't need to recompile the model if the OS version is not changed. This could save hundreds of ms when loading the model.
{F683788183}
ghstack-source-id: 144784720
ghstack-source-id: 144821734
Test Plan:
1. Test in the playground app
2. Test in the ig
Reviewed By: hanton
Differential Revision: D32866326
fbshipit-source-id: ae2174f68dda4d2ab89ee328cb710c08d45c4d9a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69274
`impl.h` is the main header file that defines the interface of Static Runtime to its clients.
However, it is currently filled with implementation details that should not be leaked to our clients: 1) this unnecessarily exposes our internals to our clients, which can make them hard to change later, and 2) it causes unnecessary merge conflicts when multiple people are touching this enormous impl.cpp file.
To alleviate the situation, this change moves the implementation details from impl.h into a new file, internal.h, that's internally kept without leaking the details to our clients.
This change will be followed by another change to rename `impl.h` into `runtime.h` or anything better since `impl.h` is currently not about implementation but SR's interface.
Note that this change is NOT complete since the remaining declarations in impl.h still contain a lot of implementation details. Therefore, we should keep working on minimizing the interface to prevent our API from being bloated unnecessarily. Also we need to work on modularizing our implementations into separate pieces organized by separate files in the near future.
Test Plan: Existing unittests
Reviewed By: donaldong
Differential Revision: D32780415
fbshipit-source-id: 119b7aedbf563b195641c5674572a9348732145f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69234
We don't need to recompile the model if the OS version is not changed. This could save hundreds of ms when loading the model.
{F683788183}
ghstack-source-id: 144784720
Test Plan:
1. Test in the playground app
2. Test in the ig
Reviewed By: hanton
Differential Revision: D32743881
fbshipit-source-id: 2e94c6035520de3eeaf0b61f7cf9082228c8a955
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69277
LazyView is the main class for tracking aliasing caused by view
ops. The corresponding IR classes for view ops are hand-written now, and
we can switch to code-gen them in future. For certain view ops, they
have a reverse IR class to perform inplace update in the backward
direction on a chain of alias ops.
As part of the future work, we will simplify the logic for LazyView once
the functionalization pass in core is ready to use.
Test Plan: Imported from OSS
Reviewed By: wconstab
Differential Revision: D32820014
Pulled By: desertfire
fbshipit-source-id: d9eb526cb23885f667e4815dc9dd291a7b7e4256
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69098
Add the following utils: helpers, ir_dump_util, and
tensor_util. Some of the util functions may be better organized by
grouping into different files, but we can leave that for later.
Test Plan: Imported from OSS
Reviewed By: alanwaketan
Differential Revision: D32758480
Pulled By: desertfire
fbshipit-source-id: 2a0707879f0c49573380b4c8227a3c916c99bf9a
Summary:
Per title.
This PR introduces a global flag that lets pytorch prefer one of the many backend implementations while calling linear algebra functions on GPU.
Usage:
```python
torch.backends.cuda.preferred_linalg_library('cusolver')
```
Available options (str): `'default'`, `'cusolver'`, `'magma'`.
Issue https://github.com/pytorch/pytorch/issues/63992 inspired me to write this PR. No heuristic is perfect on all devices, library versions, matrix shapes, workloads, etc. We can obtain better performance if we can conveniently switch linear algebra backends at runtime.
Performance of linear algebra operators after this PR should be no worse than before. The flag is set to **`'default'`** by default, which makes everything the same as before this PR.
The implementation of this PR is basically following that of https://github.com/pytorch/pytorch/pull/67790.
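A minimal sketch of how one might use the new flag to compare backends at runtime (assuming a CUDA device; the timing scaffolding is illustrative and not part of this PR):
```python
import torch

a = torch.randn(512, 512, device="cuda")
a = a @ a.T + 512 * torch.eye(512, device="cuda")  # symmetric positive definite

for backend in ("cusolver", "magma", "default"):
    torch.backends.cuda.preferred_linalg_library(backend)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    torch.linalg.cholesky(a)
    end.record()
    torch.cuda.synchronize()
    print(backend, start.elapsed_time(end), "ms")
```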
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67980
Reviewed By: mruberry
Differential Revision: D32849457
Pulled By: ngimel
fbshipit-source-id: 679fee7744a03af057995aef06316306073010a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69219
This change fixes a bug where the `aten::embedding_bag` implementation does not adjust the size of a managed output tensor according to the given input after memory planning starts.
Test Plan: Enhanced `StaticRuntime.EmbeddingBag` to trigger the existing bug that's fixed by this change.
Reviewed By: mikeiovine
Differential Revision: D32544399
fbshipit-source-id: 0a9f1d453e96f0cfa8443c8d0b28bbc520e38b29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66689
Let's not take an extra refcount bump to stringify types.
ghstack-source-id: 144374720
Test Plan: CI
Reviewed By: suo
Differential Revision: D31691526
fbshipit-source-id: 673d632a83e6179c063530fdbc346c22d5f47d7c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69296
remove a commented block of code that was accidentally checked in
Test Plan: no testable changes
Reviewed By: alanwaketan
Differential Revision: D32799197
fbshipit-source-id: d3eb05cbafb0f5a4a3f41c17f66ca6d0c2fc60b7
Summary:
The `TORCH_CHECK` asserts for strictly-greater-than `kLargeBuffer`,
but the exception claims `>=`. Fix the error message to match the
code.
Happy to open an issue if it's helpful; I was hopeful the trivial fix doesn't need a separate issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69174
Reviewed By: zou3519
Differential Revision: D32760055
Pulled By: H-Huang
fbshipit-source-id: 1a8ab68f36b326ed62d78afdcb198f4d6572d017
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68884
This diff uses std::vector::reserve in GetLivenessMap to set container capacity for all local containers to avoid runtime resizing.
The changes should theoretically improve performance slightly.
Test Plan:
- [x] `buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- -v 1`
- [x]
```
seq 1 10 | xargs -I{} ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench \
--scripted_model=/data/users/dxd/302008423_0.predictor.disagg.local \
--method_name=local_request_only.forward --pt_cleanup_activations=1 \
--pt_enable_out_variant=1 --pt_optimize_memory=1 --iters=0 --warmup_iters=0 \
--num_threads=1 --pt_enable_static_runtime=1 --set_compatibility=1 \
--input_type="recordio" --pt_inputs=/data/users/dxd/302008423_0.local_ro.inputs.recordio \
--recordio_use_ivalue_format=1
```
### Before
```
I1201 12:04:46.753311 2874563 PyTorchPredictorBenchLib.cpp:336] Took 10.9826 sec to initialize a predictor.
I1201 12:05:00.617139 2875780 PyTorchPredictorBenchLib.cpp:336] Took 11.1078 sec to initialize a predictor.
I1201 12:05:15.279667 2876813 PyTorchPredictorBenchLib.cpp:336] Took 11.7979 sec to initialize a predictor.
I1201 12:05:30.201207 2877554 PyTorchPredictorBenchLib.cpp:336] Took 11.8901 sec to initialize a predictor.
I1201 12:05:44.386926 2879713 PyTorchPredictorBenchLib.cpp:336] Took 11.2722 sec to initialize a predictor.
I1201 12:05:58.003582 2881426 PyTorchPredictorBenchLib.cpp:336] Took 10.8046 sec to initialize a predictor.
I1201 12:06:12.004778 2882604 PyTorchPredictorBenchLib.cpp:336] Took 11.2754 sec to initialize a predictor.
I1201 12:06:26.101241 2884888 PyTorchPredictorBenchLib.cpp:336] Took 11.3355 sec to initialize a predictor.
I1201 12:06:40.364817 2886572 PyTorchPredictorBenchLib.cpp:336] Took 11.401 sec to initialize a predictor.
I1201 12:06:54.483794 2888614 PyTorchPredictorBenchLib.cpp:336] Took 11.3498 sec to initialize a predictor.
```
### After
```
I1201 11:51:53.775239 2818391 PyTorchPredictorBenchLib.cpp:336] Took 10.9113 sec to initialize a predictor.
I1201 11:52:07.412720 2819530 PyTorchPredictorBenchLib.cpp:336] Took 10.8413 sec to initialize a predictor.
I1201 11:52:21.202816 2820359 PyTorchPredictorBenchLib.cpp:336] Took 11.0216 sec to initialize a predictor.
I1201 11:52:35.513288 2821029 PyTorchPredictorBenchLib.cpp:336] Took 11.4216 sec to initialize a predictor.
I1201 11:52:49.145979 2821930 PyTorchPredictorBenchLib.cpp:336] Took 10.8272 sec to initialize a predictor.
I1201 11:53:02.908790 2822859 PyTorchPredictorBenchLib.cpp:336] Took 11.0262 sec to initialize a predictor.
I1201 11:53:16.276015 2823657 PyTorchPredictorBenchLib.cpp:336] Took 10.6893 sec to initialize a predictor.
I1201 11:53:30.103283 2824382 PyTorchPredictorBenchLib.cpp:336] Took 11.1854 sec to initialize a predictor.
I1201 11:53:44.298514 2825365 PyTorchPredictorBenchLib.cpp:336] Took 11.4796 sec to initialize a predictor.
I1201 11:53:58.258708 2826128 PyTorchPredictorBenchLib.cpp:336] Took 11.2652 sec to initialize a predictor.
```
Reviewed By: swolchok
Differential Revision: D32649252
fbshipit-source-id: 5cd296d12b12e5b15e85e4f1a8a236e293f37f9c
Summary:
Fixes [issue#67](https://github.com/MLH-Fellowship/pyre-check/issues/67)
This PR fixes the type checking errors in PyTorch's torch/fx/node.py.
The variables at 363:20 and 364:20 were declared with type `List[str]` but were assigned a value of `None`, causing an incompatible variable type error. Changing the type from `List[str]` to `Optional[List[str]]` fixes the error.
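A minimal sketch of the annotation change (the variable name below is illustrative, not the actual one in torch/fx/node.py):
```python
from typing import List, Optional

# Before: flagged by the type checker, since None is not a valid List[str]
# field_names: List[str] = None

# After: Optional[...] makes the None default well-typed
field_names: Optional[List[str]] = None
```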
Signed-off-by: Onyemowo Agbo
onionymous
0xedward
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68124
Reviewed By: gmagogsfm
Differential Revision: D32322414
Pulled By: onionymous
fbshipit-source-id: be11bbbd463715ddf28a5ba78fb4adbf62878c80
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69326
Looks like this really is slightly cheaper (see assembly diff screenshot in internal test plan). The problem is that `pop()` returns the value, so we have to spend instructions moving it out of the stack and then destroying it via a local.
ghstack-source-id: 144641680
Test Plan:
{F684148304}
CI
Reviewed By: zhxchen17
Differential Revision: D32812841
fbshipit-source-id: e9e43458d3364842f67edd43e43575a1f72e3cb0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69324
This slightly shrinks runImpl.
Before:
- Move pointer out of IValue
- Clear the IValue to none
- Do our thing with the Object
- destroy the intrusive_ptr on the C stack
- destroy the IValue on the C stack (even though it was cleared to None, the destructor has to run anyway)
After:
- Grab the pointer out of IValue
- Do our thing with the Object
- Decref the pointer in the IValue on the JIT stack as we assign over it
We should be saving at least the memory traffic from clearing the IValue and possibly the dtor code as well.
ghstack-source-id: 144638920
Test Plan:
Inspected assembly to verify shorter runImpl
Tried to microbenchmark (D32809454) but can't show a difference.
Reviewed By: gchanan
Differential Revision: D32812252
fbshipit-source-id: a3689f061ee51ef01e4696bd4c6ffcbc41c30af5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68942
Currently, `at::native::is_metal_available()` is implemented, but it's not exposed in the header, so nobody can use it. It's a useful function and I want to use it, so I am exposing it in the header.
Test Plan: CI
Reviewed By: sodastsai, xta0
Differential Revision: D32675236
fbshipit-source-id: b4e692db7d171dfb872d5c2233cc808d7131f2e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69371
macOS jobs need credentials to upload their test stats
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D32836893
Pulled By: seemethere
fbshipit-source-id: 0f5a8f1b35f4240d57b08a2120a97a13ba3b3de5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69164
We have lots of methods that take `std::unordered_map<std::string, c10::IValue>` now. That's kind of ugly and cumbersome to type, so add a `KWargs` typedef.
Also made the `operator()` default `kwargs` to empty. Note that we could have another overload that doesn't take `kwargs` at all, but the perf gain is so minuscule it's probably not worth it.
ghstack-source-id: 144691899
Test Plan: CI
Reviewed By: d1jang
Differential Revision: D32734677
fbshipit-source-id: 8d6496a6d1ec2dc71253151d2f6408f1387966cf
Summary:
This is partial revert of bb522c9d7a to revert addition of workflows for CUDA 11.5 windows that fails
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69365
Reviewed By: suo
Differential Revision: D32831418
Pulled By: atalman
fbshipit-source-id: 184346d22623f88594312a4ce2e4d29cc67e8338
Summary:
This fixes the `USE_PRECOMPILED_HEADERS` cmake version check which was accidentally inverted, so it was always disabled.
I've also made the precompiled header so it only includes headers used in 95% or more of code, weighted by compile time. This limits it to the standard library, `c10` and a limited subset of `ATen/core`. Crucially, the new pch doesn't depend on `native_functions.yaml` so won't cause as much unnecessary rebuilding.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67851
Reviewed By: zou3519
Differential Revision: D32290902
Pulled By: dagitses
fbshipit-source-id: dfc33330028c99b02ff40963926c1f1260d00d00
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68295
There's no reason we can't figure out what tensors we need to manage at model load time. It's also useful to have the set of ranges available at load time for integrating the ranges algorithm introduced in the previous diff.
Test Plan: `buck test caffe2/benchmarks/static_runtime/...`
Reviewed By: hlu1
Differential Revision: D32400593
fbshipit-source-id: 0466b2641166ddc9c14f72774f4ba151407be400
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69328
Aten_metal_prepack is cpp based and can be safely included here.
Test Plan: "Traced" the xirp model with the script.
Reviewed By: xta0
Differential Revision: D32813686
fbshipit-source-id: 7a428151348dc9d3f576531701926d6b3413de3d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69327
Original commit changeset: d44096d88265
Original Phabricator Diff: D32144240 (668574af4a)
Test Plan:
CI
original diff failed 175 builds in CI
Reviewed By: airboyang, anjali411
Differential Revision: D32809407
fbshipit-source-id: c7c8e69bcee0274992e2d5da901f035332e60071
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/67612 by creating a tensor first and then converting the dtype explicitly using `.to(dtype)` call.
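A minimal sketch of the pattern described (the actual call site lives in the test machinery; this just illustrates the general idea):
```python
import torch

data = [[1, 2], [3, 4]]

# Instead of constructing the tensor directly in the target dtype,
# build it first and convert the dtype explicitly:
t = torch.tensor(data).to(torch.bfloat16)
```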
Looking forward to your feedback and suggestions on this.
cc: kshitij12345 mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68113
Reviewed By: zou3519
Differential Revision: D32797329
Pulled By: saketh-are
fbshipit-source-id: 5c34709ab277c82cda316a3ea1cf01e853e4c38b
Summary:
See https://pytorch.slack.com/archives/G4Z791LL8/p1638229956006300
I grepped c10, aten, and torch for CUDA_VERSION and checked the usages I saw.
I can't guarantee I made a clean sweep, but this improves the status quo.
cc ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69092
Reviewed By: zou3519
Differential Revision: D32786919
Pulled By: ngimel
fbshipit-source-id: 1d29827dca246f33118d81e136252ddb5bf3830f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69298
I was exploring adding an invariant that we actually use properly-tracked pinned memory when doing non-blocking copies (to plug various correctness holes), and found this case where we allocate a tensor without pinned memory and then copy it with non_blocking=True.
Test Plan: Unit tests cover this code.
Reviewed By: rohan-varma
Differential Revision: D32786909
fbshipit-source-id: a53f96f57e6727238e4cd2164c1a0f04cf270413
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68528
Add an operator converter for div. torch.floor_divide is announced to be deprecated by PyTorch; consider removing this after PyTorch completes the deprecation.
Reviewed By: 842974287
Differential Revision: D32497573
fbshipit-source-id: d06c864077f745c295c33fb25639b7116f85ca20
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69172
Migrates the docs push jobs to Github Actions by implementing a simple
WITH_PUSH switch to do the actual push.
Adds 2 new workflows for GHA:
* linux-docs (on trunk)
* linux-docs-push (on schedule)
linux-docs-push is the only workflow that actually gets access to
credentials so it should be relatively safe.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D32767239
Pulled By: seemethere
fbshipit-source-id: 5b100f986cf4023c323f4f96f0fe7942fec49ad2
Summary:
Turn on layer_norm in autodiff
https://github.com/pytorch/pytorch/issues/67732 should have fixed the previous issue exposed by enabling layer_norm in autodiff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69007
Reviewed By: soulitzer
Differential Revision: D32699108
Pulled By: eellison
fbshipit-source-id: 6951668c0e74e056d3776294f4e1fd3123c763e5
Summary:
Preserves the .json files in the test folder for every test job as an artifact.
Going to hud.pytorch.org/pr/69258 and downloading/unzipping any of the `test-jsons-*.zip` shows that .pytorch-slow-tests.json and .pytorch-disabled-tests.json exist. (Though you won't see them in your file manager as they are hidden files.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69258
Reviewed By: seemethere
Differential Revision: D32807102
Pulled By: janeyx99
fbshipit-source-id: ed1b227cdd32160ed045dd79a7edc55216dcfe53
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63570
There is a use of `at::triangular_solve_out` in the file
`torch/csrc/jit/tensorexpr/external_functions.cpp` that I have not dared
to move to `at::linalg_solve_triangular_out`.
**Deprecation note:**
This PR deprecates the `torch.triangular_solve` function in favor of
`torch.linalg.solve_triangular`. An upgrade guide is added to the
documentation for `torch.triangular_solve`.
Note that it DOES NOT remove `torch.triangular_solve`, but
`torch.triangular_solve` will be removed in a future PyTorch release.
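A minimal migration sketch (not the upgrade guide itself), assuming a well-conditioned upper-triangular coefficient matrix; note the swapped argument order and that `upper` is keyword-only in the new API:
```python
import torch

A = torch.randn(3, 3).triu() + 3 * torch.eye(3)
B = torch.randn(3, 2)

X_old = torch.triangular_solve(B, A, upper=True).solution   # deprecated
X_new = torch.linalg.solve_triangular(A, B, upper=True)     # replacement

assert torch.allclose(X_old, X_new)
```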
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D32618035
Pulled By: anjali411
fbshipit-source-id: 0bfb48eeb6d96eff3e96e8a14818268cceb93c83
Summary:
Before:
`ValueError: InstanceNorm1d returns 0-filled tensor to 2D tensor.This is because InstanceNorm1d reshapes inputs to(1, N * C, ...) from (N, C,...) and this makesvariances 0.`
After:
`ValueError: InstanceNorm1d returns 0-filled tensor to 2D tensor. This is because InstanceNorm1d reshapes inputs to (1, N * C, ...) from (N, C,...) and this makes variances 0.`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69289
Reviewed By: jbschlosser
Differential Revision: D32796035
Pulled By: albanD
fbshipit-source-id: c8e7c5cf6e961ec5f7242b31c7808454104cde02
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67701
I split this out to ease rebasing and review.
ghstack-source-id: 144507288
Test Plan: CI
Reviewed By: hlu1
Differential Revision: D32112523
fbshipit-source-id: dba14e6ada33df02dbcd7025b090a8a18cf438ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67220
Specifically we log AliasDb and same_storage_values, and are chattier about the aliasing logs in the liveness analysis.
ghstack-source-id: 144507289
Test Plan: Used to help develop D31776259
Reviewed By: hlu1
Differential Revision: D31847561
fbshipit-source-id: 8371455d060c17dace91cd90e4034b7618f820a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67219
I found that these specific test cases were causing different failures when developing D31776259. I also found that it was difficult to debug testStaticRuntime failures, so I added more verbose logs gated behind -v 2.
ghstack-source-id: 144507287
Test Plan: Used during development of D31776259
Reviewed By: hlu1
Differential Revision: D31847566
fbshipit-source-id: ea9147fb246c345d18bbc8d7f3bfba48d3a0fab3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69265
This is used in tab completion; we should not put a warning here.
Test Plan:
ci
Imported from OSS
Reviewed By: albanD
Differential Revision: D32778736
fbshipit-source-id: f1bec5e09a8238ab41329ac2b64e6f3267799f6a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66743
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for(TYPE var=x0;var<x_max;x++)`
to the format
`for(const auto var: irange(xmax))`
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modifications to exclude all files under /torch/jit; a number of reversions and unused-variable suppression warnings were added by hand.
Test Plan: Sandcastle
Reviewed By: malfet
Differential Revision: D31705359
fbshipit-source-id: c9ea2fbc0f9cd29e97a52dcb203addc5f2abb09b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67705
This PR rewrites ProcessGroupNCCLTest to be MultiProcessTestCase. It was originally written in a single process multi-GPU fashion, we change it to multi-process instead to align with other c10d tests.
ghstack-source-id: 144555092
Test Plan: wait for CI
Reviewed By: pritamdamania87, fduwjj
Differential Revision: D32113626
fbshipit-source-id: 613d36aeae36bf441de1c2c83aa4755f4d33df4d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69238
The NS for FX graph matcher was not properly taking into account
seen_nodes, this allowed a node to be matched twice.
Test Plan:
FB-only testing on real model passes.
Ideally we would have a test case to capture this, but hopefully we can land this soon to unblock production work.
Imported from OSS
Reviewed By: HDCharles
Differential Revision: D32765761
fbshipit-source-id: ed3dff8fd981e399a649fcd406883b4d56cc712a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69181
functorch lives out-of-tree. However, it has some TLS that needs to be
propagated. The solution for that is we store a pointer to the TLS
inside pytorch/pytorch and extend FuncTorchTLSBase inside functorch to
include whatever functorch needs.
A previous solution used ThreadLocalDebugInfo. However, all
PyTorch-managed threads (e.g. spawned by Autograd) all receive a
shared_ptr that points to the same ThreadLocalDebugInfo. This leads to
race conditions if the multiple threads start modifying the TLS
stored within ThreadLocalDebugInfo without using mutexes.
Test Plan:
- tested with functorch
- The performance impact of this change when functorch is not used is
negligible because we end up manipulating nullptrs.
Reviewed By: albanD
Differential Revision: D32742312
Pulled By: zou3519
fbshipit-source-id: 1a8439a4af06b3d3e50b9a2dbca98a0ba612062a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69261
As this function is supposed to be called only once per type from
caching getCustomClassType template
Test Plan: Imported from OSS
Reviewed By: suo, lw
Differential Revision: D32776564
Pulled By: malfet
fbshipit-source-id: 218436657e6ad5ad0c87964857114d1e60c57140
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68852
When using a float zero_point in FakeQuant, such as for embeddings, it does not need to be between
quant_min and quant_max, as is enforced for integer zero_points.
This is because float zero_points are formulated as per:
```
xq = Round(Xf * inv_scale + zero_point),
Xq = Round((Xf - min) * inv_scale)
```
Test Plan:
pytest test/test_quantization.py -v -k "test_fake_quant_per_channel_qparam_range"
Imported from OSS
Reviewed By: supriyar
Differential Revision: D32645014
fbshipit-source-id: 96dc3ca6eef9cee60be6919fceef95c9f2759891
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69020
Merges the lazy tensor codegen infra which has already been used on lazy_tensor_staging.
Test Plan: Test via lazy_tensor_staging branch
Reviewed By: alanwaketan, bdhirsh
Differential Revision: D32570613
fbshipit-source-id: 2cd5698644398bda69669683f8de79fd3b6639b5
Summary:
As per title. In particular, this makes it easier to override a backward function for which the underlying backend returns `None`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67793
Reviewed By: zou3519
Differential Revision: D32242962
Pulled By: albanD
fbshipit-source-id: 6e114def90ee9499161e1303d301ba7fd003ff89
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68639
Fix all problems related to `ProcessedNode::verify_no_memory_overlap()`
- Only enable this check for native and fallback ops that are not inplace or view ops
- Enable ProcessedNode::verify_no_memory_overlap() in debug mode and enforce it
- Add gflag --static_runtime_disable_debug_memory_overlap_check to test the runtime memory overlap fix for bad schemas
fb::expand_dims's schema was not correct after this check is re-enabled. It's fixed in D32556204 (39ab417107)
Reviewed By: mikeiovine
Differential Revision: D32553708
fbshipit-source-id: 88de63cdf1ee4f87b7726c8b65a11a5fb8a99d13
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69226
This adds back the previous init_from_local_shards API, renamed to init_from_local_shards_and_global_metadata. It's a partial revert of D32147888 (35712a8eb4). We now provide two APIs:
1. `init_from_local_shards`: users don't need to provide global metadata; we do an all_gather under the hood.
2. `init_from_local_shards_and_global_metadata`: users need to explicitly construct ShardedTensorMetadata to use this API and must ensure correctness on all ranks, as there's no cross-rank communication/validation.
Both APIs stay private until they stabilize and their UX is proven. The second one can only be called on the `ShardedTensor` class directly and is not included as a package API for now.
Test Plan:
test_init_from_local_shards_and_global_metadata
test_init_from_local_shards_and_global_metadata_invalid_shards
Reviewed By: dstaay-fb, pritamdamania87
Differential Revision: D32746882
fbshipit-source-id: bafd26ce16c02e2095907f9e59984a5d775c7df5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68417
1. Since parameter attributes are lazily initialized at the beginning of forward, it makes more sense to initialize full_param_padded using the parameters' data type during lazy_init instead of the data type at construction time, as the parameters' data type may change after construction and before the training loop.
2. Add a check for whether parameter storage is changed outside FSDP and handle it properly.
ghstack-source-id: 144479019
Test Plan: unit tests
Reviewed By: rohan-varma
Differential Revision: D32458643
fbshipit-source-id: 0e07e5e08270f2e265e8f49124a6648641e42e7a
Summary:
Needed for NNC dynamic shape fusion. Previously, when creating a partially evaluated graph for symbolic shape compute, if the input wasn't used, we wouldn't compute it, which led to failures when NNC expected this value to be passed in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68281
Reviewed By: navahgar
Differential Revision: D32401365
Pulled By: eellison
fbshipit-source-id: 97a684e5f1faed5df77c8fd69f9623cdba0781f9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68594
Based on my conversation with ejguan [here](https://github.com/pytorch/pytorch/pull/68197#pullrequestreview-809148827), we both believe that having the `unbatch_level` argument and functionality is making this DataPipe unnecessarily complicated, because users can call `.unbatch` before `.batch` if they would like to do so. That will likely be cleaner as well.
I also checked other libraries (for example, [TensorFlow](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#unbatch)), and I do not see them provide the ability the `unbatch` within the `batch` function either.
This PR simplifies the DataPipe by removing the argument.
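A minimal sketch of the recommended composition (import path assumed from the current datapipes module layout):
```python
from torch.utils.data.datapipes.iter import IterableWrapper

dp = IterableWrapper([[0, 1], [2, 3], [4, 5]])   # already-batched source
rebatched = dp.unbatch().batch(batch_size=3)     # flatten first, then re-batch
print(list(rebatched))                            # [[0, 1, 2], [3, 4, 5]]
```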
cc VitalyFedyunin ejguan NivekT
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D32532594
Pulled By: NivekT
fbshipit-source-id: 7276ce76ba2a3f207c9dfa58803a48e320adefed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69251
This adds some actual documentation for deploy, which is probably useful
since we told everyone it was experimentally available so they will
probably be looking at what the heck it is.
It also wires up various components of the OSS build to actually work
when used from an external project.
Differential Revision: D32783312
Test Plan: Imported from OSS
Reviewed By: wconstab
Pulled By: suo
fbshipit-source-id: c5c0a1e3f80fa273b5a70c13ba81733cb8d2c8f8
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/68678
Test Plan: I'll update the unit test before landing
Reviewed By: cccclai
Differential Revision: D32573603
fbshipit-source-id: 19271bcbb68b61d24d6943e61a943f4f75fddb5d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67726
1. Check in one model with the old aten::div_tensor op, with unit tests in both cpp and python. The following two lines are commented out and expected to work after the upgrader is used.
```
_helper(mobile_module_v2, div_tensor_0_3)
_helper(current_mobile_module, torch.div)
```
2. Update the commented code accordingly.
Currently there are 6 upgraders. The following old models with operators are added to cover these 6 upgraders:
```
// Tensor x Tensor
test_versioned_div_tensor_v3
// Tensor x Scalar
test_versioned_div_scalar_float_v3
test_versioned_div_scalar_reciprocal_int_v3
test_versioned_div_scalar_inplace_float_v3
// Scalar x Scalar
test_versioned_div_scalar_scalar_v3
// Tensor x Tensor with out kwarg
test_versioned_div_tensor_out_v3
// Tensor x Tensor inplace
test_versioned_div_tensor_inplace_v3
// Tensor x Scalar inplace
test_versioned_div_scalar_inplace_int_v3
```
Note:
In this pr, per model, it includes the following test:
1. Model (with old op) load/run test will be in both cpp and python
2. Model (with old op) + upgrader test will be in python
Other tests considered adding:
1. per upgrader bytecode test
2. app level integration test
ghstack-source-id: 144422418
Test Plan: CI and the added unittest
Reviewed By: iseeyuan
Differential Revision: D32069653
fbshipit-source-id: 96d9567088a1f709bc7795f78beed7a308e71ca9
Summary:
Remove the line since line 10 already includes this header file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68983
Reviewed By: samdow
Differential Revision: D32706952
Pulled By: soulitzer
fbshipit-source-id: 98746e12d8d04d64ee2e0449e4aec5153ac723d5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68603
FileStore is frequently used from Python, which is garbage-collected. This means that users of FileStore from Python do not control when the FileStore destructor is invoked. If the directory for the file store is created by some external logic that has its own cleanup procedure, that procedure may race with the logic in the FileStore destructor.
The diff adds a check for file access in the destructor before actually invoking the cleanup. In the long term, it makes sense to move the cleanup logic out of the destructor into a separate method.
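A hedged sketch of the usage pattern described, where the store's file lives in a directory owned by external cleanup logic and the destructor runs whenever the Python object is collected:
```python
import tempfile
import torch.distributed as dist

with tempfile.TemporaryDirectory() as tmp:
    store = dist.FileStore(f"{tmp}/store_file", 1)   # world_size = 1
    store.set("key", "value")
    del store   # destructor cleanup can race with the directory's own cleanup
```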
Test Plan:
CI
Stress tests: `buck test mode/dev-nosan //torchrec/examples/dlrm/tests:test_dlrm_main -- --exact 'torchrec/examples/dlrm/tests:test_dlrm_main - torchrec.examples.dlrm.tests.test_dlrm_main.MainTest: test_main_function' --run-disabled --jobs 18 --stress-runs 20 --record-results`
Reviewed By: colin2328
Differential Revision: D32535470
fbshipit-source-id: 6f421f2e7b0d9ac9c884a1db2f7e5a94fc59fc0e
Summary:
At https://github.com/pytorch/pytorch/issues/68873, jbschlosser states that maxunpool2d with the `output_size` argument only works for indices of the same size. This makes sense, but unfortunately it's not what's shown in the example! I've removed the wrong example and replaced it with one where specifying `output_size` is actually necessary -- the unpool call fails without it.
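A hedged sketch of the kind of case where `output_size` is genuinely required (odd input size, so the inverse of pooling is ambiguous):
```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(2, stride=2)

x = torch.randn(1, 1, 5, 5)    # odd spatial size
out, indices = pool(x)          # out is 1 x 1 x 2 x 2

# Without output_size the result would be 1 x 1 x 4 x 4;
# passing it recovers the original 5 x 5 shape.
recovered = unpool(out, indices, output_size=x.size())
```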
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68936
Reviewed By: H-Huang
Differential Revision: D32759207
Pulled By: jbschlosser
fbshipit-source-id: 658e1724150a95454a05a771ae7c6e2e736740a7
Summary:
Add test shard number and runner name to the test name suffix
Otherwise test report names for shard 1 and shard 2 will be identical
and overwrite each other
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69188
Reviewed By: janeyx99
Differential Revision: D32747747
Pulled By: malfet
fbshipit-source-id: 149f921d8e420d3ed69ce812bdcd3c034799353a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63569
This PR also rewrites `lu_solve_backward` from scratch going from
solving 5 systems of equations to just 2.
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D32618014
Pulled By: anjali411
fbshipit-source-id: 0e915bcf7045a4db43ffd076d807beac816c8538
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68887
Closes #46988, closes #46987, closes #46761
By "simple" I mean operators that map 0->0 so we can implement it by
just re-dispatching on the values tensor. That does mean we have `sin`
but not `cos` for example, but without fill value support this is the
best that can be done.
Most of these don't support autograd because the derivative formulas
use unsupported operators.
cc nikitaved pearu cpuhrsch IvanYashchuk
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D32734911
Pulled By: cpuhrsch
fbshipit-source-id: 203ab105799f3d2d682b01ca3d6b18e7c994776a
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).
New submodule commit: ed4bbe52b7
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69089
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: lw
Differential Revision: D32725534
fbshipit-source-id: 73b1e0f67c957ca0220cd47179dd4b350a98fd33
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68825
Factoring out the elementwise ops in tensorexpr fuser and adding their corresponding shape functions, since we need shape functions to fuse them with dynamic shapes
Test Plan: Imported from OSS
Reviewed By: samdow
Differential Revision: D32732466
Pulled By: eellison
fbshipit-source-id: 69cacf6fbed8eb97e475f5d55b2eec0384fe8ec1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69167
Per title
ghstack-source-id: 144378083
Test Plan: Ci
Reviewed By: H-Huang
Differential Revision: D32736119
fbshipit-source-id: f37fd3e4ac393c07eb8bd1f9202841d33d0a8aad
Summary:
While implementing https://github.com/pytorch/pytorch/issues/68644,
during the testing of 'torch.distributions.constraint.positive_definite', I found an error in the code: [location](c7ecf1498d/torch/distributions/constraints.py (L465-L468))
```
class _PositiveDefinite(Constraint):
    """
    Constrain to positive-definite matrices.
    """
    event_dim = 2

    def check(self, value):
        # Assumes that the matrix or batch of matrices in value are symmetric
        # info == 0 means no error, that is, it's SPD
        return torch.linalg.cholesky_ex(value).info.eq(0).unsqueeze(0)
```
The error occurs when I check the positive definiteness of
`torch.cuda.DoubleTensor([[2., 0], [2., 2]])`
but it does not cause a problem for
`torch.DoubleTensor([[2., 0], [2., 2]])`
You may easily reproduce the error with the following code:
```
Python 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> const = torch.distributions.constraints.positive_definite
>>> const.check(torch.cuda.DoubleTensor([[2., 0], [2., 2]]))
tensor([False], device='cuda:0')
>>> const.check(torch.DoubleTensor([[2., 0], [2., 2]]))
tensor([True])
```
The cause of the error can be analyzed further by passing 'check_errors=True' as an additional argument to 'torch.linalg.cholesky_ex'.
It seems to be caused by the recent changes in 'torch.linalg'.
I suggest modifying the '_PositiveDefinite' class to use the 'torch.linalg.eig' function as below:
```
class _PositiveDefinite(Constraint):
    """
    Constrain to positive-definite matrices.
    """
    event_dim = 2

    def check(self, value):
        return (torch.linalg.eig(value)[0].real > 0).all(dim=-1)
```
Using the above implementation, I get the following result:
```
Python 3.9.7 (default, Sep 16 2021, 13:09:58)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> const = torch.distributions.constraints.positive_definite
>>> const.check(torch.cuda.DoubleTensor([[2., 0.], [2., 2.]]))
tensor(True, device='cuda:0')
>>> const.check(torch.DoubleTensor([[2., 0.], [2., 2.]]))
tensor(True)
```
FYI, I do not know what algorithms are used in 'torch.linalg.eig' and 'torch.linalg.cholesky_ex'. As far as I know, they generally have the same time complexity, O(n^3). With special algorithms or finer parallelization, the time complexity of Cholesky decomposition may be reduced to approximately O(n^2.5). If there is a reason 'torch.distributions.constraints.positive_definite' previously used 'torch.linalg.cholesky_ex' rather than 'torch.linalg.eig', I would like to know it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68720
Reviewed By: samdow
Differential Revision: D32724391
Pulled By: neerajprad
fbshipit-source-id: 32e2a04b2d5b5ddf57a3de50f995131d279ede49
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68677
Used in compatibility APIs. Luckily the stream reader mostly does this already, so we mostly just create a wrapper in our compatibility files.
Test Plan: ci
Reviewed By: cccclai
Differential Revision: D32573132
fbshipit-source-id: 86331c03a1eebcd86ed29b9c6cd8a8fd4fe79949
Summary:
This PR adds an OpInfo entry for tensorsolve function.
The keyword argument name differs from NumPy's, so a lambda function needs to be passed to `ref=`.
I had to change the dtypes for `test_reference_testing` because NumPy does computation internally using double for all linear algebra functions and maybe for some other functions. Using `torch.float64` and `torch.complex128` is more reliable for NumPy comparisons.
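A hedged sketch of the keyword-name mismatch (torch uses `dims`, NumPy uses `axes`), which is why a thin lambda is passed as the reference:
```python
import numpy as np
import torch

ref = lambda a, b, dims=None: np.linalg.tensorsolve(a, b, axes=dims)

a = torch.eye(2 * 3 * 4, dtype=torch.float64).reshape(2 * 3, 4, 2, 3, 4)
b = torch.randn(2 * 3, 4, dtype=torch.float64)

x = torch.linalg.tensorsolve(a, b)
assert np.allclose(x.numpy(), ref(a.numpy(), b.numpy()))
```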
cc mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68810
Reviewed By: soulitzer
Differential Revision: D32696065
Pulled By: mruberry
fbshipit-source-id: a4305065d3e7d0097503dc05938b3c4784e14996
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67656
Currently, each cpu kernel file is copied into the build folder 3 times to give them different compilation flags. This changes it to instead generate 3 files that `#include` the original file. The biggest difference is that updating a copied file requires `cmake` to re-run, whereas include dependencies are natively handled by `ninja`.
A side benefit is that included files show up directly in the build dependency graph, whereas `cmake` file copies don't.
Test Plan: Imported from OSS
Reviewed By: dagitses
Differential Revision: D32566108
Pulled By: malfet
fbshipit-source-id: ae75368fede37e7ca03be6ade3d4e4a63479440d
Summary:
Unversioned python invocations should not be used, as they can be aliased to Python 2.
Also invoke mypy as `python3 -mmypy`, since binary aliases are not always available for user installations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69097
Reviewed By: janeyx99
Differential Revision: D32729367
Pulled By: malfet
fbshipit-source-id: 7539bd0af15f97eecddfb142dba7de7f3587083d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69165
We're hitting hard concurrency limits for built in github runners so
let's use our own runners and make them non-ephemeral so they'll have
basically constant uptime
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: atalman
Differential Revision: D32735494
Pulled By: seemethere
fbshipit-source-id: c042c6f0fb23fd50acef312d96b0c89d02c93270
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68411
Avoids heap-allocating a std::string instance in before() each time even if it's not going to be used.
ghstack-source-id: 144287655
Test Plan:
Run //caffe2/caffe2/fb/high_perf_models/pytorch/benchmark_framework_overheads:cpp_benchmark before/after this diff with arguments --stressTestRecordFunction --op empty
Before: P467922606
After: P467922626
Reviewed By: chaekit
Differential Revision: D32453846
fbshipit-source-id: 18e1b482dbf5217add14cbaacd447de47cb5877b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68410
First step toward not heap-allocating a string in RecordFunction::before() every time
ghstack-source-id: 144287654
Test Plan: CI
Reviewed By: chaekit
Differential Revision: D32453847
fbshipit-source-id: 080d95095fb568287b65fcc41a4ca6929b5f9a87
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69088
It was found that the Vulkan backend was consuming a huge amount (~287 MB) of graphics memory when executing a lightweight segmentation model. In fact the Vulkan backend tends to consume a huge amount of memory in general.
It was found that the reason for this is due to how the backend uses [VMA](https://gpuopen-librariesandsdks.github.io/VulkanMemoryAllocator/html/). When allocating memory, VMA will first allocate a large block of memory, then subdivide that block to use for individual textures and buffers. The pattern is used because Vulkan has a limit on the number of `vkDeviceMemory` allocations that can be active at one time.
It turns out that the Vulkan backend was using custom memory pools with a block size of 64 MiB, meaning that at least 64 MiB of memory would always be used. Furthermore, usage of the [linear allocation algorithm](https://gpuopen-librariesandsdks.github.io/VulkanMemoryAllocator/html/custom_memory_pools.html#linear_algorithm) resulted in minimal reuse of memory, leading to the creation of many more blocks than were actually required and a huge amount of unused memory.
By avoiding the use of custom memory pools and instead simply using the default memory pool provided by VMA, the library seems to have a much easier time minimizing the amount of unused memory. This change reduces memory usage down to 20 MB when running the aforementioned segmentation model.
This diff also reduces the preferred block size to 32 MiB and removes the use of the linear allocation algorithm in case custom memory pools are needed in the future.
Test Plan:
Build and run vulkan_api_test:
```
cd ~/pytorch
BUILD_CUSTOM_PROTOBUF=OFF \
BUILD_TEST=ON \
USE_EIGEN_FOR_BLAS=OFF \
USE_FBGEMM=OFF \
USE_MKLDNN=OFF \
USE_NNPACK=OFF \
USE_NUMPY=OFF \
USE_OBSERVERS=OFF \
USE_PYTORCH_QNNPACK=OFF \
USE_QNNPACK=OFF \
USE_VULKAN=ON \
USE_VULKAN_API=ON \
USE_VULKAN_SHADERC_RUNTIME=ON \
USE_VULKAN_WRAPPER=OFF \
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python3 setup.py develop --cmake && ./build/bin/vulkan_api_test
```
Reviewed By: beback4u
Differential Revision: D32653767
fbshipit-source-id: b063a8ea76d34b57d0e2e6972ca5f6f73f2fd7e5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68827
Add a note about current checkpoint support with DDP. Note that this
does not include the features enabled with _set_static_graph yet, as it is an
undocumented private API. Once we support static graph as beta feature in OSS
we can add to the note here.
ghstack-source-id: 144285041
Test Plan: CI
Reviewed By: pritamdamania87
Differential Revision: D32624957
fbshipit-source-id: e21d156a1c4744b6e2a807b5b5289ed26701886f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68792
Refactor tests to be more clear what features are supported and
unsupported under certain DDP configs.
ghstack-source-id: 144285040
Test Plan: Ci
Reviewed By: pbelevich
Differential Revision: D32609498
fbshipit-source-id: 5231242054d4ff6cd8e7acc4a50b096771ef23d1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68390
Observer zero_point's dtype can be float, in the specific case of `torch.per_channel_affine_float_qparams`.
This change sets FakeQuant's zero_point dtype accordingly.
Test Plan:
`pytest test/quantization/core/test_workflow_module.py -v -k "embedding"`
`pytest test/quantization/eager/test_quantize_eager_qat.py -v -k "embedding"`
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D32446405
fbshipit-source-id: cca7aade68ff171887eeeae42801f77d934dad4c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69032
I am removing it because, for packaging-related reasons, it's easier if
torch.fx is a pure Python module.
I don't think there is much reason to keep it: this functionality was
experimental, has no known users currently, and we didn't have a clear
path to turning it on by default due to regressions in tracing
performance. Also, it was only ever enabled for `rand` and friends.
Technically the removal of the `enable_cpatching` arguments on
`symbolic_trace` and `Tracer.__init__` are BC-breaking, but the
docstrings clearly state that the argument is experimental and BC is not
guaranteed, so I think it's fine.
Test Plan: Imported from OSS
Reviewed By: soulitzer
Differential Revision: D32706344
Pulled By: suo
fbshipit-source-id: 501648b5c3610ae71829b5e7db74e3b8c9e1a480
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69028
This change converts
```
if (..) {
...
} else {
...
}
# end of function
```
into
```
if(...) {
...
return;
}
...
```
in ops.cpp to remove the else branch to reduce the indentation depth by 1 for better readability.
Test Plan: N/A
Reviewed By: hlu1
Differential Revision: D32506235
fbshipit-source-id: a4fd5188bd680dba5dcad2b6e873735a54497664
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65819
Related to #61669.
Functions registered as CompositeImplicitAutograd MUST work for most, if
not all, backends. This includes Tensor subclasses.
To achieve this, we (PyTorch) impose a set of constraints on how a
CompositeImplicitAutograd function can be written.
Concretely, this PR adds tests for all OpInfos that checks for
compliance. The things that get tested in this PR apply to composite
ops and are that:
- the op does not change the metadata of a Tensor without performing
dispatches
- the op does not call set_ or resize_
- the op does not directly access the data ptr
The mechanism for the test is to create a new __torch_dispatch__
object, CompositeCompliantTensor. For each operator, we wrap all inputs
in CompositeCompliantTensor, turn on python mode for it,
and send it through the operator.
Non-CompositeImplicitAutograd operators will pass the test because they
perform a dispatch to backend code. Here's how CompositeCompliantTensor
catches problems:
- If it sees set_ or resize_ getting called, it will directly error
out
- After each operation, CompositeCompliantTensor checks to make sure
that its metadata is consistent with that of the thing it is wrapping.
If the CompositeImplicitAutograd op modifies the metadata directly
(through e.g. the TensorImpl API) then the metadata will go out of sync.
- If data_ptr gets called, that returns a nice error (because the
storage is meta).
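A rough, simplified sketch of the wrapper-subclass mechanism (not the actual CompositeCompliantTensor implementation; only the shape/stride part of the metadata check is shown, and the in-place/view special cases are omitted):
```python
import torch
from torch.utils._pytree import tree_map

class MetadataCheckingTensor(torch.Tensor):
    # Wraps a real tensor and re-checks that the wrapper's metadata
    # stays consistent with the wrapped tensor after every dispatched op.
    @staticmethod
    def __new__(cls, elem):
        r = torch.Tensor._make_wrapper_subclass(
            cls, elem.size(), strides=elem.stride(), dtype=elem.dtype,
            device=elem.device, requires_grad=elem.requires_grad)
        r.elem = elem
        return r

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        def unwrap(t):
            return t.elem if isinstance(t, MetadataCheckingTensor) else t

        def wrap(t):
            return MetadataCheckingTensor(t) if isinstance(t, torch.Tensor) else t

        out = func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs or {}))

        # If a composite op mutated metadata without dispatching (e.g. via the
        # raw TensorImpl API), the wrapper and the wrapped tensor drift apart.
        for a in args:
            if isinstance(a, MetadataCheckingTensor):
                assert a.shape == a.elem.shape and a.stride() == a.elem.stride()
        return tree_map(wrap, out)
```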
CompositeCompliantTensor is written in an interesting way. First off,
if a view operation occurs (e.g. `B = A.view_op(...)`), then B.storage()
must alias A.storage() where B.storage() is CompositeCompliantTensor's
storage, NOT the storage of the tensor it is wrapping. This is an
invariant in autograd, see #62182 for details. To handle
this we replay the view on A's storage and set it as B's storage.
Secondly, there are cases where the metadata is allowed to go out of
sync. I believe this is only possible with in-place view functions, like
transpose_, t_, squeeze_, unsqueeze_. Those are special cased.
Finally, I added a new section to aten/src/ATen/native/README.md about
what it means to be CompositeImplicitAutograd Compliant
Test Plan: - run tests
Reviewed By: ezyang, bdhirsh
Differential Revision: D31268369
Pulled By: zou3519
fbshipit-source-id: 31634b1cbe1778ab30196013cfc376ef9bd2e8b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69012
Some changes to torch/csrc/lazy/core were done on the
lazy_tensor_staging branch (https://github.com/pytorch/pytorch/pull/68427).
Merge those back into the trunk.
Test Plan: Imported from OSS
Reviewed By: wconstab
Differential Revision: D32708696
Pulled By: desertfire
fbshipit-source-id: e54b978f2bdb9c7db27880f60246fdf1e8b41019
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68636
Same old alias problem
Reviewed By: mikeiovine
Differential Revision: D32556204
fbshipit-source-id: 4d380f0110ad1be83f705e6d6910a6aaf818ec08
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68877
Saves whether an op type is a module during tracing, so we
can avoid recalculating this when validating the op during inference.
This leads to a small speedup.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```
```
// MobileNetV2, 1x3x224x224, function level profiling
// before
validate_cur_op - 1.77%
// after
validate_cur_op - 1.41%
```
Reviewed By: jerryzh168
Differential Revision: D32646149
Pulled By: vkuzo
fbshipit-source-id: 03ebc4fedceb84bb885939dff8dec81d30ba6892
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68906
The existing PyTorch pinned memory allocator has been a challenge for scalability in multi-GPU inference workloads. The existing allocator is mostly designed in the context of training, where in the process-per-GPU setup we have natural sharding of the global locks and lower allocation rates (perhaps O(100 allocs/sec) per process. In this setup we might have globally on the order of O(200k allocs/sec) - e.g. 20k QPS and 10 allocs/query. This is a different domain.
In the existing allocator, we observe tail latencies of cudaEventCreate and cudaEventDestroy (while holding the lock) can also completely stall all allocations, which is undesirable.
The idea here is to retain a similar design to the existing PyTorch allocator - eager collection of used memory, no lock-free or deferred tricks, identical semantics around events, but to:
a) split up the locks around the various critical datastructures, and
b) do as little work as possible while holding any process-global mutexes (importantly, no CUDA runtime API calls)
c) pool CUDA events manually (as cuda event creation is a bottleneck at high rates from multiple threads).
This does require a bit of care, but I believe it's correct. In general the threading and state transitions are fairly simple.
With these improvements, microbenchmarks show significant improvements (1.5x-3x). Importantly, real workloads also show significant improvements, especially WRT tail latency and stalls.
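For context, a hedged sketch of the allocation pattern this allocator serves (high-rate pinned-host allocations feeding non-blocking host-to-device copies):
```python
import torch

for _ in range(1000):
    host = torch.empty(1_000_000, dtype=torch.uint8, pin_memory=True)
    dev = host.to("cuda", non_blocking=True)
torch.cuda.synchronize()
```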
Test Plan:
Unit tests all pass.
With a synthetic benchmark such as:
```
static void BM_copies_baseline(benchmark::State& state) {
auto N = state.range(0);
auto scale = state.range(1);
auto object_size_min = N;
auto object_size_max = scale * N;
auto device = at::Device(at::kCUDA, at::cuda::current_device());
uint64_t bytes_copied = 0;
uint64_t allocs = 0;
auto stream = at::cuda::getCurrentCUDAStream();
for (auto _ : state) {
auto object_size = static_cast<int64_t>(expf(folly::Random::randDouble(
logf(object_size_min), logf(object_size_max))));
auto tensor = at::empty(
{object_size},
at::TensorOptions().dtype(at::kByte).pinned_memory(true));
at::cuda::CachingHostAllocator_recordEvent(
tensor.storage().data_ptr().get_context(), stream);
bytes_copied += object_size;
allocs += 1;
}
state.counters["BW"] =
benchmark::Counter(bytes_copied, benchmark::Counter::kIsRate);
state.counters["Allocs"] =
benchmark::Counter(allocs, benchmark::Counter::kIsRate);
}
BENCHMARK(BM_copies_baseline)->Args({1000000, 20})->Threads(1)->UseRealTime();
BENCHMARK(BM_copies_baseline)->Args({1000000, 20})->Threads(4)->UseRealTime();
BENCHMARK(BM_copies_baseline)->Args({1000000, 20})->Threads(16)->UseRealTime();
BENCHMARK(BM_copies_baseline)->Args({1000000, 20})->Threads(64)->UseRealTime();
BENCHMARK(BM_copies_baseline)->Args({1000000, 20})->Threads(128)->UseRealTime();
BENCHMARK(BM_copies_baseline)->Args({1000000, 20})->Threads(256)->UseRealTime();
```
I observe roughly 1.5-3x improvements.
End to end application testing also sees significant improvements in the contended scenario.
Reviewed By: jianyuh, ngimel
Differential Revision: D32588784
fbshipit-source-id: ee86c3b7ed4da6412dd3c89362f989f4b5d91736
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68887
Closes #46988, closes #46987, closes #46761
By "simple" I mean operators that map 0->0 so we can implement it by
just re-dispatching on the values tensor. That does mean we have `sin`
but not `cos` for example, but without fill value support this is the
best that can be done.
Most of these don't support autograd because the derivative formulas
use unsupported operators.
cc nikitaved pearu cpuhrsch IvanYashchuk
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D32706197
Pulled By: cpuhrsch
fbshipit-source-id: 65e1acb3645737ca7bdb7f2db739d8e118906f4b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68751
Add option to get input dtype from user for AOT compilation
Test Plan:
BI model compiles and runs fine
```
(pytorch) ~/fbsource/fbcode/caffe2/fb/nnc
└─ $ buck run //caffe2/binaries:aot_model_compiler -- --model=bi.pt --model_name=pytorch_dev_bytedoc --model_version=v1 '--input_dims=1,115;1' --input_types='int64;int64'
Building... 8.3 sec (99%) 7673/7674 jobs, 0/7674 updated
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1116 14:32:44.632536 1332111 TensorImpl.h:1418] Warning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (function operator())
E1116 14:32:44.673710 1332111 huge_pages_allocator.cc:287] Not using huge pages because not linked with jemalloc
The compiled llvm assembly code was saved to bi.compiled.ll
The compiled model was saved to bi.compiled.pt
```
> Error thrown when input dims and input types sizes don't match
```
(pytorch) ~/fbsource/fbcode/caffe2/fb/nnc
└─ $ buck run //caffe2/binaries:aot_model_compiler -- --model=bi.pt --model_name=pytorch_dev_bytedoc --model_version=v1 '--input_dims=1,115;1' --input_types='int64;int64;int64'
.
.
terminate called after throwing an instance of 'c10::Error'
what(): [enforce fail at aot_model_compiler.cc:208] split(';', FLAGS_input_dims).size() == split(';', FLAGS_input_types).size(). Number of input_dims and input_types should be the same
.
.
.
```
Reviewed By: ljk53
Differential Revision: D32477001
fbshipit-source-id: 8977b0b59cf78b3a2fec0c8428f83a16ad8685c5
Summary:
These seem to not be needed and cause ninja to rebuild the files at every build.
(There also is THCStorage.cu, but hopefully this will go away with https://github.com/pytorch/pytorch/issues/68556 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69024
Reviewed By: soulitzer
Differential Revision: D32705309
Pulled By: ngimel
fbshipit-source-id: 5255297f213fdcf36e7203de7460a71291f8c9a0
Summary:
`cpu_kernel_vec` does stride checks to determine whether to use the vectorized or scalar inner loop. Since it uses a 1d `for_each` loop, it re-does these stride checks after every loop over the inner dimension. For iterators with small inner dimensions, this means a significant proportion of the time may be spent just on stride checks.
This changes it to use a 2d loop so the stride checks are further amortized. With the below `copy_` benchmark, it saves 50% of the callgrind instruction count from 28.4 Million to 13.5 Million and 30% time speedup from 22.8 us to 16.4 us on my machine.
```
from torch.utils.benchmark import Timer
import timeit
timer = Timer(
stmt="b.copy_(a);",
setup="""
auto a = at::rand({10000, 8}, at::kComplexDouble).slice(0, 0, -1, 2);
auto b = at::empty_like(a);
""",
num_threads=1,
language='c++',
timer=timeit.default_timer
)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68962
Reviewed By: mrshenli
Differential Revision: D32684191
Pulled By: ngimel
fbshipit-source-id: 582af038314a0f999f43669e66edace38ff8d2dc
Summary:
This PR absolves `_TestParametrizer`s (e.g. `ops`, `modules`, `parametrize`) of the responsibility of adding device type (e.g. `'cpu'`, `'cuda'`, etc.) / dtype (e.g. 'float32') to generated test names. This fixes repeated instances of the device string being added to generated test names (e.g. `test_batch_norm_training_True_cuda_track_running_stats_True_cuda_affine_True_cuda`).
The responsibility for placing device / dtype suffixes is now handled by `instantiate_device_type_tests()` instead so it is added a single time. It will place `<device>_<dtype>` at the end of the test name unconditionally, maintaining the current naming convention.
As part of this work, I also tightened the semantics through some additional error case handling:
* Composing multiple decorators that each try to handle the same parameter will error out with a nice message. This includes the case of trying to compose `modules` + `ops`, as they each try to handle `dtype`. Similarly, `ops` + `dtypes` is forbidden when both try to handle `dtype`. This required changes in the following test files:
* `test/test_unary_ufuncs.py`
* `test/test_foreach.py`
* The `modules` / `ops` decorators will now error out with a nice message if used with `instantiate_parametrized_tests()` instead of `instantiate_device_type_tests()`, since they're not (currently) written to work outside of a device-specific context.
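For illustration of the naming scheme described above, a minimal sketch of a parametrized device-type test; the internal module paths and the test body are assumptions, not part of this PR:
```
# Sketch only: generated names now end in a single <device>_<dtype> suffix,
# e.g. test_batch_norm_affine_True_cpu_float32.
import torch
from torch.testing._internal.common_utils import TestCase, parametrize, run_tests
from torch.testing._internal.common_device_type import instantiate_device_type_tests, dtypes

class TestExample(TestCase):
    @dtypes(torch.float32)
    @parametrize("affine", [True, False])
    def test_batch_norm(self, device, dtype, affine):
        m = torch.nn.BatchNorm2d(3, affine=affine).to(device=device, dtype=dtype)
        out = m(torch.randn(2, 3, 4, 4, device=device, dtype=dtype))
        self.assertEqual(out.shape, (2, 3, 4, 4))

instantiate_device_type_tests(TestExample, globals())

if __name__ == "__main__":
    run_tests()
```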
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65217
Reviewed By: mruberry
Differential Revision: D32627303
Pulled By: jbschlosser
fbshipit-source-id: c2957228353ed46a0b7da8fa1a34c67598779312
Summary:
These APIs are not yet officially released and are still under discussion. Hence, this commit removes those APIs from docs and will add them back when ready.
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69011
Reviewed By: fduwjj
Differential Revision: D32703124
Pulled By: mrshenli
fbshipit-source-id: ea049fc7ab6b0015d38cc40c5b5daf47803b7ea0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67655
Some of the CPU operators already use the `namespace CPU_CAPABILITY` trick to avoid anonymous namespacing, like [`PowKernel.cpp`](cd51d2a3ec/aten/src/ATen/native/cpu/PowKernel.cpp (L14)). This extends that pattern to the `Vectorized` class, which avoids `-Wsubobject-linkage` warnings like I was getting in #67621.
For many functions, it was necessary to add `inline` because the functions are defined in a header. There were no link errors previously because the anonymous namespace ensured they were not exposed to linkage. Similarly, free functions defined in an anonymous namespace might need the `C10_UNUSED` attribute to silence warnings about the function not being called in the only translation unit that it's defined in. By removing the anonymous namespace, these decorators are no longer necessary.
Test Plan: Imported from OSS
Reviewed By: dagitses
Differential Revision: D32566109
Pulled By: malfet
fbshipit-source-id: 01d64003513b4946dec6b709bd73bbab05772134
Co-authored-by: Nikita Shulga <nshulga@fb.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68397
Now that hot paths can avoid instantiating RecordFunction by using shouldRunRecordFunction, we can improve efficiency for profiling cases by avoiding a large heap allocation.
ghstack-source-id: 144235785
Test Plan:
1) Run //caffe2/caffe2/fb/high_perf_models/pytorch/benchmark_framework_overheads:cpp_benchmark before/after this diff with arguments --stressTestRecordFunction --op empty.
Before: P467891381
After: P467902339
2) Run without --stressTestRecordFunction to verify no regression in the regular dispatcher path.
Before: P467902381
After: P467902403
Reviewed By: chaekit
Differential Revision: D32448365
fbshipit-source-id: 2d32a3bd82c60d2bb11fc57bb88bf3f02aa3fa25
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68841
Caches the current module's hook type as an attribute on the module.
This requires the assumption that the current module's hook type
does not change during inference, which is an assumption we can
commit to.
Test Plan:
correctness
```
python test/test_quantization.py TestQuantizeDBR
```
performance
```
// MobileNetV2, 1x3x224x224, function profiling
// before
get_module_hook_type -> 2.58%
// after
get_module_hook_type -> 0.73%
```
Reviewed By: jerryzh168
Differential Revision: D32630881
Pulled By: vkuzo
fbshipit-source-id: 667f2667ef9c5514e5d82e4e7e4c02b8238edc65
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68840
Fixes the debugging FQN info for a converted model. Some of this
information was missing because eager mode convert performed
module swaps. This information is only used in debugging and is
not used for inference.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```
turn `enable_logging` on in `auto_trace.py`, the FQN is now displayed
for a converted model
Reviewed By: jerryzh168
Differential Revision: D32630884
Pulled By: vkuzo
fbshipit-source-id: be8c43343abfdab9fe0af39499d908ed61a01b78
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68839
We can assume that there are no overrides needed for the hook which
dequantizes the module outputs, so we can turn them off explicitly.
While this does not lead to a measurable perf win, it makes things
easier to debug by eliminating the no-op overrides.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```
Reviewed By: jerryzh168
Differential Revision: D32630886
Pulled By: vkuzo
fbshipit-source-id: 1719c168f5f21f3e59c80a3b6d0f32ebb1c77ef8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68838
Removes an unnecessary outputs hook on the top level
module. The same hook is already called inside the regular
hook flow.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```
Reviewed By: soulitzer
Differential Revision: D32630882
Pulled By: vkuzo
fbshipit-source-id: aa5f1b1cb866051013195d7311949333b08df4de
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68837
The module convert hook dequantizes the module outputs if the user
requested the module to adhere to a certain dtype for outputs. This
is most commonly used for the assumption that a model's overall return
type is fp32.
This PR precalculates for each module whether this hook will do anything,
and returns early if it does not. This prevents the overhead of this
hook from affecting any module which does not need this hook.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```
perf
```
MobileNetV2, 1x3x224x224, function level profiling
// before
outputs_convert_hook - 0.73%
// after
outputs_convert_hook - 0.45%
```
Reviewed By: jerryzh168
Differential Revision: D32630885
Pulled By: vkuzo
fbshipit-source-id: 7ee84de742fc0c752b66d20d097405a754c8b480
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68836
If we have a leaf module like a `torch.nn.Conv2d`, DBR quant handles
the input and output of the module and should treat the inside of
this module as invisible. Specifically, there is no need to override
the `F.conv2d` call if the parent module is already being overridden.
Before this PR, `__torch_function__` was still overridden for the insides
of leaf modules, and the override was a no-op. There was some overhead
in these overrides because they were checking the hook type.
This PR adds a fast global override so we can skip overriding the insides
of leaf modules. This has some performance benefits in the prepared model,
because we now skip overriding all of the inner functions in observers.
Test Plan:
testing
```
python test/test_quantization.py TestQuantizeDBR
```
perf
```
// MobileNetV2, 1x3x224x224, comparing fp32 with dbr quant, Mac OS laptop
// before
fp32: 0.017837 seconds avg
fx_prepared: 0.021963 seconds avg, 0.812143 speedup vs fp32
fx_quantized: 0.012632 seconds avg, 1.412056 speedup vs fp32
dt_prepared: 0.034052 seconds avg, 0.523820 speedup vs fp32
dt_quantized: 0.018316 seconds avg, 0.973829 speedup vs fp32
// after
fp32: 0.020395 seconds avg
fx_prepared: 0.026969 seconds avg, 0.756230 speedup vs fp32
fx_quantized: 0.013195 seconds avg, 1.545611 speedup vs fp32
dt_prepared: 0.033432 seconds avg, 0.610023 speedup vs fp32
dt_quantized: 0.018244 seconds avg, 1.117866 speedup vs fp32
```
Reviewed By: jerryzh168
Differential Revision: D32630883
Pulled By: vkuzo
fbshipit-source-id: 6365e1c514726d8b2a4b3a51f114f5fed3ebe887
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68007
This PR adds a new function to the sparse module.
`sampled_addmm` computes α*(A @ B) * spy(C) + β*C, where C is a sparse CSR matrix and A, B are dense (strided) matrices.
This function is currently restricted to single 2D matrices; it doesn't support batched input.
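A minimal usage sketch; the exposure as `torch.sparse.sampled_addmm` and the availability on a given backend (this may initially be CUDA-only) are assumptions:
```
# Sketch: alpha * (A @ B) * spy(C) + beta * C, evaluated only at the nonzeros of C.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
A = torch.randn(4, 8, device=device)
B = torch.randn(8, 4, device=device)
C = torch.eye(4, device=device).to_sparse_csr()  # sparse CSR, defines the sparsity pattern

out = torch.sparse.sampled_addmm(C, A, B, beta=1.0, alpha=1.0)
print(out)
```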
cc nikitaved pearu cpuhrsch IvanYashchuk
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D32435799
Pulled By: cpuhrsch
fbshipit-source-id: b1ffac795080aef3fa05eaeeded03402bc097392
Summary:
This PR tries to fix the "no device" bug when the user resets `GLOO_SOCKET_IFNAME_ENV` with
```bash
export GLOO_SOCKET_IFNAME_ENV=
```
Thank you for your time on reviewing this PR :).
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68933
Reviewed By: soulitzer
Differential Revision: D32690633
Pulled By: mrshenli
fbshipit-source-id: f6df2b8b067d23cf1ec177c77cc592dc870bda72
Summary:
`default_collate`, `default_convert`, and `pin_memory` convert sequences into lists. I believe they should keep the original type when possible (e.g., I have a class that inherits from `list`, which comes from a 3rd party library that I can't change, and provides extra functionality).
Note it's easy to do when the type supports construction from an iterable, but that's not always the case (e.g., `range`).
Even though this can be accomplished with a custom `default_collate`/`default_convert`, 1) this is behavior they should support out-of-the-box IMHO, and 2) `pin_memory` still does it.
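A small sketch of the desired behavior; the import path for `default_collate` is an assumption (it has historically lived in a private module):
```
# Illustrative only: a list subclass should survive collation after this change.
import torch
from torch.utils.data._utils.collate import default_collate

class MyList(list):
    """Stand-in for a 3rd party list subclass that can't be changed."""

samples = [MyList([torch.tensor(1)]), MyList([torch.tensor(2)])]
batch = default_collate(samples)

# Previously this was always a plain list; with the change it stays a MyList,
# since MyList can be constructed from an iterable.
print(type(batch))
```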
cc VitalyFedyunin ejguan NivekT
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68779
Reviewed By: wenleix
Differential Revision: D32651129
Pulled By: ejguan
fbshipit-source-id: 17c390934bacc0e4ead060469cf15dde815550b4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68822
Per title, we switched over c10d_gloo and nccl and the results look good
so far, so switch the rest of them as well. After this, the only dist tests that
won't run in a subprocess are the pipe and fsdp tests, which historically haven't had
much flakiness.
ghstack-source-id: 144213522
Test Plan: CI
Reviewed By: H-Huang
Differential Revision: D32624330
fbshipit-source-id: 469f613e5b0e4529e6b23ef259d948837d4af26b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68821
Continuing effort to move most distributed tests to run in subprocess
for better reproducibility + reduce flakiness.
ghstack-source-id: 144213520
Test Plan: CI
Reviewed By: H-Huang
Differential Revision: D32624199
fbshipit-source-id: 04448636320554d7a3ab29ae92bc1ca9fbe37da2
Summary:
Do not run distributed tests as part of a separate shard, but keep them inside one of the two shards (to limit concurrency problems)
Fixes https://github.com/pytorch/pytorch/issues/68260
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68784
Reviewed By: seemethere, janeyx99
Differential Revision: D32653440
Pulled By: malfet
fbshipit-source-id: ebe5bbc30bdf67e930f2c766c920932700f3a4e4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68707
This PR adds a path for block CSR matrices for `torch.addmm`. cuSPARSE interface is restricted to 32-bit indices and square blocks.
My plan is to make everything work and tests passing using an unsafe constructor first, keeping it all private. Then discuss & implement constructors with block information separately unlocking the functions for wider use. Documentation will come with the update to constructors.
cc nikitaved pearu cpuhrsch IvanYashchuk ngimel
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision: D32650366
Pulled By: cpuhrsch
fbshipit-source-id: 430a9627901781ee3d2e2496097b71ec17727d98
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68885
`torch.neg` should preserve the input dtype but for sparse tensors it
was promoting integers to floating point. This would have been picked
up by the OpInfo-based test, but `neg` wasn't marked with
`supports_sparse=True` so it was never run.
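A small check of the behavior being fixed; the repro below is a sketch, not taken from the PR:
```
import torch

x = torch.tensor([[0, 1], [2, 0]], dtype=torch.int64).to_sparse()
y = torch.neg(x)

# With the fix, the integer dtype is preserved instead of being promoted to float.
print(y.dtype)  # torch.int64
```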
cc nikitaved pearu cpuhrsch IvanYashchuk
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D32680008
Pulled By: cpuhrsch
fbshipit-source-id: 502f8743c1c33ab802e3d9d097792887352cd220
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68314
Add a convenience to lazy::Shape for counting the number of elements (by multiplying out the dimensions). This is a method on Tensor, and in switching other lazy tensor shape utils to use aten shape inference, we need numel counts.
Test Plan: add unit tests
Reviewed By: alanwaketan
Differential Revision: D32409138
fbshipit-source-id: 3ae725300f8826d38e45412f46501d5e5f776fb2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68817
Looks like these files are getting used by downstream xla so we need to
include them in our package_data
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D32622241
Pulled By: seemethere
fbshipit-source-id: 7b64e5d4261999ee58bc61185bada6c60c2bb5cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68566
These are just auto-linear as pointed out by Jeffrey.
ghstack-source-id: 143814393
Test Plan: - Run OpInfo tests.
Reviewed By: albanD, soulitzer
Differential Revision: D32520239
Pulled By: zou3519
fbshipit-source-id: 807115157b131e6370f364f61db1b14700279789
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62146.
Modernizes and clarifies the documentation of torch.tensor and torch.as_tensor, highlighting the distinction in their copying behavior and preservation of autograd history.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63308
Reviewed By: albanD, ngimel
Differential Revision: D30338025
Pulled By: mruberry
fbshipit-source-id: 83a0c113e4f8fce2dfe086054562713fe3f866c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66933
This PR exposes `torch.lu` as `torch.linalg.lu_factor` and
`torch.linalg.lu_factor_ex`.
This PR also adds support for inputs with zero-sized dimensions, both in
the matrix itself and in the batch dimensions. Note that this function simply
returns empty tensors of the correct size in this case.
We add a test and an OpInfo for the new function.
This PR also adds documentation for this new function in line of
the documentation in the rest of `torch.linalg`.
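For reference, a minimal usage sketch of the new function; the shapes and the follow-up solve are illustrative, not from the PR:
```
import torch

A = torch.randn(3, 4, 4)                 # a batch of square matrices
LU, pivots = torch.linalg.lu_factor(A)   # packed LU factors and pivot indices

# The factorization can be reused to solve several right-hand sides.
b = torch.randn(3, 4, 2)
x = torch.lu_solve(b, LU, pivots)
print(torch.allclose(A @ x, b, atol=1e-4))
```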
Fixes https://github.com/pytorch/pytorch/issues/56590
Fixes https://github.com/pytorch/pytorch/issues/64014
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D32521980
Pulled By: mruberry
fbshipit-source-id: 26a49ebd87f8a41472f8cd4e9de4ddfb7f5581fb
Summary:
This fixes a custom class registration issue when `typeid` is not guaranteed to be unique across multiple libraries, which is the case for the libc++ runtime on macOS 11, in particular on M1
From [libcxx/include/typeinfo](78d6a7767e/include/typeinfo (L139)):
```
// -------------------------------------------------------------------------- //
// NonUniqueARMRTTIBit
// -------------------------------------------------------------------------- //
// This implementation of type_info does not assume always a unique copy of
// the RTTI for a given type inside a program. It packs the pointer to the
// type name into a uintptr_t and reserves the high bit of that pointer (which
// is assumed to be free for use under the ABI in use) to represent whether
// that specific copy of the RTTI can be assumed unique inside the program.
// To implement equality-comparison of type_infos, we check whether BOTH
// type_infos are guaranteed unique, and if so, we simply compare the addresses
// of their type names instead of doing a deep string comparison, which is
// faster. If at least one of the type_infos can't guarantee uniqueness, we
// have no choice but to fall back to a deep string comparison.
```
But the `std::type_index` hash is always computed assuming that the implementation is unique
By adding a slow path this problem can be fixed in those scenarios.
Fixes https://github.com/pytorch/pytorch/issues/68039
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68717
Reviewed By: seemethere
Differential Revision: D32605187
Pulled By: malfet
fbshipit-source-id: 8d50e56885b8c97dad3bc34a69c47ef879456dd1
Summary:
For some reason, the example for `torch.empty` showed the usage of `torch.empty_like` and the other way around. These are now swapped.
Fixes https://github.com/pytorch/pytorch/issues/68799
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68874
Reviewed By: wenleix
Differential Revision: D32646645
Pulled By: ejguan
fbshipit-source-id: c8298bcaca450aaa4abeef2239af2b14cadc05b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68749
The logic for asynchronous copies (either HtoD or DtoH) using cudaMemcpyAsync relies on recording an event with the caching host allocator to notify it that a given allocation has been used on a stream - and thus it should wait for that stream to proceed before reusing the host memory.
This tracking is based on the allocator maintaining a map from storage allocation pointers to some state.
If we try to record an event for a pointer we don't understand, we will silently drop the event and ignore it (9554ebe44e/aten/src/ATen/cuda/CachingHostAllocator.cpp (L171-L175)).
Thus, if we use the data_ptr of a Tensor instead of the storage allocation, then reasonable code can lead to incorrectness due to missed events.
One way this can occur is simply by slicing a tensor into sub-tensors - which have different values of `data_ptr()` but share the same storage, for example:
```
image_batch = torch.randn(M, B, C, H, W).pin_memory()
for m in range(M):
sub_batch = image_batch[m].cuda(non_blocking=True)
# sub_batch.data_ptr() != image_batch.data_ptr() except for m == 0.
# however, sub_batch.storage().data_ptr() == image_batch.storage().data_ptr() always.
```
Therefore, we instead use the storage context pointer when recording events, as this is the same state that is tracked by the caching allocator itself. This is a correctness fix, although it's hard to determine how widespread this issue is.
Using the storage context also allows us to use a more efficient structure internally to the caching allocator, which will be sent in future diffs.
Test Plan: Test added which demonstrates the issue, although it's hard to demonstrate the race explicitly.
Reviewed By: ngimel
Differential Revision: D32588785
fbshipit-source-id: d87cc5e49ff8cbf59052c3c97da5b48dd1fe75cc
Summary:
Implemented submodule for https://github.com/pytorch/pytorch/issues/68050
Opened a cleaned-up, final version of the PR for https://github.com/pytorch/pytorch/issues/68240
Explanation:
I am trying to contribute to PyTorch by implementing distributions for symmetric matrices like the Wishart distribution and the Inverse Wishart distribution. Although there is an LKJ distribution for the Cholesky decomposition of correlation matrices, it is only equivalent to a restricted form of the Wishart distribution. [https://arxiv.org/abs/1809.04746](https://arxiv.org/abs/1809.04746) Thus, I started implementing the Wishart distribution and the Inverse Wishart distribution separately.
I added a short piece of code for 'torch.distributions.constraints.symmetric', which was not previously included in 'torch.distributions.constraints'; i.e., 'torch.distributions.constraints' contains constraints like 'positive_definite', but those just assume the symmetry of the input matrix. [Link](1adeeabdc0/torch/distributions/constraints.py (L466)) So I think it would be better to have a constraint in PyTorch that checks the symmetry of tensors.
We may further utilize it like
`constraints.stack([constraints.symmetric, constraints.positive_definite])`
for the constraint of the covariance matrix in Multivariate Normal distribution, for example, to check if the random matrix is a symmetric positive definite matrix.
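A small illustration of the kind of check being discussed; `constraints.symmetric` is the constraint proposed here, so treat its availability as an assumption:
```
import torch
from torch.distributions import constraints

A = torch.randn(3, 3)
S = A @ A.T + 3 * torch.eye(3)   # symmetric positive definite by construction

print(constraints.positive_definite.check(S))   # existing constraint
# With the proposed constraint, symmetry could be checked explicitly as well:
# print(constraints.symmetric.check(S))
```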
cc fritzo neerajprad alicanb nikitaved
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68644
Reviewed By: jbschlosser
Differential Revision: D32599540
Pulled By: neerajprad
fbshipit-source-id: 9227f7e9931834a548a88da69e4f2e9af7732cfe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68228
Forking this for now so that we can make changes as needed; the changes can be merged back to torch.fx later
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D32537713
fbshipit-source-id: 326598d13645fcc28ef2c66baaac6a077b80fd0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68707
This PR adds a path for block CSR matrices for `torch.addmm`. cuSPARSE interface is restricted to 32-bit indices and square blocks.
My plan is to make everything work and tests passing using an unsafe constructor first, keeping it all private. Then discuss & implement constructors with block information separately unlocking the functions for wider use. Documentation will come with the update to constructors.
cc nikitaved pearu cpuhrsch IvanYashchuk ngimel
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D32633806
Pulled By: cpuhrsch
fbshipit-source-id: b98db0bd655cce651a5da457e78fca08619a5066
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68834
This diff uses std::vector::reserve for constructing constants in StaticModule. We can also avoid two extra iterations over all the graph nodes.
This diff should technically improve its performance by a tiny bit.
Test Plan: - [x] buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- -v 1
Reviewed By: mikeiovine
Differential Revision: D32628806
fbshipit-source-id: 99dd2a7a36e86899ca1fe5300f3aa90d30a43726
Summary:
**Summary**: FixedQParams operators do not need fake quantization
in the prepare step. This commit introduces FixedQParamsObserver
and makes FixedQParamsFakeQuantize a simple wrapper around this
observer. It also removes the fake quantize logic in forward.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68143
Test Plan:
Added two tests:
python3 test/test_quantization.py TestQuantizeFx.test_fixed_qparams_patterns
python3 test/test_quantization.py TestQuantizeFx.test_register_patterns
**Reviewers**: Jerry Zhang
**Subscribers**: Jerry Zhang, Supriya Rao
**Tasks**: T104942885
**Tags**: pytorch
Reviewed By: albanD
Differential Revision: D32484427
Pulled By: andrewor14
fbshipit-source-id: 5a048b90eb4da79074c5ceffa3c8153f8d8cd662
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68818
Operator support was blocking all nodes with dtype int64 from lowering. This diff eases the condition, allowing inputs from get_attr nodes (which are known not to be used for TRT compute) to have dtype int64.
Reviewed By: brad-mengchi, 842974287
Differential Revision: D32609457
fbshipit-source-id: ea255f3281349a4254cb6abdeed671ab2c0216ba
Summary:
The only difference between `CUDA_VERSION` and the magma package name is a dot between the major and minor versions.
In the process of refactoring, I discovered that some docker images set `CUDA_VERSION` to include a patch revision, so the pattern was modified to strip it, i.e. `cuda-magma102` would be installed for `CUDA_VERSION=10.2.89` and `cuda-magma113` would be installed for `CUDA_VERSION=11.3.0`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68778
Reviewed By: seemethere
Differential Revision: D32605365
Pulled By: malfet
fbshipit-source-id: 43f8edeee5b55fdea6b4d9943874df8e97494ba1
Summary:
After the 'maximize' flag was introduced in https://github.com/pytorch/pytorch/issues/46480, some jobs fail because they resume training from checkpoints.
After we load an old checkpoint, we get an error during the optimizer.step() call in the backward pass (torch/optim/sgd.py, line 129) because there is no 'maximize' key in the parameter groups of the SGD.
To circumvent this I add a default value via `group.setdefault('maximize', False)` when the optimizer state is restored.
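A sketch of what such a fix could look like; it assumes the default is filled in from the optimizer's `__setstate__` (the hook called when state is restored), and the subclass below is only for illustration:
```
import torch

class PatchedSGD(torch.optim.SGD):
    """Sketch only: tolerate checkpoints that predate the 'maximize' flag."""

    def __setstate__(self, state):
        super().__setstate__(state)
        for group in self.param_groups:
            # Old checkpoints have no 'maximize' entry, so fill in a default.
            group.setdefault('maximize', False)
```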
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68733
Reviewed By: albanD
Differential Revision: D32480963
Pulled By: asanakoy
fbshipit-source-id: 4e367fe955000a6cb95090541c143a7a1de640c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68782
These builds are no longer required for slow_gradcheck and should be
removed
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: malfet, janeyx99
Differential Revision: D32606679
Pulled By: seemethere
fbshipit-source-id: e4827a6f217b91c34cfab6c2340e3272f3db1522
Summary:
An update to https://github.com/pytorch/pytorch/issues/67442 to make sure all of the inputs produced are independent
Updates group_norm and instance_norm (local_response_norm was already producing independent inputs)
Also updates instance_norm to fix a bug in one set of inputs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68526
Reviewed By: ngimel
Differential Revision: D32532076
Pulled By: samdow
fbshipit-source-id: 45b9320fd9aecead052b21f838f95887cfb71821
Summary:
There is a bug in CMake's Ninja generator where files considered inputs to the cmake command couldn't be generated by another build step. The fix was included in CMake 3.13, but 3.10.3 is still sufficient for other cmake generators e.g. makefiles.
For reference, the bug is here https://gitlab.kitware.com/cmake/cmake/-/issues/18584
This is necessary for https://github.com/pytorch/pytorch/issues/68246 but I'm isolating the change here to make testing easier.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68731
Reviewed By: jbschlosser
Differential Revision: D32604545
Pulled By: malfet
fbshipit-source-id: 9bc0bd8641ba415dd63ce21a05c177e2f1dd9866
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68797
This changes dist quantization op registration to happen in each file instead, allowing the torch deploy test to pass
ghstack-source-id: 143994945
Test Plan: wait for sc
Reviewed By: jbschlosser
Differential Revision: D32610679
fbshipit-source-id: 3ade925286f1ed0f65017939f1ad3f5c539e1767
Summary:
Towards [convolution consolidation](https://fb.quip.com/tpDsAYtO15PO).
Introduces the general `convolution_backward` function that uses the factored-out backend routing logic from the forward function.
Some notes:
* `finput` is now recomputed in the backward pass for the slow 2d / 3d kernels instead of being saved from the forward pass. The logic for this is based on the forward computation and is present in the `compute_finput2d` / `compute_finput3d` functions in `ConvUtils.h`.
* Using structured kernels for `convolution_backward` requires extra copying since the backend-specific backward functions return tensors. Porting to structured is left as future work.
* The tests that check the routing logic have been renamed from `test_conv_backend_selection` -> `test_conv_backend` and now also include gradcheck validation using an `autograd.Function` hooking up `convolution` to `convolution_backward`. This was done to ensure that gradcheck passes for the same set of inputs / backends.
The forward pass routing is done as shown in this flowchart (probably need to download it for it to be readable since it's ridiculous):
[flowchart image]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65219
Reviewed By: mruberry
Differential Revision: D32611368
Pulled By: jbschlosser
fbshipit-source-id: 26d759b7c908ab8f19ecce627acea7bd3d5f59ba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68227
This PR adds two keys to backend_config_dict:
"root_module": the root module for the pattern (since we may have patterns for fused ops)
"reference_quantized_module_for_root": the corresponding reference quantized module for the root
Test Plan:
```
python test/test_quant_trt.py TestQuantizeFxTRTOps
python test/test_quant_trt.py TestConvertFxDoNotUse
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D32537711
fbshipit-source-id: 6b8f36a219db7bb6633dac53072b748ede8dfa78
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68794
The pruner `test_constructor` fails because of a typo in the regular expression matching for the error that the pruner throws.
This fixes it.
Test Plan:
Separate test is not needed -- single letter change.
Previous test: `python test/test_ao_sparsity.py -- TestBasePruner`
Reviewed By: ngimel
Differential Revision: D32609589
Pulled By: z-a-f
fbshipit-source-id: 800ef50c8cdbf206087bc6f945d1830e4af83c03
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66412
The GPU training was not supported in the sparsifier.
The reason was that when the sparsifier was created the masks would default to the CPU.
Attaching a GPU model to the sparsifier would throw an error.
The solution is to create the masks on the same device as the weight.
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D31590675
Pulled By: z-a-f
fbshipit-source-id: 98c2c1cedc7c60aecea4076e5254ef6b3443139e
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66119
Failure on ARM Neoverse N1 before this PR:
```
======================================================================
FAIL: test_bitwise_ops_cpu_int16 (__main__.TestBinaryUfuncsCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/opt/pytorch/pytorch/torch/testing/_internal/common_device_type.py", line 373, in instantiated_test
result = test(self, **param_kwargs)
File "test_binary_ufuncs.py", line 315, in test_bitwise_ops
self.assertEqual(op(a, b), op(a_np, b_np))
File "/opt/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 1633, in assertEqual
self.assertEqual(
File "/opt/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 1611, in assertEqual
super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Tensors failed to compare as equal!Found 176 different element(s) (out of 225), with the greatest difference of 21850 (-21846 vs. 4) occuring at index (0, 2).
======================================================================
FAIL: test_bitwise_ops_cpu_int32 (__main__.TestBinaryUfuncsCPU)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/opt/pytorch/pytorch/torch/testing/_internal/common_device_type.py", line 373, in instantiated_test
result = test(self, **param_kwargs)
File "test_binary_ufuncs.py", line 315, in test_bitwise_ops
self.assertEqual(op(a, b), op(a_np, b_np))
File "/opt/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 1633, in assertEqual
self.assertEqual(
File "/opt/pytorch/pytorch/torch/testing/_internal/common_utils.py", line 1611, in assertEqual
super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
AssertionError: False is not true : Tensors failed to compare as equal!Found 188 different element(s) (out of 225), with the greatest difference of 1335341061 (-1335341056 vs. 5) occuring at index (14, 8).
----------------------------------------------------------------------
```
which passes now.
CC malfet ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66194
Reviewed By: dagitses, bdhirsh, ngimel
Differential Revision: D31430274
Pulled By: malfet
fbshipit-source-id: bcf1c9d584c02eff328dd5b1f7af064fac5942c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66411
The original tests were disabled, and had some bugs. This fixes those unittests.
Test Plan: Imported from OSS
Reviewed By: HDCharles
Differential Revision: D31590678
Pulled By: z-a-f
fbshipit-source-id: ddbed34cc01d5f15580cb8f0033416f2f9780068
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63568
This PR adds the first solver with structure to `linalg`. This solver
has an API compatible with that of `linalg.solve`, preparing them for a
possible future merge of the APIs. The new API:
- Just returns the solution, rather than the solution and a copy of `A`
- Removes the confusing `transpose` argument and replaces it by a
correct handling of conj and strides within the call
- Adds a `left=True` kwarg. This can be achieved via transposes of the
inputs and the result, but it's exposed for convenience.
This PR also implements a dataflow that minimises the number of copies
needed before calling LAPACK / MAGMA / cuBLAS and takes advantage of the
conjugate and neg bits.
This algorithm is implemented for `solve_triangular` (which, for this, is
the most complex of all the solvers due to the `upper` parameters).
Once more solvers are added, we will factor out this calling algorithm,
so that all of them can take advantage of it.
Given the complexity of this algorithm, we implement some thorough
testing. We also added tests for all the backends, which was not done
before.
We also add forward AD support for `linalg.solve_triangular` and improve the
docs of `linalg.solve_triangular`. We also fix a few issues with those of
`torch.triangular_solve`.
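A minimal usage sketch of the new function described above; the example shapes are illustrative:
```
import torch

A = torch.randn(4, 4).triu() + 4 * torch.eye(4)   # well-conditioned upper-triangular matrix
B = torch.randn(4, 3)

# Solves A X = B and returns only the solution (no copy of A).
X = torch.linalg.solve_triangular(A, B, upper=True)
print(torch.allclose(A @ X, B, atol=1e-5))

# left=False solves X A = B instead, avoiding manual transposes.
C = torch.randn(3, 4)
Y = torch.linalg.solve_triangular(A, C, upper=True, left=False)
print(torch.allclose(Y @ A, C, atol=1e-5))
```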
Resolves https://github.com/pytorch/pytorch/issues/54258
Resolves https://github.com/pytorch/pytorch/issues/56327
Resolves https://github.com/pytorch/pytorch/issues/45734
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D32588230
Pulled By: mruberry
fbshipit-source-id: 69e484849deb9ad7bb992cc97905df29c8915910
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68504
Per title
ghstack-source-id: 143928767
Test Plan: CI
Reviewed By: H-Huang
Differential Revision: D32485100
fbshipit-source-id: a55687aea4af69e3830aee6f0278550c72f142c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68503
Per title
ghstack-source-id: 143928768
Test Plan: CI
Reviewed By: H-Huang
Differential Revision: D32484990
fbshipit-source-id: 6682f46256af0da5153e5087a91a7044156dd17f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68647
Fixes #68539
When all data from the source datapipe is depleted, there is no need to yield the biggest group in the buffer.
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D32562646
Pulled By: ejguan
fbshipit-source-id: ce91763656bc457e9c7d0af5861a5606c89965d5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67401
some minor changes to dist quantization, mainly changing the namespace and adding some notes for future code dedup
ghstack-source-id: 143910067
Test Plan: wait for ci
Reviewed By: mrshenli
Differential Revision: D31979269
fbshipit-source-id: 85a2f395e6a3487dd0b9d1fde886eccab106e289
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67400
c10d/frontend.cpp was originally proposed to introduce a pure C++ API and use TorchBind to share the Python-level API with TorchScript. This is no longer needed, so delete it to reduce code redundancy.
ghstack-source-id: 143910066
Test Plan: wait for ci
Reviewed By: navahgar
Differential Revision: D31979270
fbshipit-source-id: 6ceb8b53d67ab8f9aef44b34da79346dfbb51225
Summary:
Replace usage of `dtypesIfCPU` with `dtypes` in OpInfo class and also make it a mandatory argument.
Also added DeprecationWarning on using `dtypesIfCPU`
This raises a question:
For an OpInfo entry, currently `dtypes` works for any external backend, `dtypesIfCPU` for CPU and `dtypesIfCUDA` and `dtypesIfROCM` for CUDA and ROCm respectively.
If we merge `dtypes` and `dtypesIfCPU`, then cases where an external backend's `dtypes` don't match the CPU `dtypes` will lead to failures.
Currently there are a few issues (5 failures) due to this on XLA (we may add relevant skips for these). If we agree that skips should be added, should they be added via the OpInfo decorator mechanism or on the XLA end? The XLA end makes more sense to me, to keep one source of skips.
<details>
<summary>XLA Fail Log</summary>
```
Nov 01 11:48:26 ======================================================================
Nov 01 11:48:26 ERROR [0.016s]: test_reference_eager_histogram_xla_float32 (__main__.TestOpInfoXLA)
Nov 01 11:48:26 ----------------------------------------------------------------------
Nov 01 11:48:26 Traceback (most recent call last):
Nov 01 11:48:26 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 371, in instantiated_test
Nov 01 11:48:26 result = test(self, **param_kwargs)
Nov 01 11:48:26 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 737, in test_wrapper
Nov 01 11:48:26 return test(*args, **kwargs)
Nov 01 11:48:26 File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 411, in test_reference_eager
Nov 01 11:48:26 self.compare_with_eager_reference(op, sample_input)
Nov 01 11:48:26 File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 397, in compare_with_eager_reference
Nov 01 11:48:26 cpu_inp, cpu_args, cpu_kwargs = cpu(sample_input)
Nov 01 11:48:26 File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 393, in cpu
Nov 01 11:48:26 sample.args), to_cpu(sample.kwargs)
Nov 01 11:48:26 File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 386, in to_cpu
Nov 01 11:48:26 return {k: to_cpu(v) for k, v in x.items()}
Nov 01 11:48:26 File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 386, in <dictcomp>
Nov 01 11:48:26 return {k: to_cpu(v) for k, v in x.items()}
Nov 01 11:48:26 File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 390, in to_cpu
Nov 01 11:48:26 raise ValueError("Unknown type {0}!".format(type(x)))
Nov 01 11:48:26 ValueError: Unknown type <class 'NoneType'>!
Nov 01 11:48:26
Nov 01 11:48:26 ======================================================================
Nov 01 11:48:26 FAIL [0.575s]: test_reference_eager___rmatmul___xla_int64 (__main__.TestOpInfoXLA)
Nov 01 11:48:26 ----------------------------------------------------------------------
Nov 01 11:48:26 Traceback (most recent call last):
Nov 01 11:48:26 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 371, in instantiated_test
Nov 01 11:48:26 result = test(self, **param_kwargs)
Nov 01 11:48:26 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 737, in test_wrapper
Nov 01 11:48:26 return test(*args, **kwargs)
Nov 01 11:48:26 File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 411, in test_reference_eager
Nov 01 11:48:26 self.compare_with_eager_reference(op, sample_input)
Nov 01 11:48:26 File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 402, in compare_with_eager_reference
Nov 01 11:48:26 self.assertEqual(actual, expected, exact_dtype=True, exact_device=False)
Nov 01 11:48:26 File "/var/lib/jenkins/workspace/xla/test/pytorch_test_base.py", line 607, in assertEqual
Nov 01 11:48:26 return DeviceTypeTestBase.assertEqual(self, x, y, *args, **kwargs)
Nov 01 11:48:26 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1903, in assertEqual
Nov 01 11:48:26 super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
Nov 01 11:48:26 AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0.001 and atol=0.001, found 44 element(s) (out of 50) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 9.187201950435738e+18 (-9.187201950435738e+18 vs. 34.0), which occurred at index (0, 4).
Nov 01 11:48:26
Nov 01 11:48:26 ======================================================================
Nov 01 11:48:26 FAIL [0.137s]: test_reference_eager_linalg_multi_dot_xla_int64 (__main__.TestOpInfoXLA)
Nov 01 11:48:26 ----------------------------------------------------------------------
Nov 01 11:48:26 Traceback (most recent call last):
Nov 01 11:48:26 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 371, in instantiated_test
Nov 01 11:48:26 result = test(self, **param_kwargs)
Nov 01 11:48:26 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 737, in test_wrapper
Nov 01 11:48:26 return test(*args, **kwargs)
Nov 01 11:48:26 File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 411, in test_reference_eager
Nov 01 11:48:26 self.compare_with_eager_reference(op, sample_input)
Nov 01 11:48:26 File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 402, in compare_with_eager_reference
Nov 01 11:48:26 self.assertEqual(actual, expected, exact_dtype=True, exact_device=False)
Nov 01 11:48:26 File "/var/lib/jenkins/workspace/xla/test/pytorch_test_base.py", line 607, in assertEqual
Nov 01 11:48:26 return DeviceTypeTestBase.assertEqual(self, x, y, *args, **kwargs)
Nov 01 11:48:26 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1903, in assertEqual
Nov 01 11:48:26 super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
Nov 01 11:48:26 AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0.001 and atol=0.001, found 4 element(s) (out of 4) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 140230883884432.0 (0.0 vs. 140230883884432.0), which occurred at index (0, 0).
Nov 01 11:48:26
Nov 01 11:48:26 ======================================================================
Nov 01 11:48:26 FAIL [0.461s]: test_reference_eager_matmul_xla_int64 (__main__.TestOpInfoXLA)
Nov 01 11:48:26 ----------------------------------------------------------------------
Nov 01 11:48:26 Traceback (most recent call last):
Nov 01 11:48:26 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 371, in instantiated_test
Nov 01 11:48:26 result = test(self, **param_kwargs)
Nov 01 11:48:26 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 737, in test_wrapper
Nov 01 11:48:26 return test(*args, **kwargs)
Nov 01 11:48:26 File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 411, in test_reference_eager
Nov 01 11:48:26 self.compare_with_eager_reference(op, sample_input)
Nov 01 11:48:26 File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 402, in compare_with_eager_reference
Nov 01 11:48:26 self.assertEqual(actual, expected, exact_dtype=True, exact_device=False)
Nov 01 11:48:26 File "/var/lib/jenkins/workspace/xla/test/pytorch_test_base.py", line 607, in assertEqual
Nov 01 11:48:26 return DeviceTypeTestBase.assertEqual(self, x, y, *args, **kwargs)
Nov 01 11:48:26 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1903, in assertEqual
Nov 01 11:48:26 super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
Nov 01 11:48:26 AssertionError: False is not true : Tensors failed to compare as equal!With rtol=0.001 and atol=0.001, found 37 element(s) (out of 50) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 7.661375630332297e+18 (-7.66128151259864e+18 vs. 94117733658072.0), which occurred at index (4, 5).
Nov 01 11:48:26
Nov 01 11:48:26 ======================================================================
Nov 01 11:48:26 FAIL [0.050s]: test_reference_eager_remainder_autodiffed_xla_int64 (__main__.TestOpInfoXLA)
Nov 01 11:48:26 ----------------------------------------------------------------------
Nov 01 11:48:26 Traceback (most recent call last):
Nov 01 11:48:26 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 371, in instantiated_test
Nov 01 11:48:26 result = test(self, **param_kwargs)
Nov 01 11:48:26 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 737, in test_wrapper
Nov 01 11:48:26 return test(*args, **kwargs)
Nov 01 11:48:26 File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 411, in test_reference_eager
Nov 01 11:48:26 self.compare_with_eager_reference(op, sample_input)
Nov 01 11:48:26 File "/var/lib/jenkins/workspace/xla/test/test_ops.py", line 402, in compare_with_eager_reference
Nov 01 11:48:26 self.assertEqual(actual, expected, exact_dtype=True, exact_device=False)
Nov 01 11:48:26 File "/var/lib/jenkins/workspace/xla/test/pytorch_test_base.py", line 607, in assertEqual
Nov 01 11:48:26 return DeviceTypeTestBase.assertEqual(self, x, y, *args, **kwargs)
Nov 01 11:48:26 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1903, in assertEqual
Nov 01 11:48:26 super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
Nov 01 11:48:26 AssertionError: False is not true : Tensors failed to compare as equal!Attempted to compare equality of tensors with different dtypes. Got dtypes torch.int64 and torch.float32.
Nov 01 11:48:26
Nov 01 11:48:26 ----------------------------------------------------------------------
```
</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67619
Reviewed By: ngimel
Differential Revision: D32541986
Pulled By: mruberry
fbshipit-source-id: 793d7d22c3ec9b4778784254ef6f9c980b4b0ce2
Summary:
Fixes failing tests for `householder_product` due to non-contiguous inputs as shown here: https://github.com/pytorch/pytorch/issues/67513.
The floating point error was set too high for the complex64 type, so this PR reduces the error threshold for that particular type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68231
Reviewed By: dagitses
Differential Revision: D32562774
Pulled By: mruberry
fbshipit-source-id: edae4447ee257076f53abf79f55c5ffa1a9b3cb2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62180
This PR adds CPU dispatch for `triangular_solve` with sparse CSR matrix.
The implementation uses the MKL Sparse library. If it's not available, a runtime error is thrown.
cc nikitaved pearu cpuhrsch IvanYashchuk
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D32581395
Pulled By: cpuhrsch
fbshipit-source-id: 41c7133a0d2754ef60b5a7f1d14aa0bf7680a844
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68432
Speeds up `op_convert_after_hook` by precalculating when this hook is a no-op
based on information gathered while tracing, and skipping execution when
this flag is true.
```
MobileNetV2, function level profiling, 1x3x224x224
// before
op_convert_before_hook = 3.25%
// after
op_convert_before_hook = 1.35%
```
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```
Reviewed By: jerryzh168
Differential Revision: D32463752
Pulled By: vkuzo
fbshipit-source-id: b0c3d37909ddc8c254fe53f90954f625ae874e3b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68431
Asserts have some overhead; this removes the asserts used only to make
mypy happy from the path which is hit in every forward.
Test Plan: python test/test_quantization.py TestQuantizeDBR
Reviewed By: jerryzh168
Differential Revision: D32463767
Pulled By: vkuzo
fbshipit-source-id: 5f85f80144f35a725afe481bf027ea61ca6315bf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68374
Cleans up the relatedness logic in DBR quant. For now, this is still
duplicated with NS. A future PR should unify these mappings.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```
Reviewed By: jerryzh168
Differential Revision: D32463750
Pulled By: vkuzo
fbshipit-source-id: 90c2f5e79b86b1b595bd52650305bad88212ed49
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68373
Removes redundant logic in `op_needs_quantization`, for a small speedup.
Test Plan:
```
// MobileNetV2, 1x3x224x224 input, % of time spent by function during DBR convert
// before
cur_op_needs_hooks - 0.76%
op_needs_quantization - 0.41%
// after
cur_op_needs_hooks - 0.70%
op_needs_quantization - 0.36%
```
Reviewed By: jerryzh168
Differential Revision: D32463762
Pulled By: vkuzo
fbshipit-source-id: 334591c514dfa5af6fabc1390005088e8c5ca952
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68372
Speeds up `AutoQuantizationState.reset_to_new_call` by going around
the getattr and setattr overhead in `torch.nn.Module`.
Test Plan:
```
// MobileNetV2, 1x3x224x224 input, % of time spent by function during DBR convert
// before
reset_to_new_call - 1.09%
// after
reset_to_new_call - 0.18%
```
Reviewed By: jerryzh168
Differential Revision: D32463759
Pulled By: vkuzo
fbshipit-source-id: f3faa464372b0703f7d246680d62acd2782453e3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68371
`isinstance` has some overhead; this changes the code in `op_convert_before_hook`
to use the information calculated during tracing instead, which is cheaper.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```
function level benchmarking
```
// MobileNetV2, 1x3x224x224 input, % of time spent by function during DBR convert
// before
op_convert_before_hook = 3.55%
isinstance = 1.62%
// after
op_convert_before_hook = 2.89%
```
Reviewed By: jerryzh168
Differential Revision: D32463757
Pulled By: vkuzo
fbshipit-source-id: 129efe9c279a41f55b8bfd09132e21c0066298a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68370
Removes asserts which are duplicate (the same condition is checked
when calculating the hook type, so there is no need to check it again).
For the assert in `validate_is_at_last_seen_idx`, rewrites it to
raise an Error instead to ensure it does not get stripped in
production environments.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```
Reviewed By: jerryzh168
Differential Revision: D32463766
Pulled By: vkuzo
fbshipit-source-id: 8a7b7e0bf270bc327f49bd3e5bd156339e846381
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68369
`AutoQuantizationState` has various mappings keyed on IDs. Only
`tensor_id_to_observer` actually needs string keys because it is a
`torch.nn.ModuleDict`. This PR changes the other mappings to have
integer keys, for simplicity and performance.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```
Reviewed By: jerryzh168
Differential Revision: D32463765
Pulled By: vkuzo
fbshipit-source-id: 5a9bf2a1102859097eedf1e536761084cd408856
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68351
Speeds up `get_module_hook_type` and `get_torch_function_hook_type` by
bypassing the expensive `torch.nn.Module` getters and setters and
fetching `_auto_quant_state` directly.
Test Plan:
Model level benchmarking is noisy. Individual `cProfile` results:
```
// MobileNetV2, 1x3x224x224 input, % of time spent by function during DBR convert
// before
get_module_hook_type - 5.96%
get_torch_function_hook_type - 2.24%
// after
get_module_hook_type - 2.10%
get_torch_function_hook_type - 0.57%
```
Reviewed By: jerryzh168
Differential Revision: D32463756
Pulled By: vkuzo
fbshipit-source-id: 6eb199052ddf8d78f1c123a427e7437fc7c4fe58
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68350
`torch.nn.Module` has overhead for getting and setting attributes because
it does various type checks on the attribute.
This PR explicitly gets and sets the right thing for this particular
function, avoiding the type checks. Model level benchmarks are too noisy,
but according to function level profiling this reduces the time spent in
this function in a quantized model from 2.60% to 0.53%, on MobileNetV2 with
input size 1x3x224x224.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```
Reviewed By: albanD
Differential Revision: D32463751
Pulled By: vkuzo
fbshipit-source-id: a29beed2a2b87ca4df675a30dd591f797c8a1dbe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68347
Moves `op_convert_info` to be precalculated in the convert step
instead of calculated dynamically. This should help with framework
overhead.
Test Plan:
Noisy benchmark:
```
// before
fp32: 0.016103 seconds avg
fx_prepared: 0.019841 seconds avg, 0.811601 speedup vs fp32
fx_quantized: 0.011907 seconds avg, 1.352346 speedup vs fp32
dt_prepared: 0.035055 seconds avg, 0.459357 speedup vs fp32
dt_quantized: 0.018891 seconds avg, 0.852417 speedup vs fp32
// after
fp32: 0.020535 seconds avg
fx_prepared: 0.023071 seconds avg, 0.890070 speedup vs fp32
fx_quantized: 0.011693 seconds avg, 1.756206 speedup vs fp32
dt_prepared: 0.038691 seconds avg, 0.530734 speedup vs fp32
dt_quantized: 0.021109 seconds avg, 0.972793 speedup vs fp32
```
The benchmark is too noisy to rely on, but according to `cProfiler`
this removes about 5% of overhead.
Reviewed By: jerryzh168
Differential Revision: D32463761
Pulled By: vkuzo
fbshipit-source-id: e2ad0d7eeff7dbadf3aa379604bfe9bec0c228fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68346
Some utility functions for DBR quant need to be aware
of `AutoQuantizationState`. This PR moves them into their own file, so they
can use the type directly without circular imports, and removes the mypy
ignores which are no longer necessary after this change.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```
Reviewed By: jerryzh168
Differential Revision: D32463763
Pulled By: vkuzo
fbshipit-source-id: e2c367de0d5887c61e6d2c3a73d82f7d76af3de1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68345
Removes a flag to unwrap scale and zp which was only needed by
the FX rewriter. Moves the logic to happen in the FX tracer instead.
This resolves a technical debt TODO.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```
Reviewed By: jerryzh168
Differential Revision: D32463764
Pulled By: vkuzo
fbshipit-source-id: ba7c976664c95111174fb65488bdac62b4f4984d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68344
Makes `AutoQuantizationState._get_packed_param_name` use `seen_op_info`
instead of the current op. This will make future performance improvements
easier.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```
Reviewed By: albanD
Differential Revision: D32463758
Pulled By: vkuzo
fbshipit-source-id: 0c16fe4bc989cb66180ad674ec55060cd970e32e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68343
Refactors `AutoQuantizationState._get_input_args_quant_dequant_info` to
use less internal state, makes the function have no side effects by passing
the state in the arguments, and moves the function to utils file.
This will help with a future refactor to cache this info at runtime.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```
Reviewed By: jerryzh168
Differential Revision: D32463760
Pulled By: vkuzo
fbshipit-source-id: bdd50b0772f128755f9b734b5eeb0a9f4bc4970b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68342
Before this PR, `get_quantized_op` required the current callable.
After this PR, `get_quantized_op` only requires `seen_op_info`.
The signature was changed slightly to return `None` if the original
callable does not need replacement for quantization.
This will make it easier to make performance improvements in a
future PR.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```
Reviewed By: jerryzh168
Differential Revision: D32463768
Pulled By: vkuzo
fbshipit-source-id: 5db2c4199f6c0529817f4c058f81fd1d32b9fa9f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68341
Before this PR, `get_func_output_obs_type` used information from the
incoming op and its arguments, which makes it hard to cache.
This PR refactors `get_func_output_obs_type` to only use information
collected during tracing. This will make it easier to make performance
improvements in a future PR.
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```
Reviewed By: jerryzh168
Differential Revision: D32463755
Pulled By: vkuzo
fbshipit-source-id: 25a220de652f0285685d43aedf7392082104b26c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68309
This is the first of a series of PRs to reduce overhead of DBR quantization
prototype. For now, the measurement of this work is not super scientific as
there are a lot of low hanging fruit. As we speed up the prototype, we
might need to invest in better benchmarking.
Current benchmarking setup:
* macOS laptop with OMP_NUM_THREADS=1
* torchvision's mobilenet_v2
* input size 1x3x224x224
* we measure fp32 forward, prepared and quantized forward with FX quant vs DBR quant
Note that due to small input size, this benchmark is pretty noisy.
The goal here is to measure overhead of DBR quant logic (not the kernels),
so small input is good as we want the kernels to take as little % of overall
time as possible.
The high-level goal is for DBR quant convert forward to approach the FX time.
This first PR removes the expensive named_modules calls and resets the op
counter in the op instead. According to cProfile, this should be a 2 to 3 percent win.
Test Plan:
```
benchmark: https://gist.github.com/vkuzo/1a4f98ca541161704ee3c305d7740d4a
// before
fp32: 0.020101 seconds avg
fx_prepared: 0.020915 seconds avg, 0.961083 speedup vs fp32
fx_quantized: 0.012037 seconds avg, 1.670005 speedup vs fp32
dt_prepared: 0.037506 seconds avg, 0.535953 speedup vs fp32
dt_quantized: 0.022688 seconds avg, 0.885988 speedup vs fp32
// after
fp32: 0.020722 seconds avg
fx_prepared: 0.023417 seconds avg, 0.884893 speedup vs fp32
fx_quantized: 0.014834 seconds avg, 1.396942 speedup vs fp32
dt_prepared: 0.039120 seconds avg, 0.529700 speedup vs fp32
dt_quantized: 0.020063 seconds avg, 1.032831 speedup vs fp32
```
Reviewed By: albanD
Differential Revision: D32463753
Pulled By: vkuzo
fbshipit-source-id: 1d7de7d9c4837e2b0ec815f0f67014c7600bb16c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68251
Before this PR, DBR quantization used to recalculate scale and zero_point
in the converted model every time it was needed, which is slow.
This PR creates a pass during the convert function to go through every
observer in the model and cache its scale and zero_point.
Note: restricting this to only the observers which correspond to int8 operations
is saved for a future PR.
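For illustration, a minimal sketch of the caching idea (the cache dict and helper name are hypothetical; only `named_modules()` and `calculate_qparams()` are real APIs):
```
import torch
from torch.ao.quantization.observer import ObserverBase

def cache_qparams(model):
    # Hypothetical sketch: walk the model once at convert time and record each
    # observer's qparams instead of recomputing scale/zero_point on every call.
    cache = {}
    for name, module in model.named_modules():
        if isinstance(module, ObserverBase):
            scale, zero_point = module.calculate_qparams()
            cache[name] = (scale, zero_point)
    return cache
```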
Test Plan:
```
python test/test_quantization.py TestQuantizeDBR
```
Reviewed By: VitalyFedyunin
Differential Revision: D32463769
Pulled By: vkuzo
fbshipit-source-id: d1d2e598e2bccc1958e5023096b451d69dc34e29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67776
This adds a barebones `add_loggers` and `extract_logger_info` API
to analyze intermediate activations of models using quantization
with dynamic tracing. The API generally matches the NS for FX tool,
with some omissions. For now, this is moving fast to help us
debug real models, and the API will be 100% aligned before this is marketed to users,
in future PRs.
Note: the current approach couples Numeric Suite with the quantization
logic. This is not the best for composability, and may be changed
at a future time.
Test Plan:
```
python test/test_quantization.py TestAutoTracing.test_numeric_suite
```
Differential Revision: D32231332
Reviewed By: jerryzh168
Pulled By: vkuzo
fbshipit-source-id: 8adfb50cd8b7836c391669afe2e2ff6acae6d40a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68672
This PR adds `python_module: sparse` to `native_functions.yaml`.
These functions would appear in the `torch._C._sparse` namespace instead of
just `torch`.
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D32517813
fbshipit-source-id: 7c3d6df57a24d7c7354d0fefe1b628dc89be9431
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68637
Make the AOT compiler compile the BI bytedoc model, while also making the compiler generic enough for other models. The shape propagation pass is replaced with the new JIT tracer, since shape propagation doesn't yet support dynamic shapes.
A change to get and set the input dtype will follow.
Test Plan:
The BI model was changed to return a tuple of tensors instead of a tuple(list[tensor], list[string]). The modified BI model runs well with these changes.
```
jf download GN91Hg9shoWzU1oPAGQ7X9SV8-5nbmQwAAAA --file bi.pt
└─ $ ./compile_model.sh -m pytorch_dev_bytedoc -p bi.pt -v v1 -i "1,115;1"
+ VERSION=v1
+ getopts m:p:v:i:h opt
+ case $opt in
+ MODEL=pytorch_dev_bytedoc
+ getopts m:p:v:i:h opt
+ case $opt in
+ MODEL_PATH=bi.pt
+ getopts m:p:v:i:h opt
+ case $opt in
+ VERSION=v1
+ getopts m:p:v:i:h opt
+ case $opt in
+ INPUT_DIMS='1,115;1'
+ getopts m:p:v:i:h opt
+ require_arg m pytorch_dev_bytedoc
+ '[' -n pytorch_dev_bytedoc ']'
+ require_arg p bi.pt
+ '[' -n bi.pt ']'
+ require_arg i '1,115;1'
+ '[' -n '1,115;1' ']'
+ '[' '!' -f bi.pt ']'
+++ dirname ./compile_model.sh
++ cd .
++ pwd -P
+ SRC_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc
+ FBCODE_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/../../..
+ FBSOURCE_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/../../../..
+ KERNEL_DIR=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/../../../../xplat/pytorch_models/build/pytorch_dev_bytedoc/v1/nnc
++ readlink -f bi.pt
++ sed 's/.pt.*//'
+ MODEL_PATH_PREFIX=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/bi
+ LLVM_CODE_PATH=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/bi.compiled.ll
+ ASSEMBLY_CODE_PATH=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/bi.compiled.s
+ COMPILED_MODEL_FILE_PATH=/data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/bi.compiled.pt
+ KERNEL_FUNC_NAME=nnc_pytorch_dev_bytedoc_v1_forward
+ buck run //caffe2/binaries:aot_model_compiler -- --model=bi.pt --model_name=pytorch_dev_bytedoc --model_version=v1 '--input_dims=1,115;1'
Restarting Buck daemon because Buck version has changed...
Buck daemon started.
Parsing buck files... 0.6 sec (0/unknown)
.
.
Parsing buck files: finished in 5.0 sec
Creating action graph: finished in 0.7 sec
Downloaded 3750/4917 artifacts, 16.09 Mbytes, 13.3% cache miss (for updated rules)
Building: finished in 01:22.3 min (100%) 4995/4995 jobs, 4995/4995 updated
Total time: 01:28.0 min
BUILD SUCCEEDED
Run with 56 threads
Run with 56 threads
Loading model...
Model loaded: /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/bi.compiled.pt
Running forward ...
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1115 11:42:18.170666 1597103 TensorImpl.h:1418] Warning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (function operator())
(Columns 1 to 10 0.5428 0.1651 0.0158 0.0055 0.0503 0.0749 0.0161 0.0204 0.0237 0.0095
Columns 11 to 12 0.0609 0.0148
[ CPUFloatType{1,12} ], Columns 1 to 10-1.3946 -0.0835 -1.1268 0.3325 -2.1884 4.6175 -0.1206 -1.5058 -1.5277 -2.1214
Columns 11 to 20 1.3726 -0.4573 -1.7583 -2.2275 1.9607 -5.3430 -4.4927 -3.2548 -5.3214 2.9002
Columns 21 to 30-1.3973 -0.8084 -1.8491 -1.6518 4.2531 -0.0321 -0.0282 -1.1180 -0.9800 2.9228
Columns 31 to 32 0.8228 2.2611
[ CPUFloatType{1,32} ])
Starting benchmark.
Running warmup runs.
Main runs.
Main run finished. Milliseconds per iter: 40.64. Iters per second: 24.6063
Memory usage before main runs: 71581696 bytes
Memory usage after main runs: 94347264 bytes
Peak memory usage after main runs: 94347264 bytes
Average memory increase per iter: 2.22495e+06 bytes
0 value means "not available" in above
```
Reviewed By: ljk53
Differential Revision: D32438852
fbshipit-source-id: 5defdc2593abda5da328f96248459d23b2c5e5c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66464
Dimension sizes are referred to as `size` in general in PyTorch and
hence rename shard_lengths to shard_sizes.
#Closes: https://github.com/pytorch/pytorch/issues/65794
ghstack-source-id: 143866449
Test Plan: waitforbuildbot
Reviewed By: fduwjj, wanchaol
Differential Revision: D31564153
fbshipit-source-id: 6273426c4b0e079358806070d0d9644740adb257
Summary:
CUDA's `at::nanmedian` creates a sorted copy of the array, then indexes into it to create a single element view. This view necessarily keeps the entire `sorted` tensor's storage alive which can be avoided by returning a copy, which is what `at::median` does indirectly via `at::where`.
This also changes the index variable `k` to be a simple `int64_t` instead of the CUDA tensor that was used before. This saves the additional host and device operations from calling `Tensor`'s `operator -` which helps balance out the cost of the `clone` added here.
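For reference, the user-visible semantics of the op whose CUDA memory behavior is improved here (device-agnostic example; the commit itself only changes the CUDA path):
```
import torch

x = torch.tensor([2.0, float("nan"), 1.0, 3.0])
print(torch.nanmedian(x))  # tensor(2.) -- NaNs are ignored
print(torch.median(x))     # tensor(nan) -- median propagates NaN
```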
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68591
Reviewed By: dagitses
Differential Revision: D32538538
Pulled By: ngimel
fbshipit-source-id: abe9888f80cf9d24d50a83da756e649af1f6ea3b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68676
As the title says, the helper functions handle setting the layer name. We want to use those helper functions whenever possible.
Test Plan: CI
Reviewed By: wushirong
Differential Revision: D32571061
fbshipit-source-id: 4a191f0085c0b3965dc02d99bb33de21973d565d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68624
Fix `caffe2.test.distributed.launcher.api_test` flaky tests for opt-tsan mode.
The diff changes the default `mp.Process` invocation to use a spawn context. By default, `mp.Process` uses the `fork` start method, which is not compatible with `*san`.
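A minimal sketch of the spawn-context pattern used here (the `worker` function is hypothetical; the actual test code differs):
```
import multiprocessing as mp

def worker(rank):
    print(f"running in child process {rank}")

if __name__ == "__main__":
    # Use an explicit spawn context instead of the platform default (fork on Linux),
    # which avoids fork-related incompatibilities with sanitizer builds.
    ctx = mp.get_context("spawn")
    procs = [ctx.Process(target=worker, args=(i,)) for i in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```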
Test Plan: CI
Reviewed By: d4l3k
Differential Revision: D32550578
fbshipit-source-id: f4767987e8e10a6a2ece3f86e48278f2dbaebe7c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68042
As titled.
Also added test cases from TestQuantizeFx which test all combinations of {fp32, int8} input and output overrides.
Test Plan:
```
python test/fx2trt/test_quant_trt.py TestConvertFxDoNotUse
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D32271511
fbshipit-source-id: 87ffc00069aaff7d1c455cdd97fac82b11aa4527
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68555
The outer namespace is already anonymous, so this is not necessary.
Test Plan: Imported from OSS
Reviewed By: dagitses
Differential Revision: D32565941
Pulled By: malfet
fbshipit-source-id: 4daf1c46b25ff68e748e6c834c63d759ec6fde4f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67041
Original PR here: https://github.com/pytorch/pytorch/pull/62246 (The old PR does more things, but now that's split across this stack)
This PR:
- Adds "jacfwd" and "hessian_fwdrev"
- Modifies existing tests to also test the `forward_ad=True` case
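As a rough illustration of the forward-mode path these tests exercise (assuming the `strategy`/`vectorize` kwargs of `torch.autograd.functional.jacobian` are available in this build; the function `f` is arbitrary):
```
import torch
from torch.autograd.functional import jacobian

def f(x):
    return x.sin().sum(dim=0)

x = torch.randn(5, 3)
# Forward-mode Jacobian; forward-mode currently requires vectorize=True.
J = jacobian(f, x, strategy="forward-mode", vectorize=True)
print(J.shape)  # torch.Size([3, 5, 3])
```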
Test Plan: Imported from OSS
Reviewed By: gchanan, zou3519
Differential Revision: D32314424
Pulled By: soulitzer
fbshipit-source-id: 785b0e39162b93dc3b3cb9413233447152eddd53
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66294
In this PR:
- OpInfo for forward AD now checks batched forward grad when `op.check_batched_grad=True`
- Adds a setting `check_batched_forward_grad` to disable the test for individual ops, and disables it for the ops here: https://github.com/pytorch/pytorch/issues/66357
Fixes some more failures:
- Make Forward AD metadata less strict by allowing stride to differ when size is 1
- Fix sum batching rule when logical tensor is a scalar and dim is unspecified
- Batching rule for `_reshape_alias`
- ~Batching rules now preserve storage offset for view operator that return non-zero storage offset~ (moved to previous PR)
Test Plan: Imported from OSS
Reviewed By: zou3519, albanD
Differential Revision: D31842020
Pulled By: soulitzer
fbshipit-source-id: 3517a8fb9d6291fccb53c0b1631eab5bbb24ebd1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66293
This PR:
- Asserts that if the output is a view, then `is_same_metadata` must return `true`; otherwise, we are performing a copy.
- unless we are being called from `make_dual` which can allow the tangent and primal to have different layouts, because it is not forward differentiable.
- To make this possible, we add `is_make_dual` as a parameter. ~The alternative is to make `make_dual` non-composite, and then we can rely on its `view_info` for differentiability information. This also assumes that the only composite function that calls `set_fw_grad` is `make_dual`.~
- Batching rules now preserve storage offset for view operators that return a non-zero storage offset
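For context, a minimal use of the public dual-tensor API whose layout handling is involved here (values and ops are illustrative):
```
import torch
import torch.autograd.forward_ad as fwAD

primal = torch.randn(4)
tangent = torch.randn(4)
with fwAD.dual_level():
    dual = fwAD.make_dual(primal, tangent)
    out = dual.sin()
    value, jvp = fwAD.unpack_dual(out)
# jvp should equal primal.cos() * tangent
```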
Test Plan: Imported from OSS
Reviewed By: zou3519, albanD
Differential Revision: D31842021
Pulled By: soulitzer
fbshipit-source-id: ed606f5a7b4770df1e9ebc6eb1d584b27dad5bae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66292
In this PR:
1. Fix the case when tangent has a different layout from the base when `set_fw_grad` by adding a native function and its batching rule.
For (1) we replace the following:
```
Tensor new_with_same_meta(const Variable& base) {
  int64_t nelement_in_storage = base.storage().nbytes() / base.itemsize();
  auto new_tensor = at::zeros({nelement_in_storage}, base.options());
  auto res = new_tensor.as_strided(base.sizes(), base.strides(), base.storage_offset());
  return res;
}
```
with a native function so as to enable a batching rule to alter its behavior.
This new function will be similar to `new_zeros_strided` except we also require the `storage_offset` and `storage_numel` arguments.
Possible concerns:
- Why have redundant logic? Why not add new args to `new_zeros_strided`? This is probably a niche use case, so it's better not to complicate the current API.
- Previously the created tensor inherits the TensorOptions of the primal. Now we inherit from the TensorOptions of the tangent.
- Probably fine. Likely, no one relies on this because the behavior is only triggered when tangent/base have different layouts.
- Why pass in exploded size, stride, and offset? It is possible in the non-batched case to pass in a tensor directly, but not possible when we'd like to have a batching rule. The size, stride, and offset we'd be passing won't belong to any live tensor.
Test Plan: Imported from OSS
Reviewed By: zou3519, albanD
Differential Revision: D31842019
Pulled By: soulitzer
fbshipit-source-id: a58433d814fd173bc43a2c550b395377dba40de2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67537
This PR adds support for quantizing torch.addmm to produce a reference quantized pattern,
and also adds support in the backend_config_dict api that allows people to specify the input, weight and bias input for each input:
```
addmm_config = {
"pattern": torch.addmm,
"observation_type": ObservationType.OUTPUT_USE_DIFFERENT_OBSERVER_AS_INPUT,
"dtype_configs": [
weighted_op_qint8_dtype_config,
],
# a map from input type to input index
"input_type_to_index": {
"bias": 0,
"input": 1,
"weight": 2,
}
}
```
This requires some changes in getting weight_dtype and bias_dtype in the type inference stage of prepare, which are added in the previous PR.
Test Plan:
```
python test/fx2trt/test_quant_trt.py TestQuantizeFxTRT.test_addmm
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D32014998
fbshipit-source-id: 8d96c1e8b7ebb2ab385c08a5b1e43f2d5a2cbcbe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68284
Add a new class `ManagedTensorRanges` that determines when managed tensors can be made available for re-use. This class provides a method `availableTensors(Node* node)` that returns a vector of `Value*` (corresponding to managed tensors) that are not used (either directly or through any alias) after `node`.
Test Plan: New unit tests: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: swolchok
Differential Revision: D32397207
fbshipit-source-id: fb0d9a23f13abf6f2207e3d7266384966f477fc6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68602
This PR adds support for configuring weight/bias dtype in backend_config_dict
and refactors the current code that checks when to insert observers.
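A hedged sketch of what a dtype config entry with explicit weight/bias dtypes might look like under this scheme (the key names below are assumptions based on the description, not taken from the PR):
```
import torch

# hypothetical dtype config entry; key names are assumptions
weighted_int8_dtype_config = {
    "input_dtype": torch.quint8,
    "output_dtype": torch.quint8,
    "weight_dtype": torch.qint8,
    "bias_dtype": torch.float,
}

linear_config = {
    "pattern": torch.nn.Linear,
    "dtype_configs": [weighted_int8_dtype_config],
}
```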
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D32537712
fbshipit-source-id: 28eb7c61a8dcad8c1f3f6622d490a34cff0c59e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68650
Allow fusing unsqueeze + cat + sum with more than 2 inputs. The implementation in this diff is naive: it just concatenates each item with an add. Not sure whether more perf could be gained by fusing multiple adds into one operation.
Test Plan: unit test
Reviewed By: jfix71
Differential Revision: D32520135
fbshipit-source-id: 535b1c8c91e415d5f1af714378b9205c1ca02ffd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67794
This change is needed to conveniently use the same comparison mechanism for our internal testsuite (see #67796). The reworked version is on par with the previous version except for the ability to pass a custom message as a callable. Before, we converted everything to a tensor, so it was fairly easy to provide consistent mismatch diagnostics to the callable. Now, with the arbitrary `Pair`s that are used for comparison, that is no longer viable.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D32532206
Pulled By: mruberry
fbshipit-source-id: dc847fba6a795c1766e01bc3e88b680a68287b1e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68329
Pull Request resolved: https://github.com/pytorch/kineto/pull/466
1. Generalize ChromeTraceLogger::handleGenericActivity to enable it to handle CUDA runtime activities as well as the Roctracer generic activities.
This primarily involves enabling generic support for CPU -> GPU flows.
2. In the event of out-of-order GPU activities (an issue with CUDA 11.0, likely fixed in later versions), no longer remove them but print warnings. Another diff will add these warnings to the metadata section.
Reviewed By: briancoutinho
Differential Revision: D31624496
fbshipit-source-id: dab04b3e3c0dd6799496ac87f837363de79eea25
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65812
Multiple threads are recording events to a shared activity buffer and the buffer is at some point transferred to libkineto.
The access to and the transfer of the buffer needs to be done under lock.
Reviewed By: leitian, xw285cornell
Differential Revision: D31220061
fbshipit-source-id: f11c879df1b55aa9068187e600730bb0e5e5455f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66966
T* is convertible to const T*, so we don't need this overload.
ghstack-source-id: 143749559
Test Plan: builds
Reviewed By: hlu1
Differential Revision: D31809824
fbshipit-source-id: 70cca86c4a87dc09cd958953a08a801db3e4d047
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67075
Sharing storage if `mayAlias` is incorrect, as the old comment notes; sharing if `mustAlias` would be nice but, as the new comment notes, would not matter.
ghstack-source-id: 143749553
Test Plan: CI
Reviewed By: hlu1
Differential Revision: D31851893
fbshipit-source-id: 5bdc8de984d5919332c9010e8b0160211d96bc2f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68074
This is the first step of many PRs towards implementing the `torch.monitor` RFC https://github.com/pytorch/rfcs/pull/30
This defines the aggregation types, the `Stat` class and provides some simple collection of the stats.
This doesn't match the RFC exactly as it incorporates some of the comments on the RFC as well as a few changes for performance.
Changes:
* added window_size to the stats. If specified it will always compute the stat using the `window_size` number of values. If there aren't enough values within that window it reports the previous stats.
* This doesn't include the push metrics yet (will be coming).
After more discussion it looks like the best way to handle this is to support a hybrid where the metric can set how frequently it'll be logged. For fixed window_size metrics it'll be logged each time it hits the window size. This will allow performant counters as well as lower frequency push counters (window_size=1).
Performance considerations:
* Updating the stats acquires a lock on that Stat object. This should be performant unless many threads are writing to the same stat; the single-thread case will typically use a futex and so should be quite fast.
* Adding/removing/fetching all stats sets a global lock on the stat list -- this shouldn't be an issue since these events happen infrequently.
* Fetching stats accesses one stat at a time instead of a global lock. This means the exported values are linearizable but not serializable across multiple stats but I don't expect this to be an issue.
Next steps:
1. Add StatCollector interface for push style metrics
1. Add pybind interfaces to expose to Python
1. Add default metric providers
1. Integrate into Kineto trace view
Test Plan:
buck test //caffe2/test/cpp/monitor:monitor
CI
Reviewed By: kiukchung
Differential Revision: D32266032
fbshipit-source-id: dab8747b4712f5dba5644387817a3a0fda18b66a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68614
We need to copy modules over to the `split` graph during const folding. We were previously only doing so from the non-constant submod, but we need to do this for the constant one as well in case some `call_module` is const folded.
Test Plan: Added unit test
Reviewed By: wushirong, 842974287
Differential Revision: D32543289
fbshipit-source-id: 80d1d0ce2c18a665b00e1343d6c55d939390ab10
Summary:
Adds native_dropout to have a reasonable target for torchscript in autodiff. native_dropout has scale and train as arguments in its signature; this makes native_dropout more consistent with other operators and removes conditionals in the autodiff definition.
cc gmagogsfm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63937
Reviewed By: mruberry
Differential Revision: D32477657
Pulled By: ngimel
fbshipit-source-id: d37b137a37acafa50990f60c77f5cea2818454e4
Summary:
Previously I needed to back out D32220626 and then apply D31841609 to run the textray unity demo. That made it hard for other people to take a look at the textray demo.
I copied the textray demo (a single file) from the pytext folder to the unity folder and applied the needed changes. This way, other people can also run the textray demo. This also keeps my dev environment cleaner.
Test Plan: buck run mode/opt :textray_demo
Reviewed By: mleshen
Differential Revision: D32537190
fbshipit-source-id: 5df6347c4bec583c225aea9f98fbc9f37b5d3153
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67367
- Adds a check to make sure the forward grad itself does not have a forward grad at the same level
- Verify with `python test/test_ops.py -k test_forward_mode_AD_linalg_eigh_cpu_float64` that it fails the check before, but passes after the codegen update
Before:
```
if (_any_has_forward_grad_eigenvalues) {
  auto self_t_raw = toNonOptFwGrad(self);
  auto self_t = self_t_raw.defined() ? self_t_raw : at::zeros_like(toNonOptTensor(self));
  auto eigenvalues_new_fw_grad = eigh_jvp_eigenvalues(self_t, eigenvalues, eigenvectors);
  if (eigenvalues_new_fw_grad.defined()) {
    // The hardcoded 0 here will need to be updated once we support multiple levels.
    eigenvalues._set_fw_grad(eigenvalues_new_fw_grad, /* level */ 0, /* is_inplace_op */ false);
  }
}
if (_any_has_forward_grad_eigenvectors) {
  auto self_t_raw = toNonOptFwGrad(self);
  auto self_t = self_t_raw.defined() ? self_t_raw : at::zeros_like(toNonOptTensor(self));
  auto eigenvectors_new_fw_grad = eigh_jvp_eigenvectors(self_t, eigenvalues, eigenvectors);
  if (eigenvectors_new_fw_grad.defined()) {
    // The hardcoded 0 here will need to be updated once we support multiple levels.
    eigenvectors._set_fw_grad(eigenvectors_new_fw_grad, /* level */ 0, /* is_inplace_op */ false);
  }
}
```
After:
```
c10::optional<at::Tensor> eigenvalues_new_fw_grad_opt = c10::nullopt;
if (_any_has_forward_grad_eigenvalues) {
  auto self_t_raw = toNonOptFwGrad(self);
  auto self_t = self_t_raw.defined() ? self_t_raw : at::zeros_like(toNonOptTensor(self));
  eigenvalues_new_fw_grad_opt = eigh_jvp_eigenvalues(self_t, eigenvalues, eigenvectors);
}
c10::optional<at::Tensor> eigenvectors_new_fw_grad_opt = c10::nullopt;
if (_any_has_forward_grad_eigenvectors) {
  auto self_t_raw = toNonOptFwGrad(self);
  auto self_t = self_t_raw.defined() ? self_t_raw : at::zeros_like(toNonOptTensor(self));
  eigenvectors_new_fw_grad_opt = eigh_jvp_eigenvectors(self_t, eigenvalues, eigenvectors);
}
if (eigenvalues_new_fw_grad_opt.has_value() && eigenvalues_new_fw_grad_opt.value().defined()) {
  // The hardcoded 0 here will need to be updated once we support multiple levels.
  eigenvalues._set_fw_grad(eigenvalues_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
if (eigenvectors_new_fw_grad_opt.has_value() && eigenvectors_new_fw_grad_opt.value().defined()) {
  // The hardcoded 0 here will need to be updated once we support multiple levels.
  eigenvectors._set_fw_grad(eigenvectors_new_fw_grad_opt.value(), /* level */ 0, /* is_inplace_op */ false);
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68535
Reviewed By: ngimel
Differential Revision: D32536089
Pulled By: soulitzer
fbshipit-source-id: a3f288540e2d78a4a9ec4bd66d2c0f0e65dd72cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68609
The test is stale and tests a non-existent method.
Test Plan: ci
Reviewed By: kiukchung
Differential Revision: D32540127
fbshipit-source-id: c47b7aed3df6947819efb2f4ad1b7a059c252138
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63568
This PR adds the first solver with structure to `linalg`. This solver
has an API compatible with that of `linalg.solve` preparing these for a
possible future merge of the APIs. The new API:
- Just returns the solution, rather than the solution and a copy of `A`
- Removes the confusing `transpose` argument and replaces it by a
correct handling of conj and strides within the call
- Adds a `left=True` kwarg. This can be achieved via transposes of the
inputs and the result, but it's exposed for convenience.
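A minimal example of the new API (shapes and values chosen for illustration):
```
import torch

A = torch.randn(3, 3).triu() + 3 * torch.eye(3)   # well-conditioned upper-triangular matrix
B = torch.randn(3, 2)
X = torch.linalg.solve_triangular(A, B, upper=True)              # solves A X = B
assert torch.allclose(A @ X, B, atol=1e-5)

B2 = torch.randn(2, 3)
Y = torch.linalg.solve_triangular(A, B2, upper=True, left=False)  # solves Y A = B2
assert torch.allclose(Y @ A, B2, atol=1e-5)
```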
This PR also implements a dataflow that minimises the number of copies
needed before calling LAPACK / MAGMA / cuBLAS and takes advantage of the
conjugate and neg bits.
This algorithm is implemented for `solve_triangular` (which, for this, is
the most complex of all the solvers due to the `upper` parameters).
Once more solvers are added, we will factor out this calling algorithm,
so that all of them can take advantage of it.
Given the complexity of this algorithm, we implement some thorough
testing. We also added tests for all the backends, which was not done
before.
We also add forward AD support for `linalg.solve_triangular` and improve the
docs of `linalg.solve_triangular`. We also fix a few issues with those of
`torch.triangular_solve`.
Resolves https://github.com/pytorch/pytorch/issues/54258
Resolves https://github.com/pytorch/pytorch/issues/56327
Resolves https://github.com/pytorch/pytorch/issues/45734
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Test Plan: Imported from OSS
Reviewed By: zou3519, JacobSzwejbka
Differential Revision: D32283178
Pulled By: mruberry
fbshipit-source-id: deb672e6e52f58b76536ab4158073927a35e43a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68303
The result of the splitter runs either on the accelerator or directly on the GPU; rename the GPU part of the graph to run_on_gpu.
Test Plan: buck test mode/opt caffe2/test:trt_tools_test
Reviewed By: 842974287
Differential Revision: D32392492
fbshipit-source-id: b085376c00c1097752e856e22c631d74a0fbc38f
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53647
With this, if a test forgets to add `dtypes` while using `dtypesIf`, the following error is raised:
```
AssertionError: dtypes is mandatory when using dtypesIf however 'test_exponential_no_zero' didn't specify it
```
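For illustration, a hedged sketch of the decorator pattern this enforces; the decorator names come from `torch.testing._internal.common_device_type`, while the test body itself is hypothetical:
```
import torch
from torch.testing._internal.common_device_type import dtypes, dtypesIfCUDA

class TestFoo:
    @dtypes(torch.float)                    # mandatory base list; omitting it now raises the error above
    @dtypesIfCUDA(torch.float, torch.half)  # CUDA-specific override
    def test_exponential_no_zero(self, device, dtype):
        ...
```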
**Tested Locally**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68186
Reviewed By: VitalyFedyunin
Differential Revision: D32468581
Pulled By: mruberry
fbshipit-source-id: 805e0855f988b77a5d8d4cd52b31426c04c2200b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68268
Previously, `_make_wrapper_subclass` ignored the storage offset it was
passed. This PR fixes that by updating TensorMaker::computeStorageSize()
and TensorMaker::make_tensor() to take into account storage_offset.
Test Plan: - added test
Reviewed By: albanD, bdhirsh
Differential Revision: D32396330
Pulled By: zou3519
fbshipit-source-id: 2c85bc4066044fe6cb5ab0fc192de6c9069855fd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68586
We updated the vmap warnings to be more descriptive in
https://github.com/pytorch/pytorch/pull/67347 . However, gradcheck does
some warning squashing that matches on the warning message and we didn't
update that. This PR updates the warning squashing in gradcheck.
Test Plan: - check logs
Reviewed By: albanD
Differential Revision: D32530259
Pulled By: zou3519
fbshipit-source-id: 9db380b57c38b3b72cbdb29574f71dbfe71e90d1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68565
This makes it so that we can now vmap over nn.functional.pad (circular
variant). Previously we could not because we were effectively doing
`out.copy_(input)` where `out` was created with `empty`.
This also has the added side effect of cleaning up the code.
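A hedged sketch in the spirit of that check, using functorch's vmap (functorch is a separate install at this point; shapes are arbitrary):
```
import torch
import torch.nn.functional as F
from functorch import vmap  # separate package at this point

def circular_pad(x):
    # x is one (C, H, W) example; circular padding of the last two dims needs a 4D input
    return F.pad(x.unsqueeze(0), (1, 1, 1, 1), mode="circular").squeeze(0)

batch = torch.randn(8, 3, 5, 5)
out = vmap(circular_pad)(batch)
print(out.shape)  # torch.Size([8, 3, 7, 7])
```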
Test Plan:
- I tested this using functorch.vmap and can confirm that vmap now
works.
- Unfortunately this doesn't work with the vmap in core so I cannot add
a test for this here.
Reviewed By: albanD
Differential Revision: D32520188
Pulled By: zou3519
fbshipit-source-id: 780a7e8207d7c45fcba645730a5803733ebfd7be
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68532
Diff to better handle size 0 pinned memory allocation requests.
----
### Behavior before fix
The very first size 0 malloc comes in. It will create a block with `{key: 0, value: Block(0, 0, true)}`.
Another size 0 malloc comes in.
It will either 1) get a block with size > 0 (which is a waste of pinned memory) or 2) call `cudaHostAlloc()` with size 0 to eventually get *ptr=0.
Note that this block is *not registered* to the block pool because we have a duplicate entry (and that's why we will keep wasting size > 0 pinned memory block, if `available.empty() == false`).
----
### Behavior after fix
Let `malloc()` simply return a nullptr (0).
This avoids wasting valid size > 0 blocks as well as save the calls to `cudaHostAlloc()` which is expensive.
This is also safe since `free()` simply returns success for nullptrs.
-----
Test Plan: Unit tests.
Reviewed By: yinghai
Differential Revision: D32487522
fbshipit-source-id: 6140cab54ff5a34ace7d046f218fb32805c692c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67467
Unit tests for static runtime in the dper multi-env tests for cpu and scripted (including fx-traced + scripted) models. Only turn it on for single_operators_tests that are in the inline_cvr local/local_ro/remote_ro model for now.
Will have another diff that turns this on by default and explicitly disables for certain tests.
Test Plan: buck test dper3/dper3/modules/low_level_modules/tests:single_operators_test
Reviewed By: hlu1, houseroad
Differential Revision: D30870488
fbshipit-source-id: 382daec8dbcb95135cdd43e7b84a1d23b445d27c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68021
Reland of https://github.com/pytorch/pytorch/pull/64481, as the previous one had some internal failures that weren't captured when it first landed.
This simplifies the `init_from_local_shards` API in sharded tensor to only require the user to pass in a list of `Shard`s and an `overall_size`, instead of a ShardedTensorMetadata. We do the all_gather inside to form a valid ShardedTensorMetadata instead.
TODO: add more test cases to improve coverage.
ghstack-source-id: 143661119
Test Plan: TestShardedTensorFromLocalShards
Reviewed By: pritamdamania87
Differential Revision: D32147888
fbshipit-source-id: 897128b75224f4b9644471a04a64079f51e0d5fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68499
TCPStore is actually accessed by multiple threads (the NCCL watchdog thread), but has no mutex protection while FileStore and HashStore do. As enabling desync root cause analysis makes store calls more frequent, the race condition in TCPStore was reliably triggered when creating another process group such as gloo. This adds a mutex to TCPStore, matching FileStore and HashStore.
Test Plan:
DDP benchmark with desync debug enabled, no perf regression
https://www.internalfb.com/intern/fblearner/details/309398285?tab=Outputs
W/o this diff
https://www.internalfb.com/intern/fblearner/details/308379789?tab=Outputs
Reviewed By: mingzhe09088
Differential Revision: D32482254
fbshipit-source-id: e8f466e1c6fdcab6cfa170f44b9be70395935fb8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68310
Enable desync root cause analysis by recording the last footprint of collective calls. On timeout, we parse the store trace and figure out the root cause of the desync issue. This feature is built on top of async error handling.
Test Plan:
Standalone test
* Typical desync - P467288969
* Mismatched collectives - P467288916
* Mismatched broadcast size - P467288873
DDP benchmark
* DDP benchmark desync - P467433483, P467520195
No perf regression:
* w/o this diff https://www.internalfb.com/intern/fblearner/details/308379789?tab=Outputs
* w/ this diff https://www.internalfb.com/intern/fblearner/details/308534088?tab=Outputs
Reviewed By: mingzhe09088
Differential Revision: D32348647
fbshipit-source-id: 43e7e96e3fa2be0ac66c1325bceb639b461a8b3a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68395
At the time that I wrote the pass, I thought that `c10::TensorList` and `c10::List<Tensor>` were the same thing. But it looks like a `TensorList` is actually an `ArrayRef<Tensor>`. This led to a nasty bug when I tried to add conditional functionalization to `block_diag`, where in the boxed kernel, I would:
(1) unwrap the first `IValue` by calling `.toTensorList()` (this actually returns a `List<Tensor>`, not a `TensorList`).
(2) call `TensorList to_functional_tensor(List<Tensor>)` to get out a `TensorList` with the functionalized tensors
(3) wrap that back into an `IValue` and put in on the stack.
Somewhere in that sequence of operations, something bad happens and we segfault. Fixing up the signature of `to_functional_tensor` to be `List<Tensor> to_functional_tensor(List<Tensor>)` fixes the bug. I have a feeling that there's a latent TensorList-related bug in the boxing/unboxing logic that made this worse, but I'm okay to stick with my narrow fix for now.
Additionally tested by running `pytest test/test_ops.py test/test_vmap.py -v -k block_diag` on top of this PR: https://github.com/pytorch/functorch/pull/235
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D32448258
Pulled By: bdhirsh
fbshipit-source-id: 3b2b6c7cd5e4c29533d0502f24272d826bfe03c1
Summary:
To release constants computed and stored by `ConstantValueMap::SetValue(...)` during ONNX exporting, `ConstantValueMap::Clear()` needs to be called explicitly. Otherwise, it's a memory leak.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68210
Reviewed By: jansel
Differential Revision: D32465670
Pulled By: msaroufim
fbshipit-source-id: 521e474071b94c5d2cd4f353ee062cee78be1bd4
Summary:
1. Convert Function -> mobile::Function
2. Serialize mobile::Function
This also opens the opportunity to create a mobile::Module without saving/reloading.
Fixes #{issue number}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66494
Reviewed By: zhxchen17
Differential Revision: D32293022
Pulled By: qihqi
fbshipit-source-id: 29b43d47ff86071d5e2f9d6ca4dba4445711ce3d
Summary:
After realizing that CUDA mem leak checks were not rerun, I realized I forgot to pass the env var as a Docker variable.
What a noob mistake.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68486
Reviewed By: seemethere
Differential Revision: D32501718
Pulled By: janeyx99
fbshipit-source-id: 9918d626e90bea1562a3094c6eb12cb7d86dbf6a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68317
We use node_name_to_target_dtype to store the target dtype for the output activation of each node, computed from the qconfig for the node.
There are two problems with node_name_to_target_dtype that make it hard to work with:
1. we mutate node_name_to_target_dtype when we insert observers, which makes the data structure confusing because it's typically unexpected
to change a data structure that stores the "target" dtype
2. currently it only stores the target dtype for output activations, while we also need the target dtype for the input activation, weight and bias
This PR fixes both problems by removing mutation from node_name_to_target_dtype and expanding the target dtype for each node to include
the missing target dtype for the input activation, weight and bias. We will have another refactor to simplify the observation for weight and bias dtype
in the future.
Please see comments for the updated structure of node_name_to_target_dtype
TODO: we may want to rename node_name_to_target_dtype to node_name_to_target_dtype_info in a separate PR.
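A purely illustrative sketch of the expanded shape of this mapping (the node name and key names here are hypothetical, not the actual ones in the code):
```
import torch

# hypothetical illustration only
node_name_to_target_dtype_info = {
    "linear_1": {
        "input_activation_dtype": torch.quint8,
        "weight_dtype": torch.qint8,
        "bias_dtype": torch.float,
        "output_activation_dtype": torch.quint8,
    },
}
```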
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D32411858
fbshipit-source-id: 3d76dd65056920ff8642899517bc1b95d43fc1de
Summary:
When porting `THAllocator` to ATen I changed `AT_ERROR` to `TORCH_INTERNAL_ASSERT` but the direct translation should have been `TORCH_CHECK`.
33e9a0b5f6/c10/util/Exception.h (L619-L623)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68424
Reviewed By: VitalyFedyunin
Differential Revision: D32465548
Pulled By: ngimel
fbshipit-source-id: 7fa9c1fe27e4849b76248badb681d7b6877ce9e8
Summary:
This PR simply updates the documentation following up on https://github.com/pytorch/pytorch/pull/64234, by adding `Union` as a supported type.
Any feedback is welcome!
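A small example of the now-documented support (should script cleanly on builds that include #64234):
```
from typing import Union

import torch

@torch.jit.script
def to_tensor(x: Union[int, torch.Tensor]) -> torch.Tensor:
    # isinstance checks refine the Union inside TorchScript
    if isinstance(x, int):
        return torch.tensor(x)
    return x

print(to_tensor(3), to_tensor(torch.ones(2)))
```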
cc ansley albanD gmagogsfm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68435
Reviewed By: davidberard98
Differential Revision: D32494271
Pulled By: ansley
fbshipit-source-id: c3e4806d8632e1513257f0295568a20f92dea297
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68279
While reworking the liveness analysis, I noticed that using `std::pair<size_t, std::vector<Tensor*>>` to represent storage groups made things quite unreadable.
Add a simple class to wrap a `std::vector<at::Tensor*>` and store a `size` attribute
Test Plan:
`buck test caffe2/benchmarks/static_runtime/...`
Also ran inline_cvr benchmarks, did not see any errors
Reviewed By: swolchok
Differential Revision: D32369447
fbshipit-source-id: e0b562aa7eefd738b1a34f1f37eb7bc95d71a257
Summary:
nvfuser code update:
1. Tuning heuristics on schedulers for reduction/normalization kernels;
2. bfloat16 on IO tensor support;
3. Refactored memory format support, now we can support dimension collapsing with non-coherent input tensors with different memory format. e.g. channels last tensor input to batch normalization. Note that we are currently limiting memory format to only Contiguous and Channels last;
4. Refactored nvfuser graph partitioning in `graph_fuser.cpp`, separated node merge and profile node API. Updated `profiling_record.cpp`.
Things that are reverted from our local branch:
1. changes on some entries in autodiff
2. aten::gelu with approximation
3. native_dropout(_backward)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67943
Reviewed By: ngimel
Differential Revision: D32288709
Pulled By: dzhulgakov
fbshipit-source-id: fc9491182ea7e0158bc112c66f096823c588eaf1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/64075
Test Plan:
Before:
`I0826 17:17:54.165174 1064079 PyTorchPredictorBenchLib.cpp:313] PyTorch run finished. Milliseconds per iter: 6.66724. Iters per second: 149.987`
After:
`I0826 17:13:07.464485 1040300 PyTorchPredictorBenchLib.cpp:313] PyTorch run finished. Milliseconds per iter: 6.46362. Iters per second: 154.712`
Profile after: P453143683
Accuracy tested comparing with jit interpreter for no differences under 1e-3 (nnc ops turned on) https://www.internalfb.com/intern/diff/view-version/136824794/
======
With 100-request recordio inputs (211 inputs)
Before:
`I1101 12:43:13.558375 742187 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 11.7882. Iters per second: 84.8309`
After:
`I1101 13:50:41.087644 1126186 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 11.6763. Iters per second: 85.6438`
Profile after: P465977010
Constituent ops before (total is 0.5646):
```
0.187392 ms. 1.61737%. fb::clip_ranges_gather (309 nodes, out variant)
0.174101 ms. 1.50266%. fb::lengths_to_offsets (464 nodes, out variant)
0.203126 ms. 1.75317%. static_runtime::to_copy (805 nodes, out variant)
```
Constituent ops after (total is 0.4985):
```
0.376559 ms. 3.25614%. fb::clip_ranges_to_gather_to_offsets (305 nodes, out variant)
0.0614349 ms. 0.531235%. fb::lengths_to_offsets (159 nodes, out variant)
0.0573315 ms. 0.495751%. static_runtime::to_copy (195 nodes, out variant)
0.00325543 ms. 0.0281501%. fb::gather_ranges (4 nodes, out variant)
```
Compare with jit interpreter inside benchmark:
`I1101 13:55:53.013602 1149446 PtVsBlackBoxPredictorBenchLib.cpp:175] Finished comparing PT static runtime and jit interpreter results`
======
Casting on the fly:
a. Static runtime off
```
Static runtime ms per iter: 11.4658. Iters per second: 87.2159
0.220367 ms. 1.94726%. static_runtime::to_copy (805 nodes, out variant)
0.172585 ms. 1.52504%. fb::clip_ranges_gather (309 nodes, out variant)
0.157836 ms. 1.39471%. fb::lengths_to_offsets (464 nodes, out variant)
```
b. Casting on the fly, using explicit allocation+to_copy (which has the fast pass for certain cases, but we'll always call empty):
```
I1115 09:08:35.711972 1925508 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 11.6732. Iters per second: 85.6662
0.599439 ms. 5.25098%. fb::clip_ranges_to_gather_to_offsets (305 nodes, out variant)
0.0552475 ms. 0.483958%. fb::lengths_to_offsets (159 nodes, out variant)
0.0576032 ms. 0.504593%. static_runtime::to_copy (195 nodes, out variant)
0.00299026 ms. 0.0261941%. fb::gather_ranges (4 nodes, out variant)
```
c. Casting on the fly with native::to (no explicit allocation, but no fast pass):
```
Static runtime ms per iter: 11.5627. Iters per second: 86.4849
0.454356 ms. 3.9652%. fb::clip_ranges_to_gather_to_offsets (305 nodes, out variant)
0.06315 ms. 0.551115%. static_runtime::to_copy (195 nodes, out variant)
0.0590741 ms. 0.515544%. fb::lengths_to_offsets (159 nodes, out variant)
0.00359182 ms. 0.031346%. fb::clip_ranges_gather (4 nodes, out variant)
```
d. Removal of the to() call in question from the fusion pattern:
```
Static runtime ms per iter: 11.3658. Iters per second: 87.9836
0.29591 ms. 2.6479%. fb::clip_ranges_to_gather_to_offsets (305 nodes, out variant)
0.154612 ms. 1.38352%. static_runtime::to_copy (500 nodes, out variant)
0.0567151 ms. 0.507505%. fb::lengths_to_offsets (159 nodes, out variant)
0.0051115 ms. 0.0457394%. fb::clip_ranges_gather (4 nodes, out variant)
```
Reviewed By: hlu1
Differential Revision: D30515441
fbshipit-source-id: 53acee10619ac2be7dc8982e929e3210c4bb6d21
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68308
Export CPUOffload in the _fsdp package, as the cpu_offload config in the FSDP API needs to import this class.
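A hedged usage sketch based on this description; the module path and constructor kwargs below are assumptions about this revision, not verified:
```
# hypothetical sketch; names are assumptions based on the summary above
import torch
from torch.distributed._fsdp import FullyShardedDataParallel as FSDP, CPUOffload

model = torch.nn.Linear(8, 8).cuda()
fsdp_model = FSDP(model, cpu_offload=CPUOffload(offload_params=True))
```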
ghstack-source-id: 143560608
Test Plan: unit tests
Reviewed By: rohan-varma
Differential Revision: D32408719
fbshipit-source-id: ee5c40ec91a423fbd58872fbdeb5f2dda8a3d89e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66819
This has a number of different advantages:
- For channels last tensors, DispatchStub overhead is only incurred once.
- For contiguous tensors, parallelization now happens over batch and
channels, enabling better load balancing between threads.
- `q_scale()` and `q_zero_point()` are no longer called inside of a
parallel region, which is not allowed (see gh-56794)
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D32445352
Pulled By: ngimel
fbshipit-source-id: cd938e886cd5696855eb56a649eaf3bccce35e54
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67733
Vulkan backend is now thread-safe:
* `ThreadContext` class holds onto all per-thread Vulkan states such as Command, Descriptor and Resource objects.
* `ThreadContext::SingletonThreadLocalObject<T>` is a very light version of `folly::SingletonThreadLocal` (https://github.com/facebook/folly/blob/main/folly/SingletonThreadLocal.h). It holds a static object with `thread_local` modifier. It is tied with a `GPU` object which allows us to expand multi-threaded GPU backend and multi-GPU capability in the future. The lifetime of `SingletonThreadLocalObject<T>` object is from the first call (instantiation) to the termination of thread.
* `MAKE_VULKAN_THREADSAFE` preprocessor is used for BUCK and the implementation of thread-safe Vulkan backend. We can quickly exclude it from the BUCK if any unexpected issue gets uncovered in the future. Once we are confident it's stable, we can remove the preprocessor from the code.
* A new perf test is added with `{3,40,221,193}` with 3 threads.
* `vkQueueSubmit` is not thread-safe, only one thread can push the commands at a time (See https://vkguide.dev/docs/chapter-1/vulkan_command_flow/#vulkan-command-execution). The number of available queues depends on GPU. It could be 1 and we cannot assume we can create multiple queues. Thus, we need to avoid calling `vkQueueSubmit` from multiple threads at the same time. When running Vulkan backend in different threads without any locking mechanism, `vkQueueSubmit` will get the `VK_ERROR_INITIALIZATION_FAILED(-3)` error.
* In the `Context::~Context()`, we should not call `flush()` since all per-thread objects will be destroyed as each thread exits. From the following logs, you can verify all per-thread objects are getting destroyed as their threads are terminated. The logs captured all ctor/dtor calls when running Vulkan backend with 3 different threads:
```
ThreadContext::ThreadContext() -> thread[0x1207d5e00] this[0x0x7f9489981e28]
Context::Context() -> thread[0x1207d5e00] this[0x7f9489981800] device_[1]
Resource::Pool::Pool() -> thread[0x7000095ab000] this[0x7f9489965258] device_[0x7f94998cf218] allocator_[0x7f947980ee00]
Command::Pool::Pool() -> thread[0x7000095ab000] this[0x7f9489965068] device_[0x7f94998cf218] command_pool_[0xfa21a40000000003]
Resource::Pool::Pool() -> thread[0x70000962e000] this[0x7f947980d458] device_[0x7f94998cf218] allocator_[0x7f949b119c00]
Command::Pool::Pool() -> thread[0x70000962e000] this[0x7f947980d268] device_[0x7f94998cf218] command_pool_[0xead9370000000008]
Resource::Pool::Pool() -> thread[0x1207d5e00] this[0x7f949a0ee858] device_[0x7f94998cf218] allocator_[0x7f9499901c00]
Command::Pool::Pool() -> thread[0x1207d5e00] this[0x7f949a0ee668] device_[0x7f94998cf218] command_pool_[0xcad092000000000d]
Descriptor::Pool::Pool() -> thread[0x1207d5e00] this[0x7f949a0ee910] device_[0x7f94998cf218] descriptor_pool_[0xa43473000000002d]
Descriptor::Pool::Pool() -> thread[0x70000962e000] this[0x7f947980d510] device_[0x7f94998cf218] descriptor_pool_[0x980b0000000002e]
Descriptor::Pool::Pool() -> thread[0x7000095ab000] this[0x7f9489965310] device_[0x7f94998cf218] descriptor_pool_[0x4b7df1000000002f]
Descriptor::Pool::~Pool() -> thread[0x7000095ab000] this[0x7f9489965310] device_[0x7f94998cf218] descriptor_pool_[0x4b7df1000000002f] -> enter
Descriptor::Pool::~Pool() -> thread[0x7000095ab000] this[0x7f9489965310] device_[0x7f94998cf218] descriptor_pool_[0x4b7df1000000002f] -> leave
Command::Pool::~Pool() -> thread[0x7000095ab000] this[0x7f9489965068] device_[0x7f94998cf218] command_pool_[0xfa21a40000000003] -> enter
Command::Pool::~Pool() -> thread[0x7000095ab000] this[0x7f9489965068] device_[0x7f94998cf218] command_pool_[0xfa21a40000000003] -> leave
Resource::Pool::~Pool() -> thread[0x7000095ab000] this[0x7f9489965258] device_[0x7f94998cf218] allocator_[0x7f947980ee00] -> enter
Descriptor::Pool::~Pool() -> thread[0x70000962e000] this[0x7f947980d510] device_[0x7f94998cf218] descriptor_pool_[0x980b0000000002e] -> enter
Descriptor::Pool::~Pool() -> thread[0x70000962e000] this[0x7f947980d510] device_[0x7f94998cf218] descriptor_pool_[0x980b0000000002e] -> leave
Command::Pool::~Pool() -> thread[0x70000962e000] this[0x7f947980d268] device_[0x7f94998cf218] command_pool_[0xead9370000000008] -> enter
Command::Pool::~Pool() -> thread[0x70000962e000] this[0x7f947980d268] device_[0x7f94998cf218] command_pool_[0xead9370000000008] -> leave
Resource::Pool::~Pool() -> thread[0x70000962e000] this[0x7f947980d458] device_[0x7f94998cf218] allocator_[0x7f949b119c00] -> enter
Resource::Pool::~Pool() -> thread[0x7000095ab000] this[0x7f9489965258] device_[0x7f94998cf218] allocator_[0x7f947980ee00] -> leave
Resource::Pool::~Pool() -> thread[0x70000962e000] this[0x7f947980d458] device_[0x7f94998cf218] allocator_[0x7f949b119c00] -> leave
Descriptor::Pool::~Pool() -> thread[0x1207d5e00] this[0x7f949a0ee910] device_[0x7f94998cf218] descriptor_pool_[0xa43473000000002d] -> enter
Descriptor::Pool::~Pool() -> thread[0x1207d5e00] this[0x7f949a0ee910] device_[0x7f94998cf218] descriptor_pool_[0xa43473000000002d] -> leave
Command::Pool::~Pool() -> thread[0x1207d5e00] this[0x7f949a0ee668] device_[0x7f94998cf218] command_pool_[0xcad092000000000d] -> enter
Command::Pool::~Pool() -> thread[0x1207d5e00] this[0x7f949a0ee668] device_[0x7f94998cf218] command_pool_[0xcad092000000000d] -> leave
Resource::Pool::~Pool() -> thread[0x1207d5e00] this[0x7f949a0ee858] device_[0x7f94998cf218] allocator_[0x7f9499901c00] -> enter
Resource::Pool::~Pool() -> thread[0x1207d5e00] this[0x7f949a0ee858] device_[0x7f94998cf218] allocator_[0x7f9499901c00] -> leave
Context::~Context() -> thread[0x1207d5e00] this[0x7f9489981800] device_[1] -> enter
Context::~Context() -> thread[0x1207d5e00] this[0x7f9489981800] device_[1] -> leave
ThreadContext::~ThreadContext() -> thread[0x1207d5e00] this[0x0x7f9489981e28] -> enter
ThreadContext::~ThreadContext() -> thread[0x1207d5e00] this[0x0x7f9489981e28] -> leave
```
Some notes on unexpected behaviors by `VkQueue`:
* We need to make sure only one thread accesses `VkQueue` at a time if multi-threaded. Or we need to have a locking mechanism to protect `VkQueue` from multiple threads. This approach is used for this change.
* To avoid lock overhead, we tried having a per-thread `VkQueue` (a separate object per thread), but that didn't fix the `VK_ERROR_INITIALIZATION_FAILED` error from the `vkQueueSubmit` call. This was not expected. Interestingly, macOS doesn't crash with this per-thread approach, but that's no surprise since its behavior has not been that reliable. Not sure whether it's an Android Vulkan driver issue or not.
* Making the entire `Context` `thread_local` without any lock actually fixes the same error.
Test Plan:
**Test build on Android**
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_perf_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_perf_test
adb shell "/data/local/tmp/vulkan_perf_test"
```
**Test build on MacOS**
```
cd ~/fbsource
buck build //xplat/caffe2:pt_vulkan_perf_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAppleMac\#macosx-x86_64
```
**Test result on Google Pixel 5**
```
//xplat/caffe2:pt_vulkan_perf_test_binAndroid#android-arm64 buck-out/gen/fe3a39b8/xplat/caffe2/pt_vulkan_perf_test_binAndroid#android-arm64
buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAndroid#android-arm64: 1 file pushed, 0 skipped. 145.4 MB/s (826929592 bytes in 5.426s)
Running /data/local/tmp/vulkan_perf_test
Run on (8 X 1804.8 MHz CPU s)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
=============================================================================================================
Thread-safe Vulkan backend on Google Pixel 5
-------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1 55.8 ms 15.1 ms 1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1 25.6 ms 4.08 ms 1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1 60.6 ms 14.3 ms 1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1 4.52 ms 0.757 ms 5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1 7.16 ms 0.770 ms 5000
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:3 35.9 ms 38.8 ms 3000
=============================================================================================================
Non thread-safe Vulkan backend on Google Pixel 5
-------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1 55.0 ms 14.5 ms 1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1 25.8 ms 4.30 ms 1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1 60.6 ms 14.5 ms 1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1 4.52 ms 0.761 ms 5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1 7.15 ms 0.765 ms 5000
```
For the single thread scenario of thread-safe and non thread-safe versions, the difference between them is less than 2% which is acceptable. In other words, there is no considerable performance degradation with the thread-safe Vulkan backend by using:
* singleton thread local objects for `Command`, `Descriptor` and `Resource` pools
* mutex lock for `VkQueueCommit` call
**Test result on MacOS**
```
Running ./buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAppleMac#macosx-x86_64
Run on (16 X 2400 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x8)
L1 Instruction 32 KiB (x8)
L2 Unified 256 KiB (x8)
L3 Unified 16384 KiB (x1)
Load Average: 11.96, 7.17, 5.45
***WARNING*** Library was built as DEBUG. Timings may be affected.
=============================================================================================================
Thread-safe Vulkan backend on MacOS
-------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1 58.4 ms 42.8 ms 1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1 12.3 ms 5.43 ms 1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1 56.0 ms 41.2 ms 1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1 3.00 ms 1.52 ms 5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1 2.56 ms 1.34 ms 5000
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:3 42.8 ms 42.8 ms 3000
=============================================================================================================
Non thread-safe Vulkan backend on MacOS
-------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1 58.6 ms 42.6 ms 1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1 11.3 ms 4.67 ms 1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1 57.6 ms 42.4 ms 1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1 2.89 ms 1.45 ms 5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1 2.47 ms 1.27 ms 5000
```
The non thread-safe version is slightly faster than the thread-safe one. This test result is only for reference since we cannot fully trust macOS, which has an extra layer, [MoltenVK](https://github.com/KhronosGroup/MoltenVK), on top of `Metal`.
Reviewed By: SS-JIA
Differential Revision: D32093974
fbshipit-source-id: 9eab7f0db976eff717540a5b32f94ed17a00b662
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68198
This unearths some bugs in istft backward, so I've disabled
backward tests but it's fixed in the next PR in the stack.
cc mruberry peterbell10
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D32467044
Pulled By: mruberry
fbshipit-source-id: 5cf49560cbeb0263a66aafb48ed1bcc8884b75f1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68226
**Note that this PR is unusually big due to the urgency of the changes. Please reach out to me in case you wish to have a "pair" review.**
This PR introduces a major refactoring of the socket implementation of the C10d library. A big portion of the logic is now contained in the `Socket` class and a follow-up PR will further consolidate the remaining parts. As of today the changes in this PR offer:
- significantly better error handling and much more verbose logging (see the example output below)
- explicit support for IPv6 and dual-stack sockets
- correct handling of signal interrupts
- better Windows support
A follow-up PR will consolidate `send`/`recv` logic into `Socket` and fully migrate to non-blocking sockets.
## Example Output
```
[I logging.h:21] The client socket will attempt to connect to an IPv6 address on (127.0.0.1, 29501).
[I logging.h:21] The client socket is attempting to connect to [localhost]:29501.
[W logging.h:28] The server socket on [localhost]:29501 is not yet listening (Error: 111 - Connection refused), retrying...
[I logging.h:21] The server socket will attempt to listen on an IPv6 address.
[I logging.h:21] The server socket is attempting to listen on [::]:29501.
[I logging.h:21] The server socket has started to listen on [::]:29501.
[I logging.h:21] The client socket will attempt to connect to an IPv6 address on (127.0.0.1, 29501).
[I logging.h:21] The client socket is attempting to connect to [localhost]:29501.
[I logging.h:21] The client socket has connected to [localhost]:29501 on [localhost]:42650.
[I logging.h:21] The server socket on [::]:29501 has accepted a connection from [localhost]:42650.
[I logging.h:21] The client socket has connected to [localhost]:29501 on [localhost]:42722.
[I logging.h:21] The server socket on [::]:29501 has accepted a connection from [localhost]:42722.
[I logging.h:21] The client socket will attempt to connect to an IPv6 address on (127.0.0.1, 29501).
[I logging.h:21] The client socket is attempting to connect to [localhost]:29501.
[I logging.h:21] The client socket has connected to [localhost]:29501 on [localhost]:42724.
[I logging.h:21] The server socket on [::]:29501 has accepted a connection from [localhost]:42724.
[I logging.h:21] The client socket will attempt to connect to an IPv6 address on (127.0.0.1, 29501).
[I logging.h:21] The client socket is attempting to connect to [localhost]:29501.
[I logging.h:21] The client socket has connected to [localhost]:29501 on [localhost]:42726.
[I logging.h:21] The server socket on [::]:29501 has accepted a connection from [localhost]:42726.
```
ghstack-source-id: 143501987
Test Plan: Run existing unit and integration tests on devserver, Fedora, Ubuntu, macOS Big Sur, Windows 10.
Reviewed By: Babar, wilson100hong, mrshenli
Differential Revision: D32372333
fbshipit-source-id: 2204ffa28ed0d3683a9cb3ebe1ea8d92a831325a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68368
Currently, each instance of `StaticRuntime` has its own copy of a `std::function` object, wrapped in a `ProcessedNode::Function` object, in order to invoke the actual operation implementation.
However, all instances of `StaticRuntime` derived from the same `StaticModule` object invoke exactly the same op implementation, so this duplication is avoidable.
This change adds a `StaticModule::functions_` member variable to keep a list of unique `ProcessedFunction` objects. A newly constructed `StaticRuntime` takes pointers to these `ProcessedFunction` objects instead of copying whole function objects. This can save a substantial amount of memory per `StaticRuntime` instance.
This comes with a small sacrifice in execution time: since a `ProcessedNode` instance now keeps a pointer to the function object, executing a node involves an extra pointer dereference. However, local performance tests showed this cost to be negligible.
Thanks to hlu1 for proposing this non-intrusive improvement idea :D
Test Plan:
This change reduces the size of a StaticRuntime instance by 14.41% (459KB -> 393KB) (patched D32181666 to print the memory turnover from instantiating a StaticRuntime instance) for CMF/local ( & 8% for CMF/local_ro). No noticeable latency regression was observed.
==AFTER
* CMF/local
memory turnover: 393608
latency: PyTorch run finished. Milliseconds per iter: 15.6965. Iters per second: 63.7087
* CMF/local_ro
memory turnover:387288
latency: PyTorch run finished. Milliseconds per iter: 7.51308. Iters per second: 133.101
==BEFORE
* CMF/local
memory turnover: 459888
latency: PyTorch run finished. Milliseconds per iter: 15.8278. Iters per second: 63.18
* CMF/local_ro
memory turnover: 420832
latency: PyTorch run finished. Milliseconds per iter: 7.43756. Iters per second: 134.453
==Confirmation that ptvsc2_predictor_bench reports the same memory management stats for inline_cvr:
==AFTER
Total number of managed tensors: 2660
Total number of managed output tensors: 0
Total number of unmanaged values: 3041
Total memory managed: 1496896 bytes
Total number of reused tensors: 1183
Total number of 'out' variant nodes/total number of nodes: 2452/2469 (99.3115%)
Total number of managed tensors: 1412
Total number of managed output tensors: 0
Total number of unmanaged values: 2677
Total memory managed: 39040 bytes
Total number of reused tensors: 959
Total number of 'out' variant nodes/total number of nodes: 1928/1937 (99.5354%)
Total number of managed tensors: 1293
Total number of managed output tensors: 0
Total number of unmanaged values: 14
Total memory managed: 5293824 bytes
Total number of reused tensors: 771
Total number of 'out' variant nodes/total number of nodes: 1298/1298 (100%)
==BEFORE
Total number of managed tensors: 2660
Total number of managed output tensors: 0
Total number of unmanaged values: 3041
Total memory managed: 1496896 bytes
Total number of reused tensors: 1183
Total number of 'out' variant nodes/total number of nodes: 2452/2469 (99.3115%)
Total number of managed tensors: 1412
Total number of managed output tensors: 0
Total number of unmanaged values: 2677
Total memory managed: 39040 bytes
Total number of reused tensors: 959
Total number of 'out' variant nodes/total number of nodes: 1928/1937 (99.5354%)
Total number of managed tensors: 1293
Total number of managed output tensors: 0
Total number of unmanaged values: 14
Total memory managed: 5293824 bytes
Total number of reused tensors: 771
Total number of 'out' variant nodes/total number of nodes: 1298/1298 (100%)
Reviewed By: swolchok
Differential Revision: D32337548
fbshipit-source-id: e714e735399c93fde337b0f70e203a2de632057a
Summary:
After realizing that CUDA mem leaks were not rerun, I realized I forgot to pass the env var as a Docker variable.
What a noob mistake.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68486
Reviewed By: malfet, seemethere
Differential Revision: D32477989
Pulled By: janeyx99
fbshipit-source-id: e28d095773f50864ab49229e434187a9ecb004cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68316
Consider the following:
```
class Mod(nn.Module):
    def __init__(self, val):
        super().__init__()
        self.param = nn.Parameter(val)

    def forward(self, x):
        # this method will change during freezing
        return x + self.param

    @torch.jit.export
    def make_prediction(self, x):
        y = x + x
        return self.forward(y)

param = torch.rand([2, 2])
unscripted_mod = Mod(param)
mod = torch.jit.script(unscripted_mod)
mod.eval()
mod = torch.jit.freeze(mod, preserved_attrs=["make_prediction"])
```
During freezing the following will occur:
1. do some pre-freezing, including inlining; in particular, forward will be inlined into make_prediction. During inlining, forward.optimized_graph() is called, and the result is cached
2. freeze some methods. While freezing forward, the graph associated with the function will get updated. The cached optimized_graphs_ are not updated.
Previously, a call to `mod.forward(x)` would return an executor that would run on the old cached optimized_graph(). This would mean that the freezing optimizations would not apply, and potentially that the execution would fail because of parameters removed from the module.
This change clears the optimized_graphs_ cache after running freezing to prevent executing an old version of the graph.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D32410862
Pulled By: davidberard98
fbshipit-source-id: dd8bfe86ec2898b7c72813ab32c08f25c38e4cea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68477
We're printing a lot of unnecessary logs in prod. Change these from LOG(INFO) to VLOG(1) so you can easily flip them back for testing.
Test Plan: CI
Reviewed By: ajyu, d1jang
Differential Revision: D32439776
fbshipit-source-id: 40fa57f4eeb6ca0b610008062cc94aed62fb6981
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68415
Remove e4["cpu_iter"] from the short list, as the CPU may take some time to queue both compute and all-gather.
Closes #68391
ghstack-source-id: 143478769
Test Plan: unit tests
Reviewed By: rohan-varma
Differential Revision: D32457334
fbshipit-source-id: baeedfb628ce4554a1ef365c3a2de27b8884f6d4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67139
This diff enables setting a breakpoint in the graph module's generated Python code. See the test plan for usage.
In order to support this functionality, and other similar ways to customize the generated code, a code transformer hook is added to `fx.Graph`. This allows flexible customization of `fx.Graph`'s codegen behavior in composable and functional ways. See the test plan for its usage.
Test Plan:
### Use of `fx.experimental.debug.set_trace`
```
In [2]: from torch.fx.experimental.debug import set_trace
In [3]: set_trace(ttop)
Out[3]:
top(
(a): Sub()
)
In [4]: ttop(1)
> /data/users/kefeilu/fbsource33/fbcode/buck-out/dev/gen/caffe2/torch/fb/fx2trt/<eval_with_key>.10(6)forward()
(Pdb) l
1
2
3
4 def forward(self, x):
5 import pdb; pdb.set_trace()
6 -> a = self.a(x); x = None
7 getitem = a[0]
8 getitem_1 = a[0]; a = None
9 add = getitem + getitem_1; getitem = getitem_1 = None
10 return add
11
(Pdb)
```
### Use of `on_generate_code`
```
In [1]: def insert_pdb(body):
...: return ['import pdb; pdb.set_trace()\n', *body]
...:
In [8]: type(ttop)
Out[8]: torch.fx.graph_module.GraphModule.__new__.<locals>.GraphModuleImpl
In [10]: with ttop.graph.on_generate_code(lambda _: insert_pdb):
...: ttop.recompile()
...: print(f"== _on_generate_code should not be None: { ttop.graph._on_generate_code }")
...: print(ttop.code)
...:
== _on_generate_code should not be None: <function insert_pdb at 0x7fc9895ddd30>
def forward(self, x):
import pdb; pdb.set_trace()
a = self.a(x); x = None
getitem = a[0]
getitem_1 = a[0]; a = None
add = getitem + getitem_1; getitem = getitem_1 = None
return add
In [11]: ttop.graph._on_generate_code # restored to None
In [12]: ttop(1) # this should drop into pdb
> /data/users/kefeilu/fbsource33/fbcode/buck-out/dev/gen/caffe2/torch/fb/fx2trt/<eval_with_key>.6(6)forward()
(Pdb) l
1
2
3
4 def forward(self, x):
5 import pdb; pdb.set_trace()
6 -> a = self.a(x); x = None
7 getitem = a[0]
8 getitem_1 = a[0]; a = None
9 add = getitem + getitem_1; getitem = getitem_1 = None
10 return add
11
```
Reviewed By: jamesr66a
Differential Revision: D30736160
fbshipit-source-id: 9646867aae0461b5131dfd4ba9ee77a8c2ea9c93
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68160
This generalizes the mechanism D32318674 added for letting native ops borrow their outputs and uses it in dict_unpack.
ghstack-source-id: 143424919
Test Plan:
4.5% in CMF local_ro compared to D32318674 (previous two diffs were necessary steps but didn't get the full win yet):
```
FastAliasingInSelectTensor, local_ro
========================================
I1110 22:06:37.549811 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.08488. Iters per second: 921.76
I1110 22:06:38.147949 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.08675. Iters per second: 920.171
I1110 22:06:38.766340 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.08626. Iters per second: 920.592
I1110 22:06:39.366608 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.08376. Iters per second: 922.717
I1110 22:06:39.964979 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.08362. Iters per second: 922.833
I1110 22:06:40.565248 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.08423. Iters per second: 922.312
I1110 22:06:41.167326 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.0945. Iters per second: 913.659
I1110 22:06:41.766187 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.08373. Iters per second: 922.742
I1110 22:06:42.367816 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.08995. Iters per second: 917.475
I1110 22:06:42.968391 119627 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.08854. Iters per second: 918.665
I1110 22:06:42.968446 119627 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.08662, standard deviation: 0.00351662
BorrowDictUnpackOutputs, local_ro
========================================
I1110 22:05:23.245435 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.03272. Iters per second: 968.313
I1110 22:05:23.822196 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.06478. Iters per second: 939.163
I1110 22:05:24.395256 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.035. Iters per second: 966.186
I1110 22:05:24.964169 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.02786. Iters per second: 972.898
I1110 22:05:25.536558 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.03205. Iters per second: 968.946
I1110 22:05:26.109027 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.04256. Iters per second: 959.174
I1110 22:05:26.679611 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.03245. Iters per second: 968.567
I1110 22:05:27.253048 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.04493. Iters per second: 957.005
I1110 22:05:27.822629 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.0299. Iters per second: 970.971
I1110 22:05:28.393326 113949 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.03039. Iters per second: 970.509
I1110 22:05:28.393368 113949 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.03726, standard deviation: 0.0111053
```
0.04936 (4.5%) usec/iter improvement
Reviewed By: hlu1
Differential Revision: D32347390
fbshipit-source-id: e636ddafacf30ed2a2d84a6e15fff97481342fdb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68159
These all look like they'll cause unnecessary refcount bumps to me.
ghstack-source-id: 143424917
Test Plan:
CI
TODO profile local_ro?
Reviewed By: hlu1
Differential Revision: D32347392
fbshipit-source-id: d8ed91b5855b86765db00c61ad3650273302c7b6
Summary:
The `torch.histogramdd` operator is documented in `torch/functional.py` but does not appear in the generated docs because it is missing from `docs/source/torch.rst`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68273
Reviewed By: cpuhrsch
Differential Revision: D32470522
Pulled By: saketh-are
fbshipit-source-id: a23e73ba336415457a30bae568bda80afa4ae3ed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68158
to() sometimes returns a reference; let's forward that through.
ghstack-source-id: 143424916
Test Plan: Combined with following diff, seeing a huge drop in dict_unpack self time in ctr_mobile_feed local_ro net. Following diff by itself didn't work.
Reviewed By: suo
Differential Revision: D32347391
fbshipit-source-id: da96295bf83ea30867a2e3fceedc9b4e0a33ffa3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68157
Does what it says on the tin. I don't have a use for `MaybeOwned<IValue>` itself right now, but following diffs will use `MaybeOwnedTraits<IValue>::{create,destroy}Borrow` and I thought it best to just provide the full thing.
ghstack-source-id: 143424915
Test Plan: Extended MaybeOwned tests to cover this.
Reviewed By: hlu1
Differential Revision: D32347393
fbshipit-source-id: 219658cb69b951d36dee80c2ae51387328224866
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67702
This isn't a particularly large optimization and it does
nothing before select_tensor is introduced (I'm surprised that no
operators have optimizable outputs!), but it seems like we should probably get the savings.
ghstack-source-id: 143424918
Test Plan: CI; checked `--do_profile=1` output with the following diff and we save the tracking of hundreds of values, as expected.
Reviewed By: hlu1
Differential Revision: D32112522
fbshipit-source-id: 1804b77992a73670bfc1e36af608b852b8261bd2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68367
- bmm_test.py was using syntax not allowed in 3.6
- Some suppressions were not placed on the correct line.
With this file,
```
lintrunner --paths-cmd='git grep -Il .'
```
passes successfully.
Test Plan: Imported from OSS
Reviewed By: janeyx99, mrshenli
Differential Revision: D32436644
Pulled By: suo
fbshipit-source-id: ae9300c6593d8564fb326822de157d00f4aaa3c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67935
Rationale should be documented in code comments. In short, we
can avoid heap-allocating arrays of input indexes for operators with 5
or fewer inputs, at the cost of a tag bit check on access.
ghstack-source-id: 143429112
Test Plan:
Patched d1jang's D32181666, which prints static runtime memory usage.
Previous diff, local:
```
I1105 12:17:36.459688 866763 PyTorchStaticRuntimePredictor.cpp:82] memory turnover after creating an instance of StaticRuntime: 354208
```
This diff, local:
```
I1105 12:48:35.820663 1066520 PyTorchStaticRuntimePredictor.cpp:82] memory turnover after creating an instance of StaticRuntime: 338064
```
4.5% savings (16144 bytes)
Ran 10 repetitions of CMF local_ro with core pinning: P467095603. This diff is perf neutral compared to the previous diff.
Reviewed By: hlu1
Differential Revision: D32216573
fbshipit-source-id: d18483db255f75f1d90e610ecded7727c6ffe65c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67934
This reduces the memory requirements of ProcessedNode: by allocating outputs sequentially into a shared array and supporting at most 2**16 - 1 values (current models seem to have 10-20x less than that), we only need to store the 2-byte offset into that array and 2-byte number of outputs in ProcessedNode.
ghstack-source-id: 143429113
Test Plan:
Patched d1jang's diff to measure memory turnover around SR startup.
Previous diff, CMF local:
```
I1104 12:19:39.900211 597593 PyTorchStaticRuntimePredictor.cpp:82] memory turnover after creating an instance of StaticRuntime: 427120
```
This diff, CMF local:
```
I1105 12:17:36.459688 866763 PyTorchStaticRuntimePredictor.cpp:82] memory turnover after creating an instance of StaticRuntime: 354208
```
72912 bytes (17%) savings
Perf looks neutral; see next diff (D32216573) test plan for details.
Reviewed By: hlu1
Differential Revision: D32190751
fbshipit-source-id: 30c1e2caa9460f0d83b2d9bb24c68ccfcef757cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67860
We don't need 8-byte sizes for inputs and outputs, and we only need op names when profiling is enabled.
ghstack-source-id: 143429111
Test Plan:
Ran CMF local & local_ro with recordio inputs. I'm calling
the result inconclusive/neutral because I saw some noise (as you'll
see below), but that's fine with me since this is a clear memory win.
```
Nov4Stable, local_ro
========================================
I1104 09:53:08.875444 505783 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.19925. Iters per second: 833.851
I1104 09:53:10.200443 505783 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.1996. Iters per second: 833.608
I1104 09:53:11.524045 505783 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.19746. Iters per second: 835.103
I1104 09:53:12.851861 505783 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.20479. Iters per second: 830.019
I1104 09:53:14.183387 505783 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.20487. Iters per second: 829.964
I1104 09:53:14.183427 505783 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.2012, standard deviation: 0.00341762
re-ran stable in light of baffling regression (see next entry), and sure enough we still have some significant run-to-run-variation:
I1104 09:56:15.244969 524012 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.24956. Iters per second: 800.28
I1104 09:56:16.621292 524012 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.24776. Iters per second: 801.437
I1104 09:56:18.018808 524012 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.25247. Iters per second: 798.42
I1104 09:56:19.399660 524012 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.25054. Iters per second: 799.656
I1104 09:56:20.781828 524012 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.25052. Iters per second: 799.664
I1104 09:56:20.781878 524012 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.25017, standard deviation: 0.00171396
Nov4SaveTwoWordsInProcessedNode, local_ro
========================================
I1104 09:53:42.070139 508309 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.2411. Iters per second: 805.736
I1104 09:53:43.438390 508309 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.24102. Iters per second: 805.788
I1104 09:53:44.773303 508309 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.20682. Iters per second: 828.621
I1104 09:53:46.110538 508309 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.21216. Iters per second: 824.973
I1104 09:53:47.448279 508309 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.21265. Iters per second: 824.639
I1104 09:53:47.448334 508309 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.22275, standard deviation: 0.0168698
early runs look like a glitch, rerunning
I1104 09:54:20.999117 511022 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.24558. Iters per second: 802.841
I1104 09:54:22.376780 511022 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.24436. Iters per second: 803.623
I1104 09:54:23.738584 511022 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.23176. Iters per second: 811.845
I1104 09:54:25.113063 511022 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.24938. Iters per second: 800.395
I1104 09:54:26.476349 511022 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.23552. Iters per second: 809.377
I1104 09:54:26.476395 511022 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.24132, standard deviation: 0.00737197
Nov4Stable, local
========================================
I1104 09:57:56.854537 533814 PyTorchPredictorBenchLib.cpp:346] memory turnover after getPredictor: 177885632
I1104 09:58:02.829813 533814 PrepareModelInputs.cpp:190] Loaded 696 records.
I1104 09:58:03.010681 533814 PyTorchPredictorBenchLib.cpp:353] memory turnover before benchmarking: 4590507056
I1104 09:58:03.010710 533814 PyTorchPredictorBenchLib.cpp:154] PyTorch predictor: number of prediction threads 1
I1104 09:58:58.839010 533814 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 20.0567. Iters per second: 49.8586
I1104 09:59:54.797755 533814 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 20.1007. Iters per second: 49.7494
I1104 10:00:50.696525 533814 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 20.0657. Iters per second: 49.8363
I1104 10:01:46.514736 533814 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 20.0696. Iters per second: 49.8265
I1104 10:02:42.378270 533814 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 20.0641. Iters per second: 49.8402
I1104 10:02:42.378316 533814 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 20.0714, standard deviation: 0.0170605
I1104 10:02:42.378325 533814 PyTorchPredictorBenchLib.cpp:366] memory turnover after benchmarking: 4591882400
Nov4SaveTwoWordsInProcessedNode, local
========================================
I1104 10:38:15.543320 733514 PyTorchPredictorBenchLib.cpp:346] memory turnover after getPredictor: 177721792
I1104 10:38:21.224673 733514 PrepareModelInputs.cpp:190] Loaded 696 records.
I1104 10:38:21.382973 733514 PyTorchPredictorBenchLib.cpp:353] memory turnover before benchmarking: 4590343216
I1104 10:38:21.382992 733514 PyTorchPredictorBenchLib.cpp:154] PyTorch predictor: number of prediction threads 1
I1104 10:39:17.005359 733514 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.9498. Iters per second: 50.1257
I1104 10:40:12.545269 733514 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.9279. Iters per second: 50.1808
I1104 10:41:08.138119 733514 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.999. Iters per second: 50.0026
I1104 10:42:03.686841 733514 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.9115. Iters per second: 50.2222
I1104 10:42:55.137498 733539 Proxy2Connection.cpp:343] Received NotRegisteredException from Configerator Proxy2.
I1104 10:42:55.138715 733539 ReadOnlyConnectionIf.h:91] Mark connection as healthy.
I1104 10:42:55.384534 733514 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.6297. Iters per second: 50.9433
I1104 10:42:55.384579 733514 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 19.8836, standard deviation: 0.14571
I1104 10:42:55.384588 733514 PyTorchPredictorBenchLib.cpp:366] memory turnover after benchmarking: 4591711760
```
Reviewed By: d1jang
Differential Revision: D32177531
fbshipit-source-id: 267e38a151d2dbab34fd648135d173cfbee1c22e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68028
Today, we demangle a typename before passing it to the TorchScript
compiler. This breaks compilation of torch classes in cases where we are
attempting to script the same class name from inside a package and out,
since we will return the same qualified name for both.
Differential Revision: D32261907
Test Plan: Imported from OSS
Reviewed By: saketh-are
Pulled By: suo
fbshipit-source-id: 921bc03ad385d94b9279fbc6f3b7dcd0ddbe5bc7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68311
prim::SetAttr is listed as an op with side effects, but in AliasDb, `analyzeSetAttr` already accounts for its behavior. By removing it from the list of ops with side effects, dead code elimination will work in a few other scenarios.
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D32409510
fbshipit-source-id: 52ed9e19f92afb95c669ad3c2440f72f9515ba4c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68335
When discussing https://github.com/pytorch/pytorch/pull/63880, we
realised that the docs of `householder_product` were not correct. This
PR fixes this.
The new docs are slightly more difficult, but hopefully correct. Note
that this is a LAPACK function in disguise, so the specification is
expected to be more involved than usual.
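For context, a minimal example of what the op computes, mirroring LAPACK's `geqrf`/`orgqr` pair (assumes a recent PyTorch build; shapes are illustrative):
```
import torch

# householder_product assembles Q = H_1 H_2 ... H_k from the Householder
# reflectors produced by geqrf, i.e. it plays the role of LAPACK's orgqr.
A = torch.randn(5, 3, dtype=torch.float64)
reflectors, tau = torch.geqrf(A)
Q = torch.linalg.householder_product(reflectors, tau)

# The upper triangle of the geqrf output holds R, so Q @ R reconstructs A
# and Q has orthonormal columns.
R = torch.triu(reflectors)[:3]
assert torch.allclose(Q @ R, A)
assert torch.allclose(Q.T @ Q, torch.eye(3, dtype=torch.float64))
```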
cc brianjo mruberry jianyuh nikitaved pearu walterddr IvanYashchuk xwang233 Lezcano
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D32429755
Pulled By: mruberry
fbshipit-source-id: 3ac866d30984adcd9f3b83d7fa9ae7b7ae5d4b53
Summary:
As per title.
It is planned to use these tests when fixing the issues with the max_unpool backward methods reported in https://github.com/pytorch/pytorch/issues/67658 and https://github.com/pytorch/pytorch/issues/67657.
The max_unpool backward methods are currently untested and implemented with custom kernels. We can replace these kernels with advanced indexing operations (i.e. `gather`), which are efficient and well tested.
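A minimal sketch of the gather-based backward being suggested (an illustration, not the actual kernel; shapes and names are assumptions):
```
import torch
import torch.nn.functional as F

# Forward: max_unpool2d scatters each pooled value to the position recorded
# in `indices`. Its backward w.r.t. the pooled input can therefore be
# expressed as a gather: each pooled element reads its gradient back from
# the location it was scattered to.
x = torch.randn(1, 2, 4, 4)
pooled, indices = F.max_pool2d(x, kernel_size=2, return_indices=True)
unpooled = F.max_unpool2d(pooled, indices, kernel_size=2)

grad_out = torch.randn_like(unpooled)          # incoming gradient w.r.t. unpooled
grad_in = (
    grad_out.flatten(start_dim=2)              # (N, C, H*W)
    .gather(2, indices.flatten(start_dim=2))   # pick the scattered positions back out
    .view_as(pooled)                           # (N, C, H_out, W_out)
)
```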
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68075
Reviewed By: malfet
Differential Revision: D32308317
Pulled By: mruberry
fbshipit-source-id: 9f91c6e6a9d78c19230e93fc0a3164f4eb7b8ec5
Summary:
There were two issues with the original PR:
1) My assumption that bound C functions could be trusted to stay alive was not valid. I'm still not entirely sure what was dying, but I've just added a cache so that the first time I see a function I collect the repr just like I was already doing with Python functions.
2) `std::regex` is known to be badly broken and prone to segfaults. Because I'm just doing a very simple prefix prune it's fine to do it manually; see `trimPrefix`. Long term we should move all of PyTorch to `re2` as the internal lint suggests, but CMake is hard and I couldn't get it to work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68325
Reviewed By: chaekit
Differential Revision: D32432596
Pulled By: robieta
fbshipit-source-id: 06fb4bcdc6933a3e76f6021ca69dc77a467e4b2e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68365
As the title says: the broadcast fastpath has been running fine for the enabled ops for a while now, so make it the default for these ops.
Test Plan: diff is a no-op, so sandcastle
Differential Revision: D32107847
fbshipit-source-id: b239b127b219985bf7df6a0eea2d879b8e9c79a4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68266
* Use `if...endif` to adjust pyTorch internals towards XROS
Test Plan: CI
Reviewed By: kkosik20
Differential Revision: D32190771
fbshipit-source-id: cce073dea53c2b5681d913321101cd83c6472019
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67812
`UpdateShape` uses `.emplace(tensorName, shapeValue)`. This does not update `shapeValue` for `tensorName` if such a name already exists in the map. Hence our code is not able to correct a shape inference error, even if we infer the shape correctly later.
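For readers less familiar with the `std::map` API, the behavior is analogous to Python's `dict.setdefault` versus plain assignment (a minimal analogy, not the actual C++ code):
```
# emplace(key, value) behaves like setdefault: it keeps the old value if the key exists.
shapes = {"x": (1, 3, 224, 224)}
shapes.setdefault("x", (1, 3, 448, 448))   # no-op, the stale shape survives
assert shapes["x"] == (1, 3, 224, 224)

# What UpdateShape needs is an overwrite, i.e. insert_or_assign / plain assignment.
shapes["x"] = (1, 3, 448, 448)
assert shapes["x"] == (1, 3, 448, 448)
```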
Test Plan: Imported from OSS
Reviewed By: msaroufim
Differential Revision: D32181300
Pulled By: malfet
fbshipit-source-id: 05c58ad3fdac683ad957996acde8f0ed6341781d
Co-authored-by: BowenBao <bowbao@microsoft.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68176
It should be noted that for the modules, reduce_range is set to true by default, in a similar fashion to linear_dynamic.
Test Plan:
python test/test_quantization.py TestDynamicQuantizedModule
python test/test_quantization.py TestDynamicQuantizedConv
python test/test_quantization.py TestQuantizedConv
Imported from OSS
Reviewed By: kimishpatel
Differential Revision: D32374003
fbshipit-source-id: 011562bd0f4d817387d53bb113df2600aa60a7a3
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28418
Related https://github.com/pytorch/pytorch/issues/32976 but has already been fixed before.
TorchScript handling of GRU and LSTM has been working, but not of RNN (Tanh and ReLU). The reason is that ```Union[Tensor, PackedSequence]``` is not supported by TorchScript. Using ```torch._jit_internal._overload_method``` in ```RNNBase::forward``` does not work, as TorchScript seemingly does not use the overloads correctly once the method is inherited by ```RNN```. My solution is to move ```RNNBase::forward``` to ```RNN``` and annotate it using ```torch._jit_internal._overload_method```. LSTM and GRU use their own ```forward``` methods anyway, so there should be no problem related to this fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61274
Reviewed By: anjali411
Differential Revision: D32374452
Pulled By: malfet
fbshipit-source-id: 77bab2469c01c5dfa5eaab229429724a4172445d
Co-authored-by: Nikita Shulga <nshulga@fb.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68091
Add record functions for recording perf stats on the entire network.
Note that this is backed by the same pre-sampling mechanism as the op record functions, so net-level stats get logged relatively infrequently. (If this is not acceptable, we can skip pre-sampling at the cost of a little bit of perf; every inference would then require an RNG call.)
Reviewed By: hlu1
Differential Revision: D32296756
fbshipit-source-id: 09ff16c942f3bfc8f4435d6cca2be4a6b8dc6091
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68011
`qnnpack/operator.h` introduces a dependency on the external library fp16 via `qnnpack/requantization.h`.
Including `qnnpack/operator.h` in `pytorch_qnnpack.h` makes objects that really don't require fp16 depend on it indirectly, because they include `pytorch_qnnpack.h`.
This was causing some test and bench targets to fail to build for local and android/arm64 (the only two tried) using CMake.
This diff moves `qnnpack/operator.h` from `pytorch_qnnpack.h` to `qnnpack_func.h`, and explicitly adds `qnnpack/operator.h` in `src/conv-prepack.cc`.
Test Plan: Ran all the tests for local on my devserver, and arm64 on Pixel3a.
Reviewed By: salilsdesai
Differential Revision: D32250984
fbshipit-source-id: 21468d8ef79c90e9876dc00da95383180a1031b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68360
Added a helper function for this: it only uses `mod` to convert a negative dim to a positive one, and does nothing when the dim is already positive.
Previously, in `getitem`, if we were slicing to the very end we would get the dimension wrong.
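A minimal sketch of the helper's intended behavior (the name and exact call site are illustrative, not the actual fx2trt code):
```
def normalize_dim(dim: int, rank: int) -> int:
    """Map a possibly-negative dim into the positive range.

    Only apply `mod` when the dim is negative; an already-positive dim is
    returned unchanged, so slicing to the very end is not wrapped around.
    """
    return dim % rank if dim < 0 else dim

assert normalize_dim(-1, 4) == 3
assert normalize_dim(2, 4) == 2
assert normalize_dim(4, 4) == 4   # an unconditional `4 % 4 == 0` would be wrong
```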
Test Plan: Add a unit test
Reviewed By: yinghai, wushirong
Differential Revision: D32432893
fbshipit-source-id: 3c5d6a578d92a15207a5e52802750f9ea7f272a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68354
Lint rule: https://clang.llvm.org/extra/clang-tidy/checks/modernize-use-nodiscard.html
This check adds a ton of noise to our diffs. `[[nodiscard]]` is typically only useful when ignoring the return value of a function is a critical error, e.g. for `operator new`.
Test Plan: Verified that the lint does not get triggered
Reviewed By: hlu1
Differential Revision: D32429731
fbshipit-source-id: ca3d90686ec8d419d3f96167140dc406df6f4a53
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67000
See the [related issue](https://github.com/pytorch/pytorch/issues/66654) for context.
This new JIT optimization transforms patterns like this:
```
%inputs.1 : Tensor[] = prim::ListConstruct(%a, %b, %c)
%concat.1 : Tensor = aten::cat(%inputs, %dim)
%inputs.2 : Tensor[] = prim::ListConstruct(%x, %concat.1, %y)
%concat.2 : Tensor = aten::cat(%inputs.2, %dim)
```
into this:
```
%inputs.2 : Tensor[] = prim::ListConstruct(%x, %a, %b, %c, %y)
%concat.2 : Tensor = aten::cat(%inputs.2, %dim)
```
(it can do this for chains of `aten::cat` longer than 2 as well)
A few conditions have to hold:
1. The `dim`s have to match.
2. `inputs.1` and `inputs.2` cannot be mutated
Test Plan: `buck test caffe2/test/cpp/jit:jit -- ConcatOpt`
Reviewed By: d1jang
Differential Revision: D31819491
fbshipit-source-id: 9f1a501d52099eb1a630b5dd906df4c38c3817ba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68292
- noqa was typo-d to be the same as type: ignore
- generalize clang-tidy initialization and use it for clang_format as well
- Add a script that lets you update the binaries in s3 relatively easily
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D32403934
Pulled By: suo
fbshipit-source-id: 4e21b22605216f013d87d636a205707ca8e0af36
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67784
The FX model generates quant/dequant layers for INT8 explicit-mode support. However, if the inputs of a quant/dequant layer are constant, the layer will be put into the constant subgraph and optimized out, and TensorRT will then fail to parse the leftover graph. It is better to provide an optional function (skip_folding_node_fn) that skips folding such nodes in split_const_subgraphs.
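A hedged sketch of how such a predicate might be passed to `split_const_subgraphs` (the exact quant/dequant targets and the keyword usage are assumptions):
```
import torch
from torch.fx.experimental.const_fold import split_const_subgraphs

def skip_folding_node_fn(node: torch.fx.Node) -> bool:
    # Keep quantize/dequantize nodes out of the folded constant subgraph so
    # TensorRT still sees them explicitly; the real set of targets to skip
    # depends on the model being lowered.
    return node.target in (torch.quantize_per_tensor, torch.dequantize)

def lower(traced: torch.fx.GraphModule):
    return split_const_subgraphs(traced, skip_folding_node_fn=skip_folding_node_fn)
```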
Reviewed By: jfix71
Differential Revision: D32076970
fbshipit-source-id: 7dcbb4f02386f8c831d09a2f0e40bcdba904471c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67783
Add `getstate_hook` to exclude primitive objects and callables during serialization when `exclude_primitive` is enabled for `traverse`.
For graph traversal, we don't have to handle lambdas and other such objects.
This is used by `OnDiskCacheHolder` to trace the DataPipe Graph.
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D32146697
Pulled By: ejguan
fbshipit-source-id: 03b2ce981bb21066e807f57c167b77b2d0e0ce61
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68318
Adding a `__iter__` binding so that `tuple(Dims)` can construct the right iterator and know where to stop, instead of relying on trial and error with exception catching. We should upstream this to https://github.com/NVIDIA/TensorRT. cc: wushirong
I did try a very similar `__iter__` fix previously, but I'm not sure why it wasn't effective...
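In Python terms, the desired semantics look roughly like this (a stand-in class to show the behavior, not the actual pybind11 binding):
```
class Dims:
    """Stand-in for tensorrt.Dims to show why __iter__ matters."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __getitem__(self, i):
        # Without __iter__, tuple(dims) falls back to calling __getitem__ with
        # 0, 1, 2, ... until an exception is raised, i.e. trial and error.
        return self._values[i]

    def __iter__(self):
        # With an explicit iterator, tuple(dims) knows exactly where to stop.
        return iter(self._values)

dims = Dims([1, 3, 224, 224])
assert tuple(dims) == (1, 3, 224, 224)
```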
Reviewed By: kflu, wushirong
Differential Revision: D32412430
fbshipit-source-id: 6390a1275dc34ef498acf933bb96f636c15baf41
Summary:
...because we don't like segfaults from Python (see test).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68253
Reviewed By: suo
Differential Revision: D32396747
Pulled By: gmagogsfm
fbshipit-source-id: a0925e8479702766e88176280985a63bc79e4f6a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68223
DETAIL debug mode didn't work with object-based collectives for NCCL backend, because we'd only check if backend is NCCL and then move tensors to CUDA.
Instead, check whether it is a wrapped PG, and then check the wrapped pg to see if it's NCCL.
ghstack-source-id: 143242023
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D32366840
fbshipit-source-id: be0a2af6849f8f24446593f4a4fbea4a67586ee5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67347
This PR:
- changes the warning when torch.vmap gets called to suggest using
functorch.vmap
- changes the warning when a batching rule isn't implemented to suggest
using functorch.vmap
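For reference, the call that now points users at functorch (a minimal example, assuming a build where the prototype `torch.vmap` is still present):
```
import torch

def per_sample(x):
    return x.sum(0)

# The prototype torch.vmap emits a UserWarning; after this change the warning
# (and the one for missing batching rules) suggests functorch.vmap instead.
batched = torch.vmap(per_sample)
out = batched(torch.randn(3, 4))   # vmaps over dim 0 -> out has shape (3,)
```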
Test Plan: - test/test_vmap.py
Reviewed By: H-Huang
Differential Revision: D31966603
Pulled By: zou3519
fbshipit-source-id: b01dc1c2e298ce899b4a3a5fb333222a8d5bfb56
Summary:
This PR does NOT change how signal is displayed in CI but rather just reports stats of flaky tests to RDS. **None of the below will be enabled after landing this PR--it will be done in a separate PR with environment variables.**
We report flaky test stats when a test first fails and, after rerunning it up to MAX_NUM_RETRIES times, we get at least one success.
For tests that fail all the reruns, we assume it is a real test failure.
For tests that succeed the first time, we do not rerun the test, even if it was previously noted as flaky.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68150
Test Plan:
First, I modified:
test_async_python to always fail (will be our "failing test")
test_async_future_type_python to fail 40% of the time
test_async_script_capture to fail 60% of the time
Then, running `python test/test_jit.py -v -k test_async` while setting IN_CI to 1:
```
(pytorch) janeyx@janeyx-mbp pytorch % python test/test_jit.py -v -k test_async
...
Running tests...
----------------------------------------------------------------------
test_async_future_type_python (jit.test_async.TestAsync) ... ok (0.004s)
test_async_grad_guard_no_grad (jit.test_async.TestAsync) ... ok (0.020s)
test_async_grad_guard_with_grad (jit.test_async.TestAsync) ... ok (0.008s)
test_async_kwargs (jit.test_async.TestAsync) ... ok (0.045s)
test_async_parsing (jit.test_async.TestAsync) ... ok (0.010s)
test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
test_async_python failed - num_retries_left: 3
test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
test_async_python failed - num_retries_left: 2
test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
test_async_python failed - num_retries_left: 1
test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
test_async_python failed - num_retries_left: 0
test_async_script (jit.test_async.TestAsync) ... ok (0.008s)
test_async_script_capture (jit.test_async.TestAsync) ... FAIL (0.010s)
test_async_script_capture failed - num_retries_left: 3
test_async_script_capture (jit.test_async.TestAsync) ... FAIL (0.010s)
test_async_script_capture failed - num_retries_left: 2
test_async_script_capture (jit.test_async.TestAsync) ... ok (0.011s)
test_async_script_capture succeeded - num_retries_left: 1
test_async_script_capture (jit.test_async.TestAsync) ... FAIL (0.010s)
test_async_script_capture failed - num_retries_left: 0
test_async_script_error (jit.test_async.TestAsync) ... ok (0.040s)
test_async_script_multi_forks (jit.test_async.TestAsync) ... ok (0.025s)
test_async_script_multi_waits (jit.test_async.TestAsync) ... ok (0.009s)
...
======================================================================
FAIL [0.003s]: test_async_python (jit.test_async.TestAsync)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/janeyx/pytorch/test/jit/test_async.py", line 30, in test_async_python
self.assertTrue(False)
AssertionError: False is not true
======================================================================
FAIL [0.010s]: test_async_script_capture (jit.test_async.TestAsync)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/janeyx/pytorch/test/jit/test_async.py", line 123, in test_async_script_capture
self.assertTrue(False)
AssertionError: False is not true
----------------------------------------------------------------------
Ran 28 tests in 0.399s
FAILED (failures=2, expected failures=5, unexpected successes=1)
```
Yielding this as the test report (I changed the extension from xml to txt so it uploads here):
[TEST-jit.test_async.TestAsync-20211110222055.txt](https://github.com/pytorch/pytorch/files/7517532/TEST-jit.test_async.TestAsync-20211110222055.txt)
And then running print_test_stats correctly excludes the all failing test `test_async_python` and calculates red and green appropriately:
```
(pytorch) janeyx@janeyx-mbp pytorch % python tools/stats/print_test_stats.py test-reports/python-unittest/test.test_jit
[scribe] Not invoking RDS lambda outside GitHub Actions:
[{'create_table': {'table_name': 'flaky_tests', 'fields': {'name': 'string', 'suite': 'string', 'file': 'string', 'num_green': 'int', 'num_red': 'int', 'pr': 'string', 'ref': 'string', 'branch': 'string', 'workflow_id': 'string', 'build_environment': 'string'}}}]
[scribe] Writing for None
[scribe] Wrote stats for flaky_tests
[scribe] Not invoking RDS lambda outside GitHub Actions:
[{'write': {'table_name': 'flaky_tests', 'values': {'name': 'test_async_script_capture', 'suite': 'jit.test_async.TestAsync', 'file': 'test/test_jit', 'num_green': 1, 'num_red': 3, 'pr': None, 'ref': None, 'branch': None, 'workflow_id': None, 'build_environment': 'linux-xenial-gcc5.4-py3'}}}]
(pytorch) janeyx@janeyx-mbp pytorch %
```
-------------------
If you're curious, I also included the code for when we would like to override the report_only feature and also hide flaky signal in CI. The results for the same test command correctly still fail the test suite, but mark the flaky test_async_future_type_python as passed:
```
(pytorch) janeyx@janeyx-mbp pytorch % python test/test_jit.py -v -k test_async
...
Running tests...
----------------------------------------------------------------------
test_async_future_type_python (jit.test_async.TestAsync) ... FAIL (0.004s)
test_async_future_type_python failed - num_retries_left: 3
test_async_future_type_python (jit.test_async.TestAsync) ... ok (0.001s)
test_async_grad_guard_no_grad (jit.test_async.TestAsync) ... ok (0.017s)
test_async_grad_guard_with_grad (jit.test_async.TestAsync) ... ok (0.008s)
test_async_kwargs (jit.test_async.TestAsync) ... ok (0.091s)
test_async_parsing (jit.test_async.TestAsync) ... ok (0.010s)
test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
test_async_python failed - num_retries_left: 3
test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
test_async_python failed - num_retries_left: 2
test_async_python (jit.test_async.TestAsync) ... FAIL (0.004s)
test_async_python failed - num_retries_left: 1
test_async_python (jit.test_async.TestAsync) ... FAIL (0.003s)
test_async_python failed - num_retries_left: 0
test_async_script (jit.test_async.TestAsync) ... ok (0.008s)
test_async_script_capture (jit.test_async.TestAsync) ... ok (0.011s)
test_async_script_error (jit.test_async.TestAsync) ... ok (0.039s)
...
======================================================================
FAIL [0.003s]: test_async_python (jit.test_async.TestAsync)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/janeyx/pytorch/test/jit/test_async.py", line 30, in test_async_python
self.assertTrue(False)
AssertionError: False is not true
----------------------------------------------------------------------
Ran 26 tests in 0.390s
FAILED (failures=1, expected failures=4)
```
With test reports:
[TEST-jit.test_async.TestAsync-20211110224810.txt](https://github.com/pytorch/pytorch/files/7517663/TEST-jit.test_async.TestAsync-20211110224810.txt)
And running print_test_stats:
```
(pytorch) janeyx@janeyx-mbp pytorch % python tools/stats/print_test_stats.py test-reports/python-unittest/test.test_jit
[scribe] Not invoking RDS lambda outside GitHub Actions:
[{'create_table': {'table_name': 'flaky_tests', 'fields': {'name': 'string', 'suite': 'string', 'file': 'string', 'num_green': 'int', 'num_red': 'int', 'pr': 'string', 'ref': 'string', 'branch': 'string', 'workflow_id': 'string', 'build_environment': 'string'}}}]
[scribe] Writing for None
[scribe] Wrote stats for flaky_tests
[scribe] Not invoking RDS lambda outside GitHub Actions:
[{'write': {'table_name': 'flaky_tests', 'values': {'name': 'test_async_future_type_python', 'suite': 'jit.test_async.TestAsync', 'file': 'test/test_jit', 'num_green': 1, 'num_red': 1, 'pr': None, 'ref': None, 'branch': None, 'workflow_id': None, 'build_environment': 'linux-xenial-gcc5.4-py3'}}}]
```
Reviewed By: saketh-are
Differential Revision: D32393907
Pulled By: janeyx99
fbshipit-source-id: 37df890481ab84c62809c022dc6338b50972899c
Summary:
Cub routines are both expensive to compile and used in multiple
different operators throughout the cuda folder. So, it makes sense to
compile them in one centralized place where possible (i.e. when
custom operators aren't involved).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67650
Reviewed By: mruberry
Differential Revision: D32259660
Pulled By: ngimel
fbshipit-source-id: 5f7dbdb134297e1ffdc1c7fc5aefee70a2fa5422
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68067
Embedding QAT uses a NoopObserver class for activation and a FakeQuant for
weight; make sure that qconfig comparison works properly for a mix of
partial functions and classes in the qconfig.
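A minimal sketch of the comparison problem in plain Python (not the exact helper in torch.ao.quantization; the observer class and kwargs are illustrative):
```
from functools import partial

class NoopObserver:
    pass

a = partial(NoopObserver, dtype="float16")
b = partial(NoopObserver, dtype="float16")

# Two equivalent partials are distinct objects, so a naive `a == b` is False;
# the comparison has to unwrap partials and compare func/args/keywords, while
# still handling entries that are bare classes rather than partials.
def entries_equal(x, y):
    if isinstance(x, partial) and isinstance(y, partial):
        return x.func == y.func and x.args == y.args and x.keywords == y.keywords
    return x == y

assert a != b
assert entries_equal(a, b)
assert entries_equal(NoopObserver, NoopObserver)
```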
Test Plan:
`pytest test/quantization/eager/test_quantize_eager_qat.py -v -k "test_embedding_qat_qconfig_equal"`
Imported from OSS
Reviewed By: HDCharles
Differential Revision: D32318434
fbshipit-source-id: c036eef9cbabe7c247745930501328e9c75a8cb0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68255
Manually disabling these two tests because they can't be disabled via Probot.
See the issues #68222 and #68173 for details.
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH
Test Plan: Imported from OSS
Reviewed By: malfet, saketh-are
Differential Revision: D32390899
Pulled By: NivekT
fbshipit-source-id: bd4996d73014337a9175b20ae67a3880ee994699
Summary:
This PR instruments the CPython interpreter and integrates the resulting trace into the PyTorch profiler.
The python tracing logic works by enabling `PyEval_SetProfile`, and then logging the minimal information to track every time python calls or returns from a function. A great deal of care has gone into keeping this process very lightweight; the `RawEvent` struct is only two words and doesn't do anything fancy. When a python function is called, we have to do extra work. If the call is to `nn.Module.__call__`, we simply incref to extend the life of the module. Otherwise we check if we have seen the function before, and if not go through the (somewhat expensive) task of saving the strings which we then cache.
To actually get a useful timeline, we have to replay the events to determine the state of the python stack at any given point. A second round of stack replay is needed to figure out what the last python function was for each torch op so we can reconstruct the correct python stack. All of this is done during post processing, so while we want to be reasonably performant it is no longer imperative to shave every last bit.
I still need to do a bit of refinement (particularly where the tracer interfaces with the profiler), but this should give a good sense of the general structure.
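The Python-level analogue of the hook being used (the PR installs the C-API `PyEval_SetProfile`; `sys.setprofile` shown here is just the easiest way to see the same stream of call/return events):
```
import sys

events = []

def tracer(frame, event, arg):
    # The C-level hook sees the same event kinds; here we just record a
    # minimal (event, function name) pair for every call/return.
    if event in ("call", "return", "c_call", "c_return"):
        name = arg.__name__ if event.startswith("c_") else frame.f_code.co_name
        events.append((event, name))

def work():
    return sum(range(10))

sys.setprofile(tracer)
work()
sys.setprofile(None)

print(events)  # e.g. [('call', 'work'), ('c_call', 'sum'), ('c_return', 'sum'), ('return', 'work')]
```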
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67407
Test Plan:
```
import torch
class MyModule(torch.nn.Module):
def __init__(self):
super().__init__()
self.linear = torch.nn.Linear(2, 2)
self.relu = torch.nn.ReLU()
def forward(self, x):
x = self.linear(x)
return self.relu(x)
def call_module():
m = MyModule()
for _ in range(4):
m(torch.ones((2, 2)))
def top_level_fn():
with torch.profiler.profile(with_stack=True) as p:
call_module()
p.export_chrome_trace("test_trace.json")
top_level_fn()
```
<img width="1043" alt="Screen Shot 2021-10-27 at 6 43 18 PM" src="https://user-images.githubusercontent.com/13089297/139171803-f95e70f3-24aa-45e6-9d4b-6d437a3f108d.png">
PS: I've tried to comment liberally, particularly around some of the more magical parts. However I do plan on doing another linting and commenting pass. Hopefully it's not too bad right now.
Reviewed By: gdankel, chaekit
Differential Revision: D32178667
Pulled By: robieta
fbshipit-source-id: 118547104a7d887e830f17b94d3a29ee4f8c482f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68099
When an op in the graph cannot be matched to any known ops, alias_analysis.cpp throws an error.
Before:
```
RuntimeError: 0INTERNAL ASSERT FAILED at "../torch/csrc/jit/ir/alias_analysis.cpp":612, please report a bug to PyTorch. We don't have an op for aten::add but it isn't a special case. Argument types: Tensor, float, Tensor,
```
After:
```
RuntimeError: 0INTERNAL ASSERT FAILED at "../torch/csrc/jit/ir/alias_analysis.cpp":612, please report a bug to PyTorch. We don't have an op for aten::add but it isn't a special case. Argument types: Tensor, float, Tensor,
Candidates:
aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> (Tensor)
aten::add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
aten::add.out(Tensor self, Tensor other, *, Scalar alpha=1, Tensor(a!) out) -> (Tensor(a!))
aten::add.t(t[] a, t[] b) -> (t[])
aten::add.str(str a, str b) -> (str)
aten::add.int(int a, int b) -> (int)
aten::add.complex(complex a, complex b) -> (complex)
aten::add.float(float a, float b) -> (float)
aten::add.int_complex(int a, complex b) -> (complex)
aten::add.complex_int(complex a, int b) -> (complex)
aten::add.float_complex(float a, complex b) -> (complex)
aten::add.complex_float(complex a, float b) -> (complex)
aten::add.int_float(int a, float b) -> (float)
aten::add.float_int(float a, int b) -> (float)
aten::add(Scalar a, Scalar b) -> (Scalar)
```
Test Plan:
Run
```
import torch

if __name__ == '__main__':
    ir = """
    graph(%x : Tensor,
          %y : Tensor):
      %2 : float = prim::Constant[value=1.2]()
      %result : Tensor = aten::add(%x, %2, %y)
      return (%result)
    """
    x = torch.tensor([[1., 2.], [3., 4.]])
    y = torch.tensor([[2., 1.], [2., 1.]])
    graph = torch._C.parse_ir(ir)
    print(graph)
    graph.alias_db().analyze()
    # print(script(x, y))
```
to get the results above
Imported from OSS
Reviewed By: anjali411
Differential Revision: D32339639
fbshipit-source-id: a79a3c2f157154b5fb1e3f33a23e43b7884e8e38
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68201
Hash(c10::Scalar) made a bad assumption that it was valid to just hash over all the bytes of the c10::Scalar struct.
Because c10::Scalar stores a union of different (float/int/complex) types with different sizes, not all bytes are valid in all cases. Hash() should only read the bytes corresponding to the currently active type.
Test Plan: Added new unit tests. Verified HashTest.Scalar failed with the original Hash() impl and then fixed.
Reviewed By: alanwaketan
Differential Revision: D32367564
fbshipit-source-id: ac30dd4f6dd0513954986d3d23c0c11ba802c37b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68175
This slightly alters the way from_float works so it will work
with placeholder observers. It also fixes a bug with ConvTranspose3d and
ConvTranspose1d where parameters like kernel_size, stride, etc.
weren't set properly. New tests were added to check for this type of
issue as well.
Test Plan:
python test/test_quantization.py TestQuantizedOps
python test/test_quantization.py TestStaticQuantizedModule
Imported from OSS
Reviewed By: z-a-f
Differential Revision: D32374004
fbshipit-source-id: caaa548d12d433d9c1fa0abc8597a7d31bb4e8af
Summary:
Adds a new class `ErrorOrWarningInput` that is a `SampleInput` with some additional metadata for validating that `SampleInput` throws the desired warning or error. The architecture to support these new tests is modeled after the existing reference tests and sample input functions.
Existing invalid input tests for neg and kthvalue are ported to the new scheme to validate it.
There may be a simpler/clearer naming scheme we can use here.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67354
Reviewed By: jbschlosser
Differential Revision: D31989888
Pulled By: mruberry
fbshipit-source-id: 4fa816e1e8d0eef21b81c2f80813d42b2c26714e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67939
With `manage_output_tensor` enabled, a client of `StaticRuntime` is required to call it via `PyTorchPredictor::predict_managed_result`. If the client uses `PyTorchPredictor::operator()` instead, it will crash (intended behavior, so as not to leak the memory of managed output tensors). This mistake could cause a catastrophic failure in production (via gatekeeper, config changes, etc.).
Considering the complexity of how `PyTorchPredictor` is used in different settings, the chance that this mistake hits production is non-zero.
This change introduces `StaticRuntime::disableManageOutputTensor` to disable the `manage_output_tensor` feature when a client mistakenly uses `PyTorchPredictor::operator()`, instead of crashing. When `StaticRuntime` is invoked via `PyTorchPredictor::operator()`, it first calls `StaticRuntime::disableManageOutputTensor` to disable the feature, so that it can get non-managed output tensors to pass to the client safely.
A slight perf degradation is expected from forcefully disabling `manage_output_tensors`, but the added robustness outweighs the risk of crashing at a high rate in production.
Test Plan: Added a unittest `StaticRuntime, DisableManageOutputTensors` to cover the newly added code.
Reviewed By: swolchok
Differential Revision: D32219731
fbshipit-source-id: caf5c910b34726c570e17435ede7d888443e90cf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68180
Since we've open sourced the tracing-based selective build, we can deprecate the
op-dependency-graph-based selective build and the static analyzer tool that
produces the dependency graph.
ghstack-source-id: 143108377
Test Plan: CIs
Reviewed By: seemethere
Differential Revision: D32358467
fbshipit-source-id: c61523706b85a49361416da2230ec1b035b8b99c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67574
When adding the optional params for the sharded embedding op, I found that we cannot get these params from the `__torch_function__` override. The reason is that we don't pass them via keyword arguments. So maybe we want to change them to kwargs?
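For illustration, a minimal sketch of the underlying issue (the `LoggingTensor` subclass and `torch.nn.functional.linear` here are illustrative stand-ins, not the sharded embedding op itself): a `__torch_function__` handler only sees arguments exactly as the caller passed them, so parameters passed positionally (or left at their defaults) are hard to recover by name, while keyword arguments show up in `kwargs`.
```python
import torch
import torch.nn.functional as F

class LoggingTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # Only explicitly-passed keyword arguments are visible here by name.
        print(func.__name__, "kwargs seen:", list(kwargs.keys()))
        return super().__torch_function__(func, types, args, kwargs)

x = torch.randn(2, 3).as_subclass(LoggingTensor)
w = torch.randn(4, 3)
F.linear(x, w, None)       # bias passed positionally -> not visible in kwargs
F.linear(x, w, bias=None)  # bias passed as a keyword -> visible in kwargs
```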
ghstack-source-id: 143029375
Test Plan: CI
Reviewed By: albanD
Differential Revision: D32039152
fbshipit-source-id: c7e598e49eddbabff6e11e3f8cb0818f57c839f6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68192
- Run on exactly the same stuff as the existing linter checks.
- Exclude deploy interpreter headers from being reported.
Test Plan: Imported from OSS
Reviewed By: janeyx99
Differential Revision: D32364023
Pulled By: suo
fbshipit-source-id: c27eca4a802534875d609d004fa9f6fca59ae6a5
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46741
pytorchbot
contributors: nickleus27, yanivsagy, and khanhthien123
SmrutiSikha this is mostly your work. We just did very minor clean up.
cc mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67664
Reviewed By: gchanan
Differential Revision: D32311838
Pulled By: mruberry
fbshipit-source-id: 0e5d4d888caeccb0fd7c80e6ff11b1b1fa8e00d6
Summary:
### Create `linalg.cross`
Fixes https://github.com/pytorch/pytorch/issues/62810
As discussed in the corresponding issue, this PR adds `cross` to the `linalg` namespace (**Note**: There is no method variant) which is slightly different in behaviour compared to `torch.cross`.
**Note**: this is NOT an alias as suggested in mruberry's [https://github.com/pytorch/pytorch/issues/62810 comment](https://github.com/pytorch/pytorch/issues/62810#issuecomment-897504372) below
> linalg.cross being consistent with the Python Array API (over NumPy) makes sense because NumPy has no linalg.cross. I also think we can implement linalg.cross without immediately deprecating torch.cross, although we should definitely refer users to linalg.cross. Deprecating torch.cross will require additional review. While it's not used often it is used, and it's unclear if users are relying on its unique behavior or not.
The current default implementation of `torch.cross` is extremely weird and confusing. This has also been reported multiple times previously. (See https://github.com/pytorch/pytorch/issues/17229, https://github.com/pytorch/pytorch/issues/39310, https://github.com/pytorch/pytorch/issues/41850, https://github.com/pytorch/pytorch/issues/50273)
- [x] Add `torch.linalg.cross` with default `dim=-1`
- [x] Add OpInfo and other tests for `torch.linalg.cross`
- [x] Add broadcasting support to `torch.cross` and `torch.linalg.cross`
- [x] Remove out skip from `torch.cross` OpInfo
- [x] Add docs for `torch.linalg.cross`. Improve docs for `torch.cross` mentioning `linalg.cross` and the difference between the two. Also adds a warning to `torch.cross`, that it may change in the future (we might want to deprecate it later)
---
### Additional Fixes to `torch.cross`
- [x] Fix Doc for Tensor.cross
- [x] Fix torch.cross in `torch/overrides.py`
While working on `linalg.cross` I noticed these small issues with `torch.cross` itself.
[Tensor.cross docs](https://pytorch.org/docs/stable/generated/torch.Tensor.cross.html) still mentions `dim=-1` default which is actually wrong. It should be `dim=None` after the behaviour was updated in PR https://github.com/pytorch/pytorch/issues/17582 but the documentation for the `method` or `function` variant wasn’t updated. Later PR https://github.com/pytorch/pytorch/issues/41850 updated the documentation for the `function` variant i.e `torch.cross` and also added the following warning about the weird behaviour.
> If `dim` is not given, it defaults to the first dimension found with the size 3. Note that this might be unexpected.
But still, the `Tensor.cross` docs were missed and remained outdated. I’m finally fixing that here. Also fixing `torch/overrides.py` for `torch.cross` as well now, with `dim=None`.
To verify according to the docs the default behaviour of `dim=-1` should raise, you can try the following.
```python
a = torch.randn(3, 4)
b = torch.randn(3, 4)
b.cross(a) # this works because the implementation finds 3 in the first dimension and the default behaviour as shown in documentation is actually not true.
>>> tensor([[ 0.7171, -1.1059, 0.4162, 1.3026],
[ 0.4320, -2.1591, -1.1423, 1.2314],
[-0.6034, -1.6592, -0.8016, 1.6467]])
b.cross(a, dim=-1) # this raises as expected since the last dimension doesn't have a 3
>>> RuntimeError: dimension -1 does not have size 3
```
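By contrast, a minimal sketch of the `torch.linalg.cross` behavior described above (default `dim=-1`, raising when that dimension does not have size 3):
```python
import torch

a = torch.randn(3, 4)
b = torch.randn(3, 4)

# dim=0 has size 3, so this works; with the default dim=-1 the same call
# would raise because the last dimension has size 4, not 3.
out = torch.linalg.cross(b, a, dim=0)
print(out.shape)  # torch.Size([3, 4])
```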
Please take a closer look (particularly the autograd part, this is the first time I'm dealing with `derivatives.yaml`). If there is something missing, wrong or needs more explanation, please let me know. Looking forward to the feedback.
cc mruberry Lezcano IvanYashchuk rgommers
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63285
Reviewed By: gchanan
Differential Revision: D32313346
Pulled By: mruberry
fbshipit-source-id: e68c2687c57367274e8ddb7ef28ee92dcd4c9f2c
Summary:
use product instead of zip to cover all cases
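A minimal sketch of the difference (the dtype/shape values are illustrative): `zip` pairs cases one-to-one, while `itertools.product` exercises every combination.
```python
from itertools import product

dtypes = ["float32", "float64"]
shapes = [(3,), (2, 3)]

print(list(zip(dtypes, shapes)))      # 2 cases, paired one-to-one
print(list(product(dtypes, shapes)))  # 4 cases, all combinations
```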
cc mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67635
Reviewed By: malfet
Differential Revision: D32310956
Pulled By: mruberry
fbshipit-source-id: 806c3313e2db26d77199d3145b2d5283b6ca3617
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68128
Reland of D31762735 (0cbfd466d2).
This diff was originally reverted due to failure in test_send_export_type_through_rpc_with_custom_pickler.
I updated rpc_pickler_test.py to prevent a race condition where processes were not registering their pickler before handling their rpc_sync calls.
Test Plan:
rpc_pickler_test file:
buck test mode/dev-nosan -c 'cxx.coverage_only=caffe2' //caffe2/torch/fb/training_toolkit/backend/metrics/tests:rpc_pickler_test //caffe2/torch/fb/training_toolkit/backend/metrics/collectors/fbdata_aggregator/tests:batch_collector_test -- --run-disabled --collect-coverage '--code-coverage-session=test_session' --force-tpx
rpc_pickler stress test:
buck test mode/dev-nosan -c 'cxx.coverage_only=caffe2' //caffe2/torch/fb/training_toolkit/backend/metrics/tests:rpc_pickler_test -- --exact 'caffe2/torch/fb/training_toolkit/backend/metrics/tests:rpc_pickler_test - test_send_export_type_through_rpc_with_custom_pickler (caffe2.torch.fb.training_toolkit.backend.metrics.tests.rpc_pickler_test.CythonTypeRpcSpawnTest)' --run-disabled --collect-coverage '--code-coverage-session=test_session' --force-tpx --jobs 18 --stress-runs 10 --record-results
Reviewed By: mrshenli
Differential Revision: D32316077
fbshipit-source-id: e58de2335fbaa3ab46d46fe222c659197633a5e4
Summary:
**TLDR**: Makes torch.histc run 400x faster on large inputs. Should fix [a broken test on internal CI](https://www.internalfb.com/intern/test/281475013640093/).
HistogramKernel presently calls torch.Tensor.index_put_ once for each element of its input tensor. Obtaining a data pointer and manipulating it directly avoids the considerable dispatch overhead from calling index_put_. Behavior is unchanged because the tensor being operated on is known to be contiguous and in CPU memory.
Fixes performance regression introduced in https://github.com/pytorch/pytorch/pull/65318.
Benchmark: time taken to compute histc on a tensor with 10,000,000 elements
1. Before https://github.com/pytorch/pytorch/pull/65318: **0.003s**
2. After https://github.com/pytorch/pytorch/pull/65318: **2.154s**
3. After this change: **0.005s**
Benchmark code:
```
import torch as t
from timeit import default_timer as timer
x = t.randperm(10000000, dtype=t.float32)
start = timer()
t.histc(x)
end = timer()
print(end - start)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67815
Reviewed By: anjali411
Differential Revision: D32357663
Pulled By: saketh-are
fbshipit-source-id: f8fa59173ea4772c8ad1332548ef4d9ea8f01178
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66100
A backend should not directly depend on ATen operators. The demo backend is changed accordingly for testing purposes.
Test Plan: Imported from OSS
Reviewed By: pavithranrao
Differential Revision: D31384614
Pulled By: iseeyuan
fbshipit-source-id: c97f0c4aa12feb1d124f1d7a852e9955a7a2ce42
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68134
Add the macros in preparation of making these selective. Should be a no-op in this diff.
ghstack-source-id: 143023844
Test Plan: CI
Reviewed By: dhruvbird
Differential Revision: D32326833
fbshipit-source-id: 7abc93102bff0aa0bc5e3383bdf3e95fb84ce5ba
Summary:
This adds apex-inspired fast layer norm forward kernel to pytorch (it is a significant rewrite though).
It's much faster than current implementation, for a typical transformer size (32*196, 1024) time goes down from ~180us to ~49 us on Volta. Compared to apex, it also produces bitwise accurate results between float inputs representable in fp16, and fp16 inputs. It produces slightly different results compared to current implementation though, because welford summation is implemented differently.
It is slower than lightSeq (~37 us), but lightseq uses inaccurate variance approximation, and doesn't guarantee float - fp16 bitwise accuracy.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67977
Reviewed By: mruberry
Differential Revision: D32285331
Pulled By: ngimel
fbshipit-source-id: a8b876a9cf3133daacfe0ce3a37e3ad566f4b6a8
Summary:
This PR adds OpInfo for `nn.functional.conv1d`. There is a minor typo fix in the documentation as well.
Issue tracker: https://github.com/pytorch/pytorch/issues/54261
cc: mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67747
Reviewed By: malfet
Differential Revision: D32309258
Pulled By: mruberry
fbshipit-source-id: add21911b8ae44413e033e19398f398210737c6c
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67904.
- Create a sparse tensor when the sparse layout is given even if the input tensor is not sparse.
cc nikitaved pearu cpuhrsch IvanYashchuk
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68108
Reviewed By: anjali411
Differential Revision: D32316269
Pulled By: cpuhrsch
fbshipit-source-id: 923dbd4dc7c74f51f7cdbafb2375a30271a6a886
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68148
A question was raised regarding whether we should fuse the path a->b->c if node a has a consumer other than node b. This diff relaxes the constraint in the fuse pass so that in the case:
```
  a
 / \
b   d
|
c
```
we still allow fusing the path (a->b->c); after the fusion, node b is eliminated by dead_node_eliminator while node a stays in the graph.
Reviewed By: yinghai, 842974287
Differential Revision: D32296266
fbshipit-source-id: 44ded07a97b5b708bdf37193a022fae21410b4bd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67357
This PR adds OpInfos for:
- new_ones, new_zeros, new_full, new_empty
- rand_like, randint_like
I forgot to add the _like functions in a previous PR, so here they are.
Test Plan: - wait for tests
Reviewed By: mruberry
Differential Revision: D31969533
Pulled By: zou3519
fbshipit-source-id: 236d70d66e82f1d6f8e5254b55ca2a37b54c9494
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64676
We implement a working eager mode quantization flow which uses
tracing and `__torch_function__` and `torch.nn.Module.__call__` overrides to automate the model modifications needed for quantization. Partial program capture (instead of full program capture) is used, allowing this scheme to target a wide variety of user programs. Control flow over quantizeable ops is not supported, but general control flow is supported.
In particular:
* `auto_trace.py` contains the machinery to override `__torch_function__` and `torch.nn.Module.__call__` and call hooks before and after each quantizeable module or function
* `quantization_state.py` contains the state needed to use the hooks to implement quantization logic such as adding quants/dequants, observers, etc.
* please see `README.md` for more details
Test Plan:
```
python test/test_quantization.py TestAutoTracing
python test/test_quantization.py TestAutoTracingModels
```
Differential Revision: D31992281
Reviewed By: HDCharles
Pulled By: vkuzo
fbshipit-source-id: 6d40e855f3c96b9a4b637a0e677388a7b92f7967
Summary:
Context: https://github.com/pytorch/pytorch/issues/67061
Use `run_test.py`'s provided flag `"--subprocess"`, passed in like `extra_unittest_args=["--subprocess"]` when running test_distributed_spawn. This will ensure that each test is run separately in its own process. The goal is to more closely simulate how a developer would run a single test when reproducing a CI failure and make reproducibility easier in general.
Also, when a test fails, print out the exact command that was issued so developer knows how to reproduce it.
For example, if a test fails, it will print out something like the following to the logs:
```
Test exited with non-zero exitcode 1. Command to reproduce: BACKEND=gloo WORLD_SIZE=3 /fsx/users/rvarm1/conda/envs/pytorch/bin/python distributed/test_distributed_spawn.py -v TestDistBackendWithSpawn.test_Backend_enum_class
```
Running test_distributed_spawn is still the same command as before:
`python test/run_test.py --verbose -i distributed/test_distributed_spawn`
as seen in the [distributed contributing](https://github.com/pytorch/pytorch/blob/master/torch/distributed/CONTRIBUTING.md) guide.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67901
Reviewed By: cbalioglu, mruberry
Differential Revision: D32225172
Pulled By: rohan-varma
fbshipit-source-id: 7e8d4c7a41858044bd2a4e0d1f0bf8f1ac671d67
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/kineto](https://github.com/pytorch/kineto).
New submodule commit: f60ad2cb0f
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67445
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: robieta
Differential Revision: D31993939
fbshipit-source-id: 3d4aa2f900434d4bbe5134db8453deb227ef5685
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67750
Add more information about why exporting a model fails.
Before, the error message was:
```
E1102 22:57:42.984015 3220949 ExceptionTracer.cpp:221] exception stack complete
terminate called after throwing an instance of 'c10::Error'
what(): __torch__ types other than torchbind (__torch__.torch.classes)are not supported in lite interpreter. Workaround: instead of using arbitrary class type (class Foo()), define a pytorch class (class Foo(torch.nn.Module)). The problematic type is: __torch__.dper3.core.schema_utils.IdListFeature
Exception raised from getFunctionTuple at caffe2/torch/csrc/jit/serialization/export_module.cpp:246 (most recent call first):
```
After
```
E1102 22:57:42.984015 3220949 ExceptionTracer.cpp:221] exception stack complete
terminate called after throwing an instance of 'c10::Error'
what(): __torch__ types other than torchbind (__torch__.torch.classes)are not supported in lite interpreter. Workaround: instead of using arbitrary class type (class Foo()), define a pytorch class (class Foo(torch.nn.Module)).
Exception raised from getFunctionTuple at caffe2/torch/csrc/jit/serialization/export_module.cpp:246 (most recent call first):
```
ghstack-source-id: 143009294
Test Plan: CI
Reviewed By: larryliu0820
Differential Revision: D32129397
fbshipit-source-id: 0594a98a59f727dc284acd1c9bebcd7589ee7cbb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68135
Update the schema to reflect the changes in D31935573 (6b44e75f6b).
Test Plan:
`buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Confirmed native implementation is used.
Reviewed By: hlu1
Differential Revision: D32326865
fbshipit-source-id: 7f607f57ceb6690a2782d94d9ee736ba64e7d242
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67825
The comment explains how it works.
Test Plan:
A small regression to local and local_ro if we only enable it for fallback ops.
```
## local_ro
# before
I1103 21:25:05.250440 2636751 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.22213. Iters per second: 818.247
I1103 21:25:08.629221 2636751 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.22351. Iters per second: 817.319
I1103 21:25:12.005179 2636751 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.22285. Iters per second: 817.759
I1103 21:25:12.005236 2636751 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.22283, standard deviation: 0.000693619
# after
# # only enable for fall back ops: 0.7%
I1103 21:26:40.190436 2644597 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.22928. Iters per second: 813.481
I1103 21:26:43.590443 2644597 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.23265. Iters per second: 811.262
I1103 21:26:46.992928 2644597 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.23379. Iters per second: 810.51
I1103 21:26:46.992980 2644597 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.23191, standard deviation: 0.0023424
# enable for all (no clone): 4.7%
I1103 21:27:55.291216 2649780 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.28204. Iters per second: 780.005
I1103 21:27:58.822347 2649780 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.27854. Iters per second: 782.14
I1103 21:28:02.354184 2649780 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 1.27958. Iters per second: 781.506
I1103 21:28:02.354240 2649780 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 1.28006, standard deviation: 0.00179765
# local
# before
I1103 21:52:00.784718 2765168 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.676. Iters per second: 50.8233
I1103 21:52:28.985873 2765168 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.699. Iters per second: 50.7641
I1103 21:52:57.200223 2765168 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.6953. Iters per second: 50.7735
I1103 21:52:57.200273 2765168 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 19.6901, standard deviation: 0.0123206
# after
# # only enable for fall back ops: 0.1%
I1103 21:45:25.514535 2734440 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.7103. Iters per second: 50.7349
I1103 21:45:53.773594 2734440 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.7005. Iters per second: 50.7601
I1103 21:46:21.955680 2734440 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.7398. Iters per second: 50.659
I1103 21:46:21.955729 2734440 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 19.7169, standard deviation: 0.0204658
# enable for all (no clone): 0.9%
I1103 21:43:22.162272 2723868 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.8893. Iters per second: 50.2783
I1103 21:43:50.651847 2723868 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.8566. Iters per second: 50.3611
I1103 21:44:19.068519 2723868 PyTorchPredictorBenchLib.cpp:274] PyTorch run finished. Milliseconds per iter: 19.8793. Iters per second: 50.3037
I1103 21:44:19.068570 2723868 PyTorchPredictorBenchLib.cpp:285] Mean milliseconds per iter: 19.875, standard deviation: 0.0167498
```
Reviewed By: d1jang
Differential Revision: D32124812
fbshipit-source-id: 0f60c26f8fb338d347e4ca7a70b23e5a386fc9aa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68103
The error message `'training' attribute not found.` in itself isn't particularly actionable. Anyone running into this tends to be clueless regarding why they are getting this message.
For example, see [this post](https://fb.workplace.com/groups/pytorch.edge.users/posts/965868874283406/) asking for help when seeing this specific error message.
The most common reason for this error is that users call `.eval()` on the model instance before saving it. This change tries to draw attention to that oversight and allows them to proactively investigate and correct that mis-action if necessary.
This saves valuable time for our users and effort from the team to provide support. Overall, I believe this is a Developer Experience win.
ghstack-source-id: 143021300
Test Plan: Build/CI
Reviewed By: JacobSzwejbka
Differential Revision: D32304477
fbshipit-source-id: 474abe717a862347f16ad981834ddab6819cb4d3
Summary:
Only packages and tools (which are explicitly specified) are included in the wheel/conda files
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68139
Test Plan:
Run `python3 -c "from setuptools import find_packages; print([x for x in find_packages(exclude=('tools','tools.*')) if 'torch.fx' in x])"` before and after the change
Fixes https://github.com/pytorch/pytorch/issues/68059
Reviewed By: nrsatish, seemethere
Differential Revision: D32330483
Pulled By: malfet
fbshipit-source-id: a55443730999a83c615b3f943c327353c011bf7b
Summary: torch.save uses pickle, which cannot handle lambdas or local functions directly without modifying serialization.py. This diff fixes the issue by extracting the lambda into a normal (module-level) function.
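As background, a minimal sketch of the underlying pickle limitation (standard-library behavior, independent of this diff):
```python
import pickle

square = lambda x: x * x
try:
    pickle.dumps(square)          # lambdas are pickled by name, which fails
except Exception as e:
    print("lambda fails:", e)

def square_fn(x):
    return x * x

data = pickle.dumps(square_fn)    # a module-level function pickles fine
print(pickle.loads(data)(3))      # 9
```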
Test Plan: buck test mode/dev-nosan //caffe2/test/fx2trt/core:test_trt_module
Reviewed By: 842974287
Differential Revision: D32320536
fbshipit-source-id: 497d2e64f94526f92e6d1a9909b6ad629dbca850
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67927
- BackendData - represents 'tensor data' in opaque backend storage
- LoweringContext - interface for performing backend-specific IR lowering
- BackendImplInterface - interface for lazy tensor backends to implement
Also reorganizes backend-related files into the lazy/backend subdir and includes a few small fixes, which were made on lazy_tensor_staging but need to be back-ported to master.
Test Plan: used by lazy_tensor_staging branch
Reviewed By: desertfire
Differential Revision: D32142032
fbshipit-source-id: 828c717bcd0d511876e64ad209b50f7bfb10cec5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68080
Fixes #68002
After FaultyProcessGroupAgent was replaced with FaultyTensorpipeAgent there is now a dependency on Tensorpipe for rpc testing. However, if a user does not have USE_TENSORPIPE enabled they will hit an issue such as `undeclared identifier 'FaultyTensorPipeRpcBackendOptions'`. This code is only for testing the faulty agent, so it should not block compilation. Update to wrap the Tensorpipe-specific code in a preprocessor directive.
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D32292861
Pulled By: H-Huang
fbshipit-source-id: 4ffb879860ced897674728200a1831f18fea0a4a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68093
We don't want regular users without write access to be able to file an
actual issue with the `ci: sev` label since that issue will
automatically show up on hud.pytorch.org
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: suo
Differential Revision: D32299553
Pulled By: seemethere
fbshipit-source-id: d46a96f16ae29120fff94288d3e0c06b103edf7f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67476
Native ops are faster than falling back to the JIT interpreter, sometimes significantly (we've previously seen this with ops like TupleUnpack). We should improve op coverage where possible.
Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: d1jang
Differential Revision: D31994040
fbshipit-source-id: 9de57d8d7925ee46544478eae8229952ca5f248a
Summary:
This PR introduces a new function `_select_conv_backend` that returns a `ConvBackend` enum representing the selected backend for a given set of convolution inputs and params.
The function and enum are exposed to python for testing purposes through `torch/csrc/Module.cpp` (please let me know if there's a better place to do this).
A new set of tests validates that the correct backend is selected for several sets of inputs + params. Some backends aren't tested yet:
* nnpack (for mobile)
* xnnpack (for mobile)
* winograd 3x3 (for mobile)
Some flowcharts for reference (flowchart images omitted here).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67790
Reviewed By: zou3519
Differential Revision: D32280878
Pulled By: jbschlosser
fbshipit-source-id: 0ce55174f470f65c9b5345b9980cf12251f3abbb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68027
This commit upstreams class BackendDevice to the master, which is a backend
specific representation of the actual hardware, for instances, CPU, GPU, or
TPU.
This concept is important for backend like XLA where it needs to tell the
actual hardware type from the c10::DeviceType::Lazy virtual device during
both IR constructions and lowerings.
Test Plan: ./build/bin/test_lazy --gtest_filter=BackendDeviceTest.*
Reviewed By: wconstab
Differential Revision: D32261838
Pulled By: alanwaketan
fbshipit-source-id: 579c3fc5f9da7847c887a383c6047e8ecb9cc5bc
Summary:
This fixed a few of the linalg checks that we disabled before!
This also seems to break sgn, abs and angle (sending to CI here to see if there are more). These functions used to only ever get pure imaginary or real values.
It is very likely that something is wrong with their formula.
But they are implemented as element-wise, so I'm not sure where the error can come from. I tried to look at it but nothing obvious seems wrong there (especially because it is correct in backward mode).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68001
Reviewed By: soulitzer
Differential Revision: D32280475
Pulled By: albanD
fbshipit-source-id: e68b1ce0e2e97f8917c3d393141d649a7669aa9d
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67601.
As simple a fix as I could make it. I even managed to delete some testing code!
I checked calling `super()` and, as I had feared, it doesn't work out of the box, so perhaps that ought to be revisited later.
As it stands, https://github.com/pytorch/pytorch/issues/20124, still applies to the chained scheduler, but I think this change is still an improvement.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68010
Reviewed By: zou3519
Differential Revision: D32278139
Pulled By: albanD
fbshipit-source-id: 4c6f9f1b2822affdf63a6d22ddfdbcb1c6afd579
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66064
The only place this is used seems to be in the dispatcher for `operatorLookupTable_`. Disarming `LeftRight` disarms it for this one use case.
This should make .so loading faster, and also reduce memory consumption since `LeftRight<T>` does 2 writes for every write. I'd like to get a thorough review from reviewers for this diff since I want to make sure that initialization of stuff that writes into the dispatcher isn't going to happen on multiple threads for on-device use.
Created a new class named `LeftRightNoOpWrapper<T>` for use in mobile builds.
### Why is LeftRight<T> slow?
It maintains 2 copies of each data structure `T` to be able to keep reads quick. Every write goes to both data structures, which means writes cost 2x, and the memory overhead is also 2x.
### Why is this safe for mobile builds?
1. .so loading never happens concurrently with model execution
2. Custom ops are loaded during .so load - initializers are all run serially
3. I don't see any threads being spawned from the global schema and kernel initializers
After discussing with dreiss, it seems like there could be rare cases in OSS apps or internal Android/iOS apps where a `.so` or `dylib` is loaded after the PT runtime is loaded, and this load happens concurrently with an in-progress inference run, which is looking up the operator table in the dispatcher.
To avoid crashes there, it seems reasonable to use the RW lock, since I don't expect any contention 99.9% of the time.
When registering operators, everything is serial so only one thread will ever hold the lock. The next time it needs the lock, it will have already released it.
During inference runs, only one thread will ask for the shared lock unless multiple concurrent inferences are in progress. Even in that case, they will all be able to simultaneously get the Read lock.
Test Plan: Build and generate a local build of the iOS app to test.
Reviewed By: swolchok
Differential Revision: D31352346
fbshipit-source-id: c3f12454de3dbd7b421a6057d561e9373ef5bf98
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67787
First noticed in https://fb.workplace.com/groups/pytorch.edge.team/posts/952737705280969/ - basically one of the speech models has ~400 0 byte tensor files, so we're basically paying the cost of looking it up in the archive and reading nothing from it.
Turns out that there's a fairly simple fix to avoid reading a 0 byte tensor. Once we notice that it's 0 bytes, just use the default `DataPtr` instead of initializing it with 0 bytes read in from the input file stream.
ghstack-source-id: 142025211
Test Plan: CI and manually ran a couple production mobile models with bundled inputs. CI Will run all prod. mobile mobiles with bundled inputs.
Reviewed By: swolchok
Differential Revision: D32054983
fbshipit-source-id: 919b0cdbc44bccb8f6cfe0da10ff5474af37fd99
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67788
Based on comments from supriyar in D31657430 (20aa417e38).
ghstack-source-id: 142924000
Test Plan: CI
Reviewed By: supriyar
Differential Revision: D32055028
fbshipit-source-id: 756d526585f8ded755ea42b52dbbf5c1687acde2
Summary:
https://github.com/pytorch/pytorch/issues/67578 disabled reduced precision reductions for FP16 GEMMs. After benchmarking, we've found that this has substantial performance impacts for common GEMM shapes (e.g., those found in popular instantiations of multiheaded-attention) on architectures such as Volta. As these performance regressions may come as a surprise to current users, this PR adds a toggle for disabling reduced precision reductions,
`torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction`,
rather than making disabling them the default behavior.
CC ngimel ptrblck
stas00 Note that the behavior after the previous PR can be replicated with
`torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67946
Reviewed By: zou3519
Differential Revision: D32289896
Pulled By: ngimel
fbshipit-source-id: a1ea2918b77e27a7d9b391e030417802a0174abe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68029
Temporarily disable quantization external functions with a new macro DISABLE_NNC_QUANTIZATION.
The ATen CPU library consists of two parts:
A. Common operator functions, e.g. "at::empty()", the list of sources can be found at "aten_cpu_source_list" in "tools/build_variables.bzl";
B. Implementations of these operators, e.g. "at::native::empty()", the list of sources is defined at "aten_native_source_list" in "tools/build_variables.bzl";
Note that A does not directly depend on B. A calls B via dispatch table. The dependency is injected into the dispatch table by B during its static initialization.
For internal mobile builds, B is built on a per-app basis. A is the public library for other libraries to depend on. Because these external functions call quantization functions that are not part of A, the NNC kernel library cannot resolve the missing symbols.
Use this PR to unblock the internal experiment until we figure out a better solution (e.g. move quantization API to A).
ghstack-source-id: 142868370
Test Plan: Make sure it can build with the stacked diff.
Reviewed By: IvanKobzarev
Differential Revision: D32239783
fbshipit-source-id: 3797b14104b0f54fb527bc3fc5be7f09cc93d9e4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68094
Turns out sccache was not getting activated properly on master pushes so
this should help resolve that
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: suo
Differential Revision: D32299636
Pulled By: seemethere
fbshipit-source-id: 5f1be98dffdb202d3c11b6ceb2b49af235e1f91b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67941
I just found out that due to the round up of the Tensor storage sizes to multiples of 64 bytes, resizing is not actually triggered for a lot of our unit tests (23 OSS, 16 internal). Now they should be all fixed. Also moved a bunch of tests to `test_static_module.cc` so that `test_static_runtime.cc` now only contains operator tests.
From now on, by default if `args2` is passed to `test_static_runtime`, at the end of the second iteration, it would check that the managed buffer's size is bigger than the previous size and enforce that. You can bypass the check for ops with constant output sizes, such as `aten::sum` without `dim` passed in.
Test Plan:
Facebook
```
buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest
buck test //caffe2/benchmarks/static_runtime/fb:test_fb_operators
```
Reviewed By: swolchok
Differential Revision: D32196204
fbshipit-source-id: 8425d9efe6b9a1c1e3807e576b1143efd7561c71
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67817
Implementation of build features as a useable feature. Includes tracing support and selectivity support. Follow up of Dhruv's prototype in D30076214.
The general idea is to allow selectivity of arbitrary sections of the codebase through the 2 apis,
BUILD_FEATURE_REQUIRED(NAME), and
BUILD_FEATURE_AVAILABLE(NAME)
References
PyTorch Edge Team Workplace group post link: https://fb.workplace.com/groups/pytorch.edge.team/posts/905584476662959/
Quip talking about some early ideas related to build features: https://fb.quip.com/iur3ApU9q29v
Google Doc about most recent discussion and details: https://docs.google.com/document/d/1533zuN_9pwpQBa4RhtstUjT5B7guowblqJz35QYWPE0/edit
Will remove the copy kernel example after. Its just here as an example.
ghstack-source-id: 142850218
Test Plan: CI, dummy traced a model, and played around with its unit test if i removed the traced value from the yaml
Reviewed By: dhruvbird
Differential Revision: D32151856
fbshipit-source-id: 33764c1f6902a025e53807b784792a83c8385984
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66101
Updated description:
This PR tests the functionalization pass in python in two ways. For each of the test programs that I have in `test_functionalization.py`, it:
- runs the program with and without functionalization, and asserts the outputs and (potentially mutated) inputs are equal in both cases
- runs the program with `LoggingTensor`, and uses expecttests on the resulting graph. I manually confirm that the graphs look reasonable and only contain functional ops.
Mechanically, the changes include:
- factoring out `LoggingTensor` into a testing util so it can be re-used in multiple tests
- adding some private python api's in the `torch` namespace as hooks that I can use during testing
In the original version of this PR, I also added some fixes to the `_make_subclass()` function in python: allowing you to pass in strides and storage_offset. I kept them in mainly because the changes were already there.
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D31942095
Pulled By: bdhirsh
fbshipit-source-id: 90ff4c88d461089704922e779571eee09c21d707
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67878
The functionalization pass doesn't work with `copy_()`, which is a problem with functorch. Originally we were going to make a functional `copy()` operator to fix this problem, but zou3519 pointed out that we can get (most of) the existing functionality by mapping `self.copy_(src)` to `src.to(self).expand_as(self)`. This makes the codegen a bit uglier, but has the benefit of avoiding a totally unnecessary tensor allocation in functorch.
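A minimal sketch of that rewrite on ordinary eager tensors, for illustration only, showing that the functional form produces the same values as the in-place copy:
```python
import torch

self_t = torch.zeros(2, 3, dtype=torch.float64)
src = torch.arange(3, dtype=torch.float32)

inplace = self_t.clone()
inplace.copy_(src)                             # in-place: broadcast + dtype cast into self

functional = src.to(self_t).expand_as(self_t)  # functional equivalent of the mapping above
print(torch.equal(inplace, functional))        # True
```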
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D32280588
Pulled By: bdhirsh
fbshipit-source-id: 2c6ee65f0929e0846566987183ba2498c88496c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67715
I had originally made the `vector<ViewMeta>` and `Tensor`s stored on the `Update` struct references, but Will pointed out a bug in the conditional-functionalization PR due to a use-after-free error. This happens because the queued-up updates might not be synced until later, and can out-live the original tensor that was used to create them.
It was kind of strange that this doesn't show up in the existing `test/test_functionalization.py` tests that I have in this stack, which technically also should have this bug (they call sync_() after the mutated tensors have gone out of scope). I looked at it with gdb, and I'm wondering if it's just because the stored values in the free'd `ViewMeta`/`Tensor` just happen to not get clobbered by the time the sync is called in the test.
Either way, copying the Tensor + vector<ViewMeta> is probably not ideal for performance, but I couldn't think of an easy work-around for now.
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D32136007
Pulled By: bdhirsh
fbshipit-source-id: 707c6392a31b967e8965b9b77f297fd10a0a095a
Summary:
When I ran this part of the code from the documentation with PyTorch version 1.10.0, I found some differences between the actual output and the documentation, as follows:
```python
import torch
import torch.fx as fx
class M(torch.nn.Module):
    def forward(self, x, y):
        return x + y
# Create an instance of `M`
m = M()
traced = fx.symbolic_trace(m)
print(traced)
print(traced.graph)
traced.graph.print_tabular()
```
I get the result:
```shell
def forward(self, x, y):
    add = x + y; x = y = None
    return add
graph():
    %x : [#users=1] = placeholder[target=x]
    %y : [#users=1] = placeholder[target=y]
    %add : [#users=1] = call_function[target=operator.add](args = (%x, %y), kwargs = {})
    return add
opcode         name    target                   args    kwargs
-------------  ------  -----------------------  ------  --------
placeholder    x       x                        ()      {}
placeholder    y       y                        ()      {}
call_function  add     <built-in function add>  (x, y)  {}
output         output  output                   (add,)  {}
```
This PR updates the documentation accordingly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68043
Reviewed By: driazati
Differential Revision: D32287178
Pulled By: jamesr66a
fbshipit-source-id: 48ebd0e6c09940be9950cd57ba0c03274a849be5
Summary:
Patch bfloat16 support in NCCL. PR https://github.com/pytorch/pytorch/issues/63260 added bfloat16 support but was
still not sufficient to enable bfloat16 allreduce in end-to-end training.
This patch does the following:
* fix minimum NCCL version from 2.9.7 to 2.10, NCCL adds bf16 support in
v2.10.3-1 (commit 7e51592)
* update bfloat16 datatype flag in `csrc/cuda/nccl.cpp` so that NCCL
operations like all reduce can use it
* enable unit tests for bfloat16 datatype if possible
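For context, a minimal sketch of what this enables (assumes an NCCL >= 2.10 build, bf16-capable GPUs, and a launch via e.g. `torchrun --nproc_per_node=2 script.py`):
```python
import os
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

t = torch.ones(4, dtype=torch.bfloat16, device="cuda")
dist.all_reduce(t)  # bfloat16 allreduce now maps to NCCL's native bf16 datatype
print(t)            # tensor filled with world_size, still in bfloat16
```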
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67843
Reviewed By: H-Huang
Differential Revision: D32248132
Pulled By: mrshenli
fbshipit-source-id: 081e96e725af3b933dd65ec157c5ad11c6873525
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68061
The test had a typo such that it didn't compare the test value against the reference value; this fixes the typo.
Test Plan:
`pytest test/quantization/fx/test_quantize_fx.py -v -k "test_qat_functional_linear"`
Imported from OSS
Reviewed By: HDCharles
Differential Revision: D32280803
fbshipit-source-id: d57a25a0dcdd88df887a39b5117abafaf15125b2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68012
The previous attempt to make qlinear thread-safe placed the lock after the weight pointer was already accessed via packB. A race condition occurs when thread 1 acquires the lock and packs the weights, but thread 2 still uses the old nullptr after acquiring the lock. This causes a null pointer dereference later.
ghstack-source-id: 142714894
Test Plan: Tested on repro diff
Reviewed By: kimishpatel
Differential Revision: D32252563
fbshipit-source-id: 429fcd3f76193f1c4c8081608b6f725b19562230
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68069
- executable bit
- cub include
- raw CUDA API usage
Test Plan: Imported from OSS
Reviewed By: janeyx99
Differential Revision: D32286559
Pulled By: suo
fbshipit-source-id: 21d58e259c951424f9c6cbf1dac6d79fe7236aa4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61536
This PR adds CPU dispatch for `addmv_out` with Sparse CSR matrix.
The implementation uses MKL Sparse library. If it's not available then a
runtime error is thrown.
Since structured_delegate is used we only need to implement the out variant, the in-place and normal variants are autogenerated.
MKL descriptor of sparse matrices is implemented in `at::mkl::sparse::MklSparseCsrDescriptor`.
MKL Sparse doesn't allow switching the indices type at runtime; it's
predetermined at build time. Only the 32-bit version of MKL was tested
locally, but I expect the 64-bit version to work correctly as well.
When the indices type of the PyTorch CSR tensor doesn't match MKL's,
the indices tensor is converted to an MKL-compatible type (`int` vs `int64_t`).
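A minimal sketch of the kind of call this enables on CPU (assumes an MKL-enabled build; otherwise a runtime error is thrown, as described above):
```python
import torch

crow = torch.tensor([0, 2, 4])
col = torch.tensor([0, 1, 0, 1])
val = torch.tensor([1., 2., 3., 4.])
mat = torch.sparse_csr_tensor(crow, col, val, size=(2, 2))

vec = torch.tensor([1., 1.])
bias = torch.zeros(2)
out = torch.addmv(bias, mat, vec)  # bias + mat @ vec, using the sparse CSR kernel
print(out)                          # tensor([3., 7.])
```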
cc nikitaved pearu cpuhrsch IvanYashchuk
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D32141787
Pulled By: malfet
fbshipit-source-id: b818a0b186aa227982221c3862a594266a58a2a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67929
1. Write a node-hash based unit test for Cache
2. Replace CHECK with TORCH_CHECK in IrUtil
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D32246134
Pulled By: desertfire
fbshipit-source-id: c464bc300126d47e9ad4af3b3e8484a389757dc0
Summary:
Fixes [issue#64](https://github.com/MLH-Fellowship/pyre-check/issues/64)
This PR fixes the type checking errors in torch/distributed/rpc/options.py.
The variables at 84:8 and 85:8 were declared with type `List` but were sometimes assigned a value of `None`. This caused an incompatible variable type error. Therefore, I changed the type from `List` to `Optional[List]`, which fixes the error.
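A minimal sketch of the kind of annotation change involved (the class and attribute names here are illustrative, not the exact fields in options.py):
```python
from typing import List, Optional

class Options:
    def __init__(self, devices: Optional[List[int]] = None):
        # Optional[List[int]] makes the None default well-typed; with a plain
        # List[int] annotation the type checker flags an incompatible assignment.
        self.devices: List[int] = devices if devices is not None else []
```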
Signed-off-by: Onyemowo Agbo
onionymous
0xedward
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68056
Reviewed By: zou3519
Differential Revision: D32282289
Pulled By: mrshenli
fbshipit-source-id: ee410165e623834b4f5f3da8d44bd5a29306daae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67981
To save on memory, various internal classes need to release all references to their `torch::jit::Module` after constructing their `StaticModule`. Unfortunately, many of these classes currently instantiate a `torch::jit::Method` attribute, which holds a reference to the `ivalue` backing its owning module.
To avoid this, I've introduced a new subclass of `IMethod` to represent scripted functions backed by static runtime.
Test Plan: CI
Reviewed By: swolchok
Differential Revision: D32232039
fbshipit-source-id: 434b3a1a4b893b2c4e6cacbee60fa48bd33b5722
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67911
If we can remove `self` from the graph inputs, there is no need for `StaticModule` to hold onto its `Module` reference anymore.
Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: hlu1
Differential Revision: D32190755
fbshipit-source-id: 9c4649a63b6e68c7d2e47395a23572985d2babb1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68035
RemoteModule is sometimes created using object.__new__ (e.g. in
init_from_module_rref); in this case the logging in the __init__ method would
not pick it up.
As a result, adding a `__new__` method to RemoteModule to log all usages
appropriately.
ghstack-source-id: 142762019
Test Plan: waitforbuildbot
Reviewed By: vipannalla
Differential Revision: D32263978
fbshipit-source-id: a95ab0bb5d0836da8fe6333c41593af164b008d9
Summary:
`.name()` has to call `__cxa_demangle` and allocate a new string, both of which can be avoided by just comparing the mangled names directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67987
Reviewed By: mruberry
Differential Revision: D32264560
Pulled By: H-Huang
fbshipit-source-id: 9dd4388ba4e2648c92e4062dafe6d8dc3ea6484e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67990
Duplicate of the following PR which was merged by mistake without ghimport
https://github.com/pytorch/pytorch/pull/67914
cc albanD NicolasHug
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D32247560
Pulled By: jdsgomes
fbshipit-source-id: 8ba5ba7d17fc3d0d2c377da467ea805822e21ec1
Summary:
TorchVision accidentally included model builders for quantized models without weights; this was an old bug. These builders were largely unusable and caused issues for users, so they were commonly filtered out to avoid problems.
We've recently fixed that (https://github.com/pytorch/vision/pull/4854) by either removing those unnecessary builders or by providing quantized weights. This PR removes the no-longer necessary filtering of the methods.
**It should be merged after TorchVision is synced on FBCode.**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67836
Reviewed By: jerryzh168
Differential Revision: D32230658
Pulled By: datumbox
fbshipit-source-id: 01cd425b1bda3b4591a25840593b3b5dde3a0f12
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46480 -- for SGD.
## Notes:
- I have modified the existing tests to take a new `constructor_accepts_maximize` flag. When this is set to true, the ` _test_basic_cases_template` function will test both maximizing and minimizing the sample function.
- This was the clearest way I could think of testing the changes -- I would appreciate feedback on this strategy.
## Work to be done:
- [ ] I need to update the docs.
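As an illustration, a minimal sketch of the new behavior (this assumes the flag is exposed as `maximize=True` on `torch.optim.SGD`; see the test changes above for the exact constructor handling):
```python
import torch

w = torch.tensor(0.0, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1, maximize=True)

for _ in range(50):
    opt.zero_grad()
    objective = -(w - 2.0) ** 2   # maximized at w = 2
    objective.backward()
    opt.step()

print(w.item())  # approaches 2.0 instead of moving away from it
```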
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67847
Reviewed By: H-Huang
Differential Revision: D32252631
Pulled By: albanD
fbshipit-source-id: 27915a3cc2d18b7e4d17bfc2d666fe7d2cfdf9a4
Summary:
Description:
- Follow up PR to https://github.com/pytorch/pytorch/issues/66790 to fix the tests on functorch, https://github.com/pytorch/functorch/issues/195
In functorch, a null tensor is added to the list of indices for the batch dimension in C++, but I cannot find an equivalent of that in Python without using `torch.jit.script`. If any other better solutions could be suggested, I'd be happy to replace the current way of testing.
cc ngimel zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67189
Reviewed By: suo
Differential Revision: D31966686
Pulled By: ngimel
fbshipit-source-id: a14b9e5d77d9f43cd728d474e2976d84a87a6ff4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68016
We would want to use oss test utils.
Also refactor both test utils so that the internal one is an enhancement over the oss test utils.
Test Plan: CI
Reviewed By: wushirong
Differential Revision: D32250266
fbshipit-source-id: 968b8f215ca2d294f7d0bd13cf9563be567954dd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/68015
Put all converter utils into a single file `converter_utils.py`.
Test Plan: CI
Reviewed By: wushirong
Differential Revision: D32250243
fbshipit-source-id: 93fb34bc9ca23f4c3cef3125e04871083dbd413d
Summary:
Magma's magma_queue was double allocating storage when creating
ptrArray for gemm operations. A fix has been upstreamed and the build
needs to pick this up going forward.
Fixes #{issue number}
cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67225
Reviewed By: janeyx99
Differential Revision: D32252609
Pulled By: seemethere
fbshipit-source-id: e27ba1a54dc060fd1bfb4afad9079bf9b4705c8a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67925
Previously, the following would always fail, because autocasting would not be enabled in the called method:
```
@torch.jit.script
def fn(x, y):
    with autocast():
        # CallMethod() to some method

fn(x, y)
```
This allows the above, if autocasting is globally enabled, e.g.
```
@torch.jit.script
def fn(x, y):
    with autocast():
        # CallMethod() to some method

with autocast():
    fn(x, y) # now
```
ghstack-source-id: 142667351
Test Plan: added test in test_jit_autocast.py
Reviewed By: navahgar
Differential Revision: D32214439
fbshipit-source-id: bb7db054e25e18f5e3d2fdb449c35b5942ab303e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67808
torch.reciprocal implicitly casts the inputs to float, and ONNX
Reciprocal requires floating point inputs.
Also separate the reciprocal test from other tests, and test different
input types.
Test Plan: Imported from OSS
Reviewed By: msaroufim
Differential Revision: D32181307
Pulled By: malfet
fbshipit-source-id: 3e1109b3c85a49c51dc713656a900b4ee78c8340
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67806
Previously new_full would fail with errors like:
`TypeError: only integer tensors of a single element can be converted to an index`
And full_like would trigger warnings like:
`DeprecationWarning: an integer is required (got type float). Implicit conversion to integers using __int__ is deprecated, and may be removed in a future version of Python.`
Test Plan: Imported from OSS
Reviewed By: msaroufim
Differential Revision: D32181301
Pulled By: malfet
fbshipit-source-id: 2cf262cfef36c18e7b2423efe1e1d4fa3438f0ba
Co-authored-by: Bowen Bao <bowbao@microsoft.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67805
Also fix Reduce ops on binary_cross_entropy_with_logits
The graph says the output is a scalar but with `keepdims=1`
(the default), the output would be a tensor of rank 1. We set
`keepdims=0` to make it clear that we want a scalar output.
This previously went unnoticed because ONNX Runtime does not strictly
enforce shape inference mismatches if the model is not using the latest
opset version.
Test Plan: Imported from OSS
Reviewed By: msaroufim
Differential Revision: D32181304
Pulled By: malfet
fbshipit-source-id: 1462d8a313daae782013097ebf6341a4d1632e2c
Co-authored-by: Bowen Bao <bowbao@microsoft.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67960
For some reason, we are throwing py::index_error when converting a trt.Dims to a tuple. Having this in the hot path of trt inference is not good, especially since we register a bunch of pybind11 exception translators that repeatedly rethrow the exception. Since the shape is static information, we save it once to avoid the repeated conversion.
Reviewed By: jianyuh, wushirong, 842974287
Differential Revision: D32232065
fbshipit-source-id: 11e49da9758ead0ff3aa647bbd3fce7735bf4a07
Summary:
**Summary:** This commit adds the `torch.nn.qat.dynamic.modules.Linear`
module, the dynamic counterpart to `torch.nn.qat.modules.Linear`.
Functionally these are very similar, except the dynamic version
expects a memoryless observer and is converted into a dynamically
quantized module before inference.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67325
Test Plan:
`python3 test/test_quantization.py TestQuantizationAwareTraining.test_dynamic_qat_linear`
**Reviewers:** Charles David Hernandez, Jerry Zhang
**Subscribers:** Charles David Hernandez, Supriya Rao, Yining Lu
**Tasks:** 99696812
**Tags:** pytorch
Reviewed By: malfet, jerryzh168
Differential Revision: D32178739
Pulled By: andrewor14
fbshipit-source-id: 5051bdd7e06071a011e4e7d9cc7769db8d38fd73
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67942
- Change "name" to "code" for consistency with linttool and LintMessage
format.
- Change "args" and "init_args" to "command" and "init_command" for
consistency with internal representation.
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D32250606
Pulled By: suo
fbshipit-source-id: 557fef731bab9adca7ab1e7cc41b996956076b05
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67932
Also various improvements to grep_linter.py, including the ability to
specify a replacement pattern.
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D32250603
Pulled By: suo
fbshipit-source-id: e07eb182e9473a268e2b805a68a859b91228bfbb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67936
- Add the strict config
- Make the patterns exactly match the current CI
- Add init_args
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D32250605
Pulled By: suo
fbshipit-source-id: a71d434bf6024db4462260a460a1bc2d9ac66a32
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67894
As title. Confirmed that the code base passes by running:
```
lintrunner --paths-cmd='git grep -Il ""' --take NEWLINE
```
and seeing that it passes
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D32250604
Pulled By: suo
fbshipit-source-id: de9bcba635d21f8832bb25147b19b7b2e8802247
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67890
Adding another linter. I also added a generic initializer that installs
the right pip packages (you can invoke it by running `lintrunner init`).
Differential Revision: D32197366
Test Plan: Imported from OSS
Reviewed By: driazati
Pulled By: suo
fbshipit-source-id: 82844e78f1ee3047220d8444874eab41d7cc0e9e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67872
As title. This demonstrates some of the nice features of lintrunner:
- Uniform error reporting means you get a nice diff of the changes for
free
- Can run with -a to just accept the changes (don't need to tell people
to run a special regenerate command since the linter adapter already knows how.)
Differential Revision: D32187386
Test Plan: Imported from OSS
Reviewed By: driazati
Pulled By: suo
fbshipit-source-id: 71de6b042730be80ff6794652039e9bc655a72b1
Summary:
Catches deprecation warnings when we call `scheduler.step(epoch)`
in tests.
Removes duplicate parameters to optimizers unless we are specifically
testing for that
Fixes https://github.com/pytorch/pytorch/issues/67696
There is one warning remaining when I run this locally -- however that is due to the implementation of the `SequentialLR` Scheduler. I will open a new issue relating to that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67954
Reviewed By: H-Huang
Differential Revision: D32244056
Pulled By: albanD
fbshipit-source-id: 2ab3086a58e10c8d29809ccbaab80606a1ec61d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67856
Returns a tensor constructed from scalar input
Test Plan:
```
buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest
```
Ran
```
buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --gtest_filter=*NumToTensorScalar* --v=1
```
and the output contains `Switch to out variant for node: %2 : Tensor = prim::NumToTensor(%0)`.
Reviewed By: mikeiovine
Differential Revision: D32014194
fbshipit-source-id: e7df65ea1bf05d59c1fc99b721aee420e484f542
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67437
Certain ops do nothing on the forward pass and can be discarded after training: `aten::detach` and `fb::scale_gradient` are examples of this.
Test Plan: `buck test caffe2/test:jit -- test_freezing`
Reviewed By: hlu1
Differential Revision: D31980843
fbshipit-source-id: 0045b6babcfae786a2ce801b2f5997a078205bc0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67924
This diff reverts the changes made in D31762735 (0cbfd466d2)
Test Plan: Wait for CI
Reviewed By: derekmod-fb
Differential Revision: D32214744
fbshipit-source-id: e0a65b6a31a88216ae1243549fcbc901ef812374
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67882
I ran into a hard-to-interpret error message when trying to run the following script, which was missing an `init_rpc` call:
```
# $ torchrun --standalone --nnodes=1 --nproc_per_node=1 script.py
import os
rank = int(os.environ['LOCAL_RANK'])
world_size = int(os.environ['WORLD_SIZE'])
import torch.distributed
# !!!!!! Uncomment the following and the script succeeds
# torch.distributed.rpc.init_rpc('worker', rank=rank, world_size=world_size)
import torch.distributed as dist
dist.init_process_group(backend='gloo')
import torchvision.models as models
import torch
rn50 = models.resnet50()
rn50.train()
rn50 = torch.nn.parallel.DistributedDataParallel(rn50)
from torch.distributed.rpc import RRef
from torch.distributed.optim import DistributedOptimizer
params = []
for param in rn50.parameters():
    params.append(RRef(param))
dist_optim = DistributedOptimizer(
    torch.optim.SGD,
    params,
    lr=0.05)
loss_func = torch.nn.CrossEntropyLoss()
with torch.distributed.autograd.context() as context_id:
    pred = rn50(torch.randn(50, 3, 224, 224))
    target = torch.randn(50, 1000).softmax(dim=1)
    loss = loss_func(pred, target)
    dist.autograd.backward(context_id, [loss])
    dist_optim.step(context_id)
```
Error:
```
Traceback (most recent call last):
File "/xxx/torchrun_exp/script.py", line 23, in <module>
params.append(RRef(param))
RuntimeError: agentINTERNAL ASSERT FAILED at "../torch/csrc/distributed/rpc/rpc_agent.cpp":237, please report a bug to PyTorch. Current RPC agent is not set!
```
Since this is a user-facing error, I've changed `TORCH_INTERNAL_ASSERT` to `TORCH_CHECK` and added a hint about how to resolve the issue. On the other hand, the fact that this was originally `TORCH_INTERNAL_ASSERT` may suggest that the author thought that this should be an internal-only error condition. If there is some other place that should be throwing an exception in this case that is failing, let me know and I can adapt the fix to change that location.
Question for reviewers:
* Is there a good test file where I can add a test for this error condition?
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D32190947
Pulled By: jamesr66a
fbshipit-source-id: 3621d755329fd524db68675c55b1daf20e716d43
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67032
This PR adds meta backend support to the `range`, `arange`, `linspace`, and `logspace` operators.
Note that the original PR (#66630) was reverted due to two failing unit tests in the Bionic CI. This revision includes a fix for those tests; otherwise its content is identical to the previous PR.
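For illustration, a quick sketch of what meta backend support for these factories enables (shape/dtype inference without allocating real data):
```python
import torch

t = torch.arange(0, 10, device="meta")
print(t.shape, t.dtype, t.device)  # torch.Size([10]) torch.int64 meta

u = torch.linspace(0, 1, steps=5, device="meta")
print(u.shape, u.dtype)            # torch.Size([5]) torch.float32
```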
Original commit changeset: 2f9d8d1acbb0
ghstack-source-id: 142487306
Test Plan: Extended the existing tensor creation tests to assert meta backend support.
Reviewed By: zhaojuanmao
Differential Revision: D31834403
fbshipit-source-id: a489858a2a8a38a03234b14408e14d2b208a8d34
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67876
Previously we missed this argument when calling obj.convert, so it did not affect the fusion.
This PR fixes that and adds a test for it.
Test Plan:
python test/test_quantization.py TestFuseFx
Imported from OSS
Reviewed By: malfet
Differential Revision: D32191364
fbshipit-source-id: 566bd39461010d70a21de71f611bb929976fe01d
Summary:
PyTorch doesn't compile with the latest `main` branch of cub again. The root cause is that PyTorch defines a macro `NUM_THREADS`, and cub added code like
```C++
template<...., int NUM_THREADS, ...>
```
and the two clash with each other.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67258
Reviewed By: albanD
Differential Revision: D31932215
Pulled By: ngimel
fbshipit-source-id: ccdf11e249fbc0b6f654535067a0294037ee7b96
Summary:
This PR makes several changes:
- Changed function `bool cudnn_conv_use_channels_last(...)` to `at::MemoryFormat cudnn_conv_suggest_memory_format(...)`
- Removed `resize_` in cudnn convolution code. Added a new overload of `TensorDescriptor::set` that also takes the desired memory format of the tensor.
- Disabled the usage of double + channels_last on cuDNN Conv-Relu and Conv-Bias-Relu. Call `.contiguous(memory_format)` before passing data to cuDNN functions.
- Disabled the usage of cuDNN fused Conv-Bias-Relu in cuDNN < 8.0 version due to a CUDNN_STATUS_NOT_SUPPORTED error. Instead, use the native fallback path.
- Let Conv-Bias-Relu code respect the global `allow_tf32` flag.
According to the cuDNN documentation, double + NHWC is generally not supported.
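For illustration, a user-level sketch of the memory-format behavior involved (assumes a CUDA build with cuDNN; not code from this PR):
```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 3, 32, 32, device="cuda").to(memory_format=torch.channels_last)
w = torch.randn(16, 3, 3, 3, device="cuda").to(memory_format=torch.channels_last)
# the suggested memory format is derived from the input/weight; tensors are made
# contiguous in that format before being handed to cuDNN
y = F.conv2d(x, w, padding=1)
print(y.is_contiguous(memory_format=torch.channels_last))  # typically True here
```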
Close https://github.com/pytorch/pytorch/pull/66968
Fix https://github.com/pytorch/pytorch/issues/55301
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65594
Reviewed By: jbschlosser, malfet
Differential Revision: D32175766
Pulled By: ngimel
fbshipit-source-id: 7ba079c9f7c46fc56f8bfef05bad0854acf380d7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67861
Previously submitted as https://github.com/pytorch/pytorch/pull/67197.
This got reverted because its failures were hidden by the failures of
another PR.
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D32178196
Pulled By: navahgar
fbshipit-source-id: cc8a5c68aed360d06289e69645461cfa773e1300
Summary:
Fixes https://github.com/pytorch/pytorch/issues/66232
This should be the last immediate task. I anticipate test ownership will change over time, but this is the last big thing needed to close it out.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67859
Reviewed By: soulitzer
Differential Revision: D32210534
Pulled By: janeyx99
fbshipit-source-id: 7fd835d87d9d35d49ec49de1fcfa29b085133e99
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67824
Testing backport of all prod models using model test framework
Ref:
[Create tests at run-time (google test)](https://stackoverflow.com/questions/19160244/create-tests-at-run-time-google-test)
Breaking the list of models into 20 chunks based on a simple hash (the sum of all character values).
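A rough sketch of the chunking scheme described above (names are hypothetical; the bucketing logic is the point):
```python
NUM_CHUNKS = 20

def chunk_index(model_name: str) -> int:
    # simple hash: sum of all character values, bucketed into 20 chunks
    return sum(ord(c) for c in model_name) % NUM_CHUNKS
```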
ghstack-source-id: 142398833
Test Plan:
```
buck test //xplat/pytorch/mobile/test:test_read_all_mobile_model_configs
Starting new Buck daemon...
Parsing buck files: finished in 7.6 sec
Creating action graph: finished in 0.9 sec
[RE] Metadata: Session ID=[reSessionID-66f5adfe-50d1-4599-9828-3e8115181601]
[RE] Waiting on 0 remote actions. Completed 1008 actions remotely, action cache hit rate: 43.59%.
Downloaded 26/1523 artifacts, 252.60 Kbytes, 96.6% cache miss (for updated rules)
Building: finished in 01:18.6 min (100%) 5532/5532 jobs, 770/5532 updated
Total time: 01:27.3 min
Testing: finished in 11:21.6 min (41 PASS/0 FAIL)
BUILD SUCCEEDED
RESULTS FOR //xplat/pytorch/mobile/test:test_read_all_mobile_model_configs
PASS 673.8s 41 Passed 0 Skipped 0 Failed //xplat/pytorch/mobile/test:test_read_all_mobile_model_configs
TESTS PASSED
```
Reviewed By: dhruvbird
Differential Revision: D32068955
fbshipit-source-id: d06c2434a4a69572ab52df31a684e5973b9d551c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67803
* Addresses comments from #63589
[ONNX] remove torch::onnx::PRODUCER_VERSION (#67107)
Use constants from version.h instead.
This simplifies things since we no longer have to update
PRODUCER_VERSION for each release.
Also add TORCH_VERSION to version.h so that a string is available for
this purpose.
[ONNX] Set `ir_version` based on opset_version. (#67128)
This increases the odds that the exported ONNX model will be usable.
Before this change, we were setting the IR version to a value which may
be higher than what the model consumer supports.
Also some minor clean-up in the test code:
* Fix string replacement.
* Use a temporary file so as to not leave files around in the test
current working directory.
Test Plan: Imported from OSS
Reviewed By: msaroufim
Differential Revision: D32181306
Pulled By: malfet
fbshipit-source-id: 02f136d34ef8f664ade0bc1985a584f0e8c2b663
Co-authored-by: BowenBao <bowbao@microsoft.com>
Co-authored-by: Gary Miguel <garymiguel@microsoft.com>
Co-authored-by: Nikita Shulga <nshulga@fb.com>
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67800
Currently when the grad is the same layout as base, we try to assign the same tensor to the forward grad of both the base and the view. However, when the layout of the grad is different from the layout of the view, this triggers a copy to be created, and the tangent of the view (after the inplace) will not have a view relationship with the view of the base.
This PR just changes it so that we only do the above optimization when the layout also matches the layout of `self`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67816
Reviewed By: malfet
Differential Revision: D32190021
Pulled By: soulitzer
fbshipit-source-id: b1b2c9b332e83f4df5695ee9686ea76447f9305b
Summary:
Many thanks to Forest Yang (meowmix) from the forum for reporting it with a minimal reproduction.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67829
Reviewed By: malfet
Differential Revision: D32184786
Pulled By: albanD
fbshipit-source-id: b63dbd3148b5def2109deb2f4612c08f55f59dfb
Summary:
The final learning rate should be 0.05, matching the lr passed as the argument to the optimizer, not 0.005.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67840
Reviewed By: jbschlosser
Differential Revision: D32187091
Pulled By: albanD
fbshipit-source-id: 8aff691bba3896a847d7b9d9d669a65f67a6f066
Summary:
Fixes part of https://github.com/pytorch/pytorch/issues/67696 by adding calls to `optimizer.step()` in various places.
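For illustration, a minimal sketch of the warning-free call order the added `optimizer.step()` calls aim for (a generic example, not one of the modified tests):
```python
import torch

model = torch.nn.Linear(2, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.9)

for _ in range(3):
    opt.zero_grad()
    model(torch.randn(4, 2)).sum().backward()
    opt.step()    # stepping the optimizer first avoids the UserWarning about
    sched.step()  # calling lr_scheduler.step() before optimizer.step()
```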
## Notes for reviewers:
- It is not entirely clear which is the right optimizer to step in each case. I have favoured the more explicit approach of creating a set of optimizers and calling step on each of them.
- At the time of writing, the only Scheduler without an `optimizer` instance variable is `ChainedScheduler` which I need to deal with once. I use `hasattr` to do this check. Let me know if this ought to be changed.
- I am opening this PR for review while it only solves part of the issue, as I'd rather get feedback sooner. I think it is fine to fix the issue in several PRs too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67756
Reviewed By: jbschlosser
Differential Revision: D32187864
Pulled By: albanD
fbshipit-source-id: fd0d133bcaa3a24588e5a997ad198fdf5879ff5a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67554
This change adds a comment on clients taking ownership of managed output tensors to remind SR developers of how and why that matters.
Test Plan: N/A
Reviewed By: swolchok
Differential Revision: D32013468
fbshipit-source-id: bcc13055c329c61677bdcc76411fe8db44bb2cee
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67814
There was a limitation on the xar file size we could embed into the binary previously. The payload (the xar file here) is added to the .data section by default using the 'ld -b binary -r' command (which section the payload goes into is hardcoded in ld, BTW; check the code pointer [here](https://github.com/bminor/binutils-gdb/blob/binutils-2_32/bfd/binary.c#L80)). When we link the object file containing the payload to the other parts of the executable, we will get a relocation-out-of-range error if the overall size of the .text, .data, .bss, etc. sections exceeds 2G. Some relocation entries use a 32-bit signed integer, hence the 2G limit.
To solve the issue and mitigate the risk, we designed a mechanism to put the payload in a customized payload section (.torch_deploy_payload.unity here). The payload section does not take part in relocation and symbol resolution, so in theory it can be as large as the disk space... Since we don't do relocation for the payload section, the start/end/size symbols are no longer available/valid, so we have to parse the ELF file ourselves to figure those out.
The mechanism can be used to embed interpreter.so as well. The interpreter.so is currently 0.5G. That would limit the other .text/.data/.bss sections of the executable to at most 1.5G. Using this mechanism in this diff avoids interpreter.so taking any of that budget. We could also use this mechanism to ship python scripts with our binary rather than freezing them beforehand. These use cases are not handled in this diff.
This diff also improves the experience for simple use cases that do not depend on extra shared libraries in the XAR file (except the shared libraries for the python extensions themselves). This is mainly for fixing the stress test right now, but it also makes other simple cases easier.
ghstack-source-id: 142483327
Test Plan:
# Verify the relocation out of range issue is fixed
Add //caffe2:torch as a dependency to the macro build_unity(name="example", …) in torch/csrc/deploy/unity/TARGETS and run 'buck run mode/opt :unity_demo'; without this diff, relocation errors like the following are expected:
```
ld.lld: error:
caffe2/c10/util/intrusive_ptr.h:325:(.text._ZN11ska_ordered8detailv317sherwood_v3_tableISt4pairIN3c106IValueES4_ES4_NS3_6detail11DictKeyHashENS0_16KeyOrValueHasherIS4_S5_S7_EENS6_14DictKeyEqualToENS0_18KeyOrValueEqualityIS4_S5_SA_EESaIS5_ESaINS0_17sherwood_v3_entryIS5_EEEE15emplace_new_keyIS5_JEEES2_INSH_18templated_iteratorIS5_EEbEaPSF_OT_DpOT0_+0x4E9): relocation R_X86_64_32S out of range: 2345984168 is not in [-2147483648, 2147483647]; references c10::UndefinedTensorImpl::_singleton
>>> defined in /data/sandcastle/boxes/fbsource/fbcode/buck-out/opt/gen/caffe2/c10/c10#platform009-clang,static/libc10.a(../c10#compile-UndefinedTensorImpl.cpp.o44c44c4c,platform009-clang/core/UndefinedTensorImpl.cpp.o)
```
With the diff, the error above is resolved.
# Pass Stress Test
Also pass existing unit tests for unity.
buck test mode/opt //caffe2/torch/csrc/deploy/unity/tests:test_unity_sum -- --exact 'caffe2/torch/csrc/deploy/unity/tests:test_unity_sum - UnityTest.TestUnitySum' --run-disabled --jobs 18 --stress-runs 10 --record-results
buck test mode/opt //caffe2/torch/csrc/deploy/unity/tests:test_unity_simple_model -- --exact 'caffe2/torch/csrc/deploy/unity/tests:test_unity_simple_model - UnityTest.TestUnitySimpleModel' --run-disabled --jobs 18 --stress-runs 10 --record-results
# Verify debug sections are not messed up
Verified that debug sections are not messed up and GDB still works:
`gdb ~/fbcode/buck-out/gen/caffe2/torch/csrc/deploy/unity/unity_demo`
```
b main
run
l
c
```
Reviewed By: suo
Differential Revision: D32159644
fbshipit-source-id: a133513261b73551a71acc257f4019f7b5af34a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67768
We don't need to pass so many padding args after removing support for asymm padding from qnnpack
Test Plan: it builds
Reviewed By: jshen
Differential Revision: D32082204
fbshipit-source-id: 2bfe4c135ad613f0cc267e7e3ab6357731f29bc2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67886
Similar to what we have in torch2trt tensorrt_converter, introduce version enablement for fx2trt converters. Upgrading to TRT 8.2 will introduce new op converters as well as deprecate old ops.
Test Plan: pass existing unit test
Reviewed By: 842974287
Differential Revision: D32183581
fbshipit-source-id: 6419acada296d24e882efa9fca25eca6349153e4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66130
We're reusing backing storage for these tensors, which is only safe because they have non-overlapping lifetimes. Accordingly, it seems that they can also share their StorageImpl.
ghstack-source-id: 142427752
Test Plan:
benchmarked ctr_mobile_feed local and local_ro:
Using recordio inputs for model 302008423_0
```
swolchok@devbig032 ~/f/fbcode> env MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 > environment^C
swolchok@devbig032 ~/f/fbcode> sudo ~/fbsource2/fbcode/scripts/bertrand/noise/denoise-env.sh \
/tmp/ptvsc2_predictor_benchNov1ArenaAllocateStorageImpls \
--scripted_model=/data/users/swolchok/ctr_mobile_feed_q3_2021/302008423_0.predictor.disagg.local \
--method_name=local.forward --pt_cleanup_activations=1 \
--pt_enable_out_variant=1 --pt_optimize_memory=1 --iters=2 --warmup_iters=2 \
--num_threads=1 --pt_enable_static_runtime=1 --set_compatibility=1 --repetitions=5 --recordio_use_ivalue_format=1 --recordio_inputs=/data/users/swolchok/ctr_mobile_feed_q3_2021/302008423_0.local.inputs.recordio
Stable
========================================
I1101 14:19:16.473964 2748837 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 20.0131. Iters per second: 49.9673
I1101 14:20:12.193130 2748837 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 20.0155. Iters per second: 49.9612
I1101 14:21:07.761898 2748837 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.9751. Iters per second: 50.0624
I1101 14:22:03.218066 2748837 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.9104. Iters per second: 50.2249
I1101 14:22:58.723256 2748837 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.956. Iters per second: 50.1102
I1101 14:22:58.723306 2748837 PyTorchPredictorBenchLib.cpp:262] Mean milliseconds per iter: 19.974, standard deviation: 0.043643
ArenaAllocateStorageImpls
========================================
I1101 14:08:57.070914 2695478 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.9771. Iters per second: 50.0572
I1101 14:09:52.605121 2695478 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.924. Iters per second: 50.1907
I1101 14:10:48.098287 2695478 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.9353. Iters per second: 50.1624
I1101 14:11:43.645395 2695478 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.9723. Iters per second: 50.0694
I1101 14:12:39.171636 2695478 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 19.9673. Iters per second: 50.0819
I1101 14:12:39.171685 2695478 PyTorchPredictorBenchLib.cpp:262] Mean milliseconds per iter: 19.9552, standard deviation: 0.0239318
difference: 0.0188 (0.09%), which is less than 1 standard deviation
Stable, local_ro
========================================
I1101 14:26:10.796161 2787930 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.25991. Iters per second: 793.708
I1101 14:26:12.194727 2787930 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.26862. Iters per second: 788.26
I1101 14:26:13.591312 2787930 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.26549. Iters per second: 790.207
I1101 14:26:14.982439 2787930 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.25943. Iters per second: 794.01
I1101 14:26:16.377033 2787930 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.25995. Iters per second: 793.68
I1101 14:26:16.377094 2787930 PyTorchPredictorBenchLib.cpp:262] Mean milliseconds per iter: 1.26268, standard deviation: 0.00414788
ArenaAllocateStorageImpls, local_ro
========================================
I1101 14:26:45.875073 2790009 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.20987. Iters per second: 826.536
I1101 14:26:47.207271 2790009 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.20827. Iters per second: 827.633
I1101 14:26:48.533766 2790009 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.20023. Iters per second: 833.174
I1101 14:26:49.850610 2790009 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.19206. Iters per second: 838.884
I1101 14:26:51.172356 2790009 PyTorchPredictorBenchLib.cpp:251] PyTorch run finished. Milliseconds per iter: 1.19958. Iters per second: 833.622
I1101 14:26:51.172411 2790009 PyTorchPredictorBenchLib.cpp:262] Mean milliseconds per iter: 1.202, standard deviation: 0.00722754
Difference: 0.06 usec/iter (4.8%), which is much more than 1 standard deviation
```
we can see that this is a large relative improvement on local_ro, but no effect on local.
Reviewed By: hlu1
Differential Revision: D31357486
fbshipit-source-id: 229c003677da76e89c659d0e0639002accced76e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66638
See comments in code explaining what we're doing here.
ghstack-source-id: 142427750
Test Plan:
Ran ptvsc2_predictor_bench on ctr_mobile_feed local and local_ro net before/after this change on a devserver with turbo off.
Results:
```
stable, local_ro:
========================================
I1014 16:13:52.713300 151733 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.68012. Iters per second: 373.118
I1014 16:14:00.961875 151733 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.66156. Iters per second: 375.719
I1014 16:14:09.163097 151733 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.6449. Iters per second: 378.086
I1014 16:14:17.425621 151733 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.66661. Iters per second: 375.008
I1014 16:14:25.711349 151733 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.67375. Iters per second: 374.006
I1014 16:14:25.711390 151733 PyTorchPredictorBenchLib.cpp:269] Mean milliseconds per iter: 2.66539, standard deviation: 0.0134423
stable, local:
========================================
I1014 15:08:28.547081 3979345 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.42772. Iters per second: 155.576
I1014 15:08:48.276582 3979345 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.3643. Iters per second: 157.127
I1014 15:09:07.978683 3979345 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.3566. Iters per second: 157.317
I1014 15:09:27.875543 3979345 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.42044. Iters per second: 155.752
I1014 15:09:47.558079 3979345 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.34902. Iters per second: 157.505
I1014 15:09:47.558120 3979345 PyTorchPredictorBenchLib.cpp:269] Mean milliseconds per iter: 6.38361, standard deviation: 0.037421
cache storages, local_ro:
========================================
I1014 16:15:42.292997 160496 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.66604. Iters per second: 375.088
I1014 16:15:50.622402 160496 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.68683. Iters per second: 372.186
I1014 16:15:58.901475 160496 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.67028. Iters per second: 374.493
I1014 16:16:07.156373 160496 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.66317. Iters per second: 375.492
I1014 16:16:15.474292 160496 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 2.68394. Iters per second: 372.587
I1014 16:16:15.474334 160496 PyTorchPredictorBenchLib.cpp:269] Mean milliseconds per iter: 2.67405, standard deviation: 0.0106982
cache storages, local:
========================================
I1014 20:53:43.113400 1657168 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.3811. Iters per second: 156.713
I1014 20:54:02.829102 1657168 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.36039. Iters per second: 157.223
I1014 20:54:22.885171 1657168 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.47333. Iters per second: 154.48
I1014 20:54:42.768963 1657168 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.41404. Iters per second: 155.908
I1014 20:55:02.624423 1657168 PyTorchPredictorBenchLib.cpp:252] PyTorch run finished. Milliseconds per iter: 6.4042. Iters per second: 156.147
I1014 20:55:02.624460 1657168 PyTorchPredictorBenchLib.cpp:269] Mean milliseconds per iter: 6.40661, standard deviation: 0.0427168
```
Looks like this diff is neutral or a slight regression, but it is a stepping stone on the way to the following diff.
Reviewed By: hlu1
Differential Revision: D31326711
fbshipit-source-id: a6e0185f24a6264b1af2a51b69243c310d0d48d5
Summary:
Combine `xla` and `builder` branch pinning steps and link them to a PR that does it correctly
Update example PR for version bump, as a few files have changed
Deleted FaceHub step as it is no longer necessary after recent update
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65489
Reviewed By: zhouzhuojie, seemethere
Differential Revision: D31120498
Pulled By: malfet
fbshipit-source-id: e1a9db2b03243c8d28eeed9888c3653e4460ad07
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66401
This PR fixes the case when result and input tensors have different
strides.
cuSPARSE from CUDA 11.3.1 has a bug: it doesn't use correct strides to
write the result. This is "fixed" in PyTorch code by copying the input
tensor to a tensor with same strides as result tensor has.
cc nikitaved pearu cpuhrsch IvanYashchuk ngimel
Test Plan: Imported from OSS
Reviewed By: davidberard98
Differential Revision: D32177966
Pulled By: cpuhrsch
fbshipit-source-id: 118437409df147f04dce02763aff9bfd33f87c63
Summary:
Follow up to https://github.com/pytorch/pytorch/issues/61935
This PR adds device to device transfer test into `ModuleInfo`.
cc albanD mruberry jbschlosser walterddr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65488
Reviewed By: mruberry
Differential Revision: D32063662
Pulled By: jbschlosser
fbshipit-source-id: 0868235a0ae7e5b6a3e4057c23fe70753c0946d2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67703
Had this script fail on me within CI without actually telling me what
was wrong, so adding some more output here to show the actual
vs. the expected result.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: janeyx99
Differential Revision: D32112898
Pulled By: seemethere
fbshipit-source-id: dfc9a82c709d52e0dde02d1e99a19eecc63c5836
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67802
In RPC C++ code, we might sometimes call constValue() when the future actually has an exception, and in unittests we want to assert on the exception. What happens is that we get a message basically saying "!eptr_" which indicates there is some exception but we don't know what it is.
This diff simply adds logging for the exception and mentions that `value` over `constValue` should be used when the future can have an exception. The contract of `constValue` to throw when `eptr_` is set is still held, it is just enhanced with additional logging.
ghstack-source-id: 142375391
Test Plan: Added UT
Reviewed By: mrshenli
Differential Revision: D32156552
fbshipit-source-id: 4dd5e73b92173209074c104a4b75c2021e20de4b
Summary:
https://github.com/pytorch/pytorch/issues/65868 pointed out that the "long-form" versions of some binary ops like `mul`, `sub`, and `div` don't match their alias's behavior when it comes to handling scalar inputs. This PR adds the missing registration in `python_arg_parser.cpp` to resolve this.
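If I read the linked issue correctly, the mismatch is between the short names and their long-form aliases when given Python scalars; a sketch of what should now behave uniformly (an assumption for illustration, not verbatim from the PR):
```python
import torch

x = torch.arange(4.)
# the short names have always accepted Python scalars
x.mul(2), x.sub(1), x.div(2)
# with the added python_arg_parser registrations, the long-form aliases
# should accept the same scalar arguments
x.multiply(2), x.subtract(1), x.divide(2)
```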
CC ptrblck ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65937
Reviewed By: malfet
Differential Revision: D32156580
Pulled By: ngimel
fbshipit-source-id: b143cf7119a8bb51609e1b8734204edb750f0210
Summary:
Running one test in test_distributed_spawn is a bit confusing but possible. Add documentation to the CONTRIBUTING.md for this.
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67801
Reviewed By: mrshenli
Differential Revision: D32157700
Pulled By: rohan-varma
fbshipit-source-id: a1d10f2fb5f169b46c6d15149bf949082d9bd200
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).
New submodule commit: d2aa3485e8
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67845
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: lw
Differential Revision: D32170821
fbshipit-source-id: 1958e824a9f02c5178fa5d4a73a171dedefc540c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67363
ProcessGroup RPC backend is deprecated. In 1.10 it throws a user-friendly error. This PR now removes it completely.
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D32138321
Pulled By: H-Huang
fbshipit-source-id: b4f700d8f1b1d46ada7b5062d3f754646571ea90
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67443
Fixes https://github.com/pytorch/pytorch/issues/57459
After discussing the linked issue, we resolved that `F.kl_div` computes
the right thing as to be consistent with the rest of the losses in
PyTorch.
To avoid any confusion, these docs add a note discussing how the PyTorch
implementation differs from the mathematical definition and the reasons
for doing so.
These docs also add an example that may further help understanding the
intended use of this loss.
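For reference, a short usage sketch along the lines of the added example (input is expected in log-space, target in probability space by default):
```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 5)
target = torch.randn(8, 5).softmax(dim=-1)

# F.kl_div expects log-probabilities as input and, by default, probabilities as target
loss = F.kl_div(logits.log_softmax(dim=-1), target, reduction="batchmean")
```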
cc brianjo mruberry
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D32136888
Pulled By: jbschlosser
fbshipit-source-id: 1ad0a606948656b44ff7d2a701d995c75875e671
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).
New submodule commit: caa2ccb394
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67769
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: lw
Differential Revision: D32138256
fbshipit-source-id: dfe4c73ae25c8f362f2917dd7594bdcd418c2a0d
Summary:
Some of the "no-ops" are not actually no-ops because they can change the dtype
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67688
Reviewed By: davidberard98
Differential Revision: D32104601
Pulled By: eellison
fbshipit-source-id: ccb99179a4b30fd20b5a9228374584f2cdc8ec21
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67188
This diff/PR is trying to implement the ShardedEmbeddingBag using the ShardedTensor.
We support both row-wise and column-wise sharding of the embedding bag. The detailed logic can be found in the comment.
Several caveats:
1. Only the sharding of one weight is supported now.
2. We support limited input params for the op. Support for more params is on the way.
3. We only support chunk sharding for now.
4. We only support a single local shard per rank for now.
Some other changes include:
1. Refactor the ShardedEmbedding code so that the common logic can be reused.
2. Fix tiny typos and a corner case in the API `get_chunked_dim_size`, where it would return -1 if we set dim_size = 5, split_size = 2, idx = 3. (This is a valid case because when chunks = 4 and dim_size = 5, the split_size is 2.)
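A sketch of the corner case and the clamping fix from item 2 (this shows the arithmetic, not necessarily the verbatim implementation of the helper):
```python
def get_chunked_dim_size(dim_size: int, split_size: int, idx: int) -> int:
    # size of chunk `idx` when a dimension of `dim_size` is split into chunks
    # of `split_size`; clamping at 0 keeps trailing empty chunks from going
    # negative (e.g. dim_size=5, split_size=2, idx=3 -> 0, not -1)
    return max(min(dim_size, split_size * (idx + 1)) - split_size * idx, 0)

assert get_chunked_dim_size(5, 2, 0) == 2
assert get_chunked_dim_size(5, 2, 2) == 1
assert get_chunked_dim_size(5, 2, 3) == 0
```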
ghstack-source-id: 142325915
Test Plan: Unit test and CI
Reviewed By: pritamdamania87
Differential Revision: D31749458
fbshipit-source-id: ed77e05e4ec94ef1a01b1feda8bbf32dc5d5da1b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67135
Add ability to use env var backend for quicker testing (and gloo2 in
the future)
ghstack-source-id: 142274304
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D31878285
fbshipit-source-id: 80ae7107cd631a1a15ebc23262b27d8192cfe4b6
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/66066
This PR:
- cleans up op-specific testing from test_autograd. test_autograd should be reserved for testing generic autograd functionality
- tests related to an operator are better colocated
- see the tracker for details
What to think about when moving tests to their correct test suite:
- naming: make sure it's not too generic
- how the test is parametrized; sometimes we need to add/remove a device/dtype parameter
- can this be merged with existing tests?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67413
Reviewed By: jbschlosser, albanD
Differential Revision: D32031480
Pulled By: soulitzer
fbshipit-source-id: 8e13da1e58a38d5cecbfdfd4fe2b4fe6f816897f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67707
https://github.com/pytorch/pytorch/pull/63939/files has added FP16 support to torchscript.
This is to add BF16 device type when doing full conversion.
Test Plan: Unit test. Also tested BF16 locally on A100 using MLP model.
Reviewed By: idning
Differential Revision: D32027152
fbshipit-source-id: b2a5ff2b22ea1e02306b0399f2b39b8493be4f45
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67779
Not all flaky failures from this test are URLErrors; I think we should
err on the side of being expansive with retries here.
Test Plan: Imported from OSS
Reviewed By: jamesr66a
Differential Revision: D32145434
Pulled By: suo
fbshipit-source-id: 3c3274b2080681fcafb3ea6132e420605f65c429
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67497
This allows more of the code-generation to happen in parallel, whereas
previously all codegen was serialized.
Test Plan: Imported from OSS
Reviewed By: dagitses, mruberry
Differential Revision: D32027250
Pulled By: albanD
fbshipit-source-id: 6407c4c3e25ad15d542aa73da6ded6a309c8eb6a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67496
gen_autograd.py doesn't use `Declarations.yaml` any more, and removing
the dependency allows it to run in parallel with
`tools/codegen/gen.py`.
Test Plan: Imported from OSS
Reviewed By: dagitses, ejguan
Differential Revision: D32027251
Pulled By: albanD
fbshipit-source-id: 2cc0bbe36478e6ec497f77a56ab8d01c76145703
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67484
Maps from `c10::Symbol -> c10::Symbol` can be hard to parse when `fromQualString` is scattered everywhere. I've been annoyed by this issue many times when rebasing, and have even messed up `FuseListUnpack` a few times.
Introduce a macro to make it easier to see what maps to what.
Test Plan: CI
Reviewed By: hlu1
Differential Revision: D32004451
fbshipit-source-id: 1086254c8403a0880d014512c439edbefa6fa015
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67166
This optimization is not really the same thing as `FuseListUnpack`, and mixing the logic in that pass is confusing and error-prone. It should really be its own pass.
It's slower since we have to do another pass over the graph, but this is not perf critical code; readability is more important.
Test Plan: Unit tests: `buck test caffe2/benchmarks/static_runtime/...`
Reviewed By: hlu1
Differential Revision: D31887458
fbshipit-source-id: 289e281d512435861fccfe19f017751ad015688c
Summary:
Use the main branch when TorchBench branch is not specified.
RUN_TORCHBENCH: soft_actor_critic
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67743
Reviewed By: seemethere
Differential Revision: D32142663
Pulled By: xuzhao9
fbshipit-source-id: 160227835543b8e55c970025073839bf0f03aa81
Summary:
stas00 uncovered an issue where certain half-precision GEMMs would produce outputs that looked like the result of strange rounding behavior (e.g., `10008.` in place of `10000.`). ptrblck suspected that this was due to the parameters being downcasted to the input types (which would reproduce the problematic output). Indeed, the GEMM and BGEMM cublas wrappers are currently converting the `alpha` and `beta` parameters to `scalar_t` (which potentially is reduced precision) before converting them back to `float`. This PR changes the "ARGTYPE" wrappers to use `acc_t` instead and adds a corresponding test.
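For intuition on the symptom, a tiny sketch of float16 granularity near 10000 (representable values there are 8 apart, so reduced-precision scaling or accumulation can land on 10008.):
```python
import torch

# 10007 has no exact float16 representation; it rounds to the nearest
# representable value, which is 10008
print(torch.tensor(10007.0).to(torch.float16))  # tensor(10008., dtype=torch.float16)
```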
CC ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67633
Reviewed By: mruberry
Differential Revision: D32076474
Pulled By: ngimel
fbshipit-source-id: 2540d9b9d0195c17d07d1161374fb6a5850779d5
Summary:
Partial fix for https://github.com/pytorch/pytorch/issues/66800. (Duplicate of https://github.com/pytorch/pytorch/issues/67725 against pytorch/pytorch so as to trigger TorchBench)
https://github.com/pytorch/pytorch/issues/61056 added a more verbose error message for distributions failing argument validation. However, it did not replace the earlier error check as was originally intended and was flagged by xuzhao9 as being the potential cause of a perf regression in `test_eval[soft_actor_critic-cuda-eager]`.
xuzhao9: Is there a way for me to check if this resolves the perf issue you mentioned?
cc VitalyFedyunin ngimel
Note that existing tests already check for the error message and should verify that the removed lines are redundant.
RUN_TORCHBENCH: soft_actor_critic
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67741
Reviewed By: neerajprad
Differential Revision: D32135675
Pulled By: xuzhao9
fbshipit-source-id: 37dfd3ff53b95017c763371979ab3a2c302a72b9
Summary:
In the scope of https://github.com/pytorch/pytorch/issues/67301. Main changes:
* generated-pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit deleted from circle
* pytorch_android_gradle_custom_build_single removed since it is no longer used
* generated-pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit added to GHA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67695
Reviewed By: malfet, seemethere, ejguan
Differential Revision: D32115620
Pulled By: b0noI
fbshipit-source-id: 113d48303c090303ae13512819bac2f069a2913f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67498
Add acc_ops.pad and a converter for it. We want to try padding the convolution channel dimension to get better int8 performance.
This one only supports padding the last two dimensions, though. Starting from TRT 8.2, it's suggested to use the Slice layer to do padding, but this might be nice to have for old version support.
Test Plan: buck test mode/dev-nosan caffe2/test/fx2trt/converters:test_pad
Reviewed By: wushirong
Differential Revision: D32006072
fbshipit-source-id: 96c3aa2aec2d28345d044a88bee2f46aba5cca0e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67734
The implementation of `aten::cat` op in NNC has to ignore tensors that have 0-size in any dimension.
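For reference, the eager-mode behavior the NNC kernel needs to match (tensors with a 0-size dimension contribute nothing to the result):
```python
import torch

a = torch.randn(0, 3)
b = torch.randn(2, 3)
print(torch.cat([a, b], dim=0).shape)  # torch.Size([2, 3])
```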
Test Plan: `buck test mode/dev-nosan //caffe2/test/cpp/tensorexpr:tensorexpr -- --exact 'caffe2/test/cpp/tensorexpr:tensorexpr - Kernel.CatWithEmptyInputs'`
Reviewed By: ZolotukhinM
Differential Revision: D32122171
fbshipit-source-id: 90c697813bc504664673cdc262df6e7ce419c655
Summary:
Fix https://github.com/pytorch/pytorch/issues/67239
The CUDA kernels for `adaptive_max_pool2d` (forward and backward) were written for contiguous output. If outputs are non-contiguous, first create a contiguous copy and let the kernel write output to the contiguous memory space. Then copy the output from contiguous memory space to the original non-contiguous memory space.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67697
Reviewed By: ejguan
Differential Revision: D32112443
Pulled By: ngimel
fbshipit-source-id: 0e3bf06d042200c651a79d13b75484526fde11fe
Summary:
OpenBLAS recently added support for bfloat16 GEMM, so this change has PyTorch call out to OpenBLAS for that, like it does for single and double precision.
Our goal is to try to enable PyTorch to make calls to "sbgemm" in OpenBLAS.
We are prepared (if it is your preference) to add fences to the code to limit this change to the Power architecture,
but our first instinct is that anyone on any architecture that enables access to sbgemm in their OpenBLAS library
should be able to use this code (but again, as we are just starting to modify PyTorch, we respect your guidance!).
(there is no issue number related to this)
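From the Python side, a sketch of the call that would reach sbgemm when OpenBLAS exposes it (whether the OpenBLAS path is actually taken depends on how the library was built):
```python
import torch

a = torch.randn(64, 32).bfloat16()
b = torch.randn(32, 16).bfloat16()
c = torch.mm(a, b)  # on CPU this may dispatch to OpenBLAS sbgemm when available
print(c.dtype, c.shape)  # torch.bfloat16 torch.Size([64, 16])
```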
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58831
Reviewed By: albanD
Differential Revision: D29951900
Pulled By: malfet
fbshipit-source-id: 3d0a4a638ac95b2ff2e9f6d08827772e28d397c3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67368
This PR adds an addition test variant for the tensor conversion
functions (bfloat16, char, long, ...) that tests channels_last. This is
because some backends (mostly just functorch right now) don't have
channels last handling and may want to test that separately from the
more general case of these operations.
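A sketch of the kind of case the new variant exercises (a conversion function called on a channels_last input):
```python
import torch

x = torch.randn(2, 3, 4, 4).to(memory_format=torch.channels_last)
y = x.bfloat16()  # one of the conversion functions covered by the variant
print(y.dtype, y.is_contiguous(memory_format=torch.channels_last))
```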
Test Plan: - wait for tests
Reviewed By: mruberry
Differential Revision: D31972959
Pulled By: zou3519
fbshipit-source-id: 68fea46908b2cdfeb0607908898bb8f9ef25b264
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65740
fp32 hardsigmoid supports inplace. This PR adds the inplace support to the quantized
hardsigmoid function, to make the signatures match.
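A usage sketch of the matching signatures (module path as it exists at this point in the codebase; treat the exact call as illustrative):
```python
import torch
import torch.nn.quantized.functional as qF

x = torch.randn(4)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)
qy = qF.hardsigmoid(qx, inplace=True)  # now mirrors the fp32 hardsigmoid signature
```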
Test Plan:
```
python test/test_quantization.py TestQuantizedOps.test_qhardsigmoid
```
Reviewed By: supriyar
Differential Revision: D31992282
Pulled By: vkuzo
fbshipit-source-id: f6be65d72954ab8926b36bb74a5e79d422fbac90
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66859
`MTLCreateSystemDefaultDevice` can return `nil`. If that happens then inside `createDeviceInfo`, we'll crash trying to convert the `nullptr` from `device.name.UTF8String` into a `std::string`.
Let's fix it by returning early in setup if there's no Metal device. But also make `createDeviceInfo` safe if we do pass in `nil`.
Test Plan: * CircleCI
Reviewed By: xta0
Differential Revision: D31759690
fbshipit-source-id: 74e878ab5b8611250c4843260f1d2e4eab22cdaf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66637
We can give more information when verify_no_memory_overlap would fail by separating the DCHECK.
ghstack-source-id: 142226105
Test Plan: fitsships
Reviewed By: d1jang
Differential Revision: D31517151
fbshipit-source-id: 8cbc324c27f6b4db4489d1bd469d37b1d8ae6ce1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67249
Implements CPU offload for model parameters in FSDP.
- CPU offload class with only offload_params attribute is created
- If this is specified in FSDP ctor, model parameters are moved back to CPU after sharding in __init__
- In forward pass, during lazy init, p._local_shard gets set to p.data so it is on CPU. We pin_memory here.
- In forward pass, in _rebuild_full_params, we move p.data back to self.compute_device if necessary. Note that we don't use the device of p._full_param_padded because we don't always have this attr, but when we do, it's always the same as compute_device.
- The same logic as above applies to the beginning of backwards pass.
- At end of fwd and end of bwd, `_use_param_local_shard` takes care to ensure the parameters are offloaded to CPU again, by pointing it to p._local_shard, which is always on CPU.
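A minimal construction sketch of the feature described above (API names as introduced around this change; illustrative only, and assumes the process group is already initialized):
```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, CPUOffload

model = torch.nn.Linear(8, 8)  # left on CPU: supported when offloading params
fsdp_model = FSDP(model, cpu_offload=CPUOffload(offload_params=True))
```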
Regarding tests:
- We test 3 different types of init: 1) move the model to CUDA before wrapping with FSDP, 2) move the model to CUDA after wrapping with FSDP, 3) never move the model to CUDA.
- Case 1 is always supported. Case 2 is not supported with CPU offload and throws an error during fwd pass. Case 3 is only supported with CPU offload at the moment.
- Verifies all params are offloaded to CPU after init.
- Verifies all params are offloaded to CPU after forward and backward.
- Note that there is an issue with verifying exact parity when CPU offloading, but it appears to be related to transferring the model back and forth between CPU and CUDA. More details in https://github.com/pytorch/pytorch/pull/66961
ghstack-source-id: 141851903
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D31911085
fbshipit-source-id: 3ddf73c070b55ce383e62251868d609004fc30e7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65598
This change adds `PyTorchPredictor::predict_managed_result` to enable Static Runtime to return managed output tensors, allocated and owned by Static Runtime to accelerate inference workloads.
- `PyTorchPredictor::predict_managed_result` only does meaningful work for the overridden `PyTorchStaticRuntimePredictor::predict_managed_result`. For other subclasses, it returns a simple object that just wraps the returned `IValue`.
- When `manage_output_tensors` is enabled, a `StaticRuntime` cannot be reentered until its return value gets deallocated by calling `StaticRuntime::deallocateOutputTensors`. Currently an instance of `StaticRuntime` gets immediately pushed back to `static_runtime_pool` to be reentered again, and this cannot be done when `manage_output_tensors` is enabled. `PyTorchStaticRuntimePredictorManagedResult` makes sure to delay pushing a `StaticRuntime` instance back to the pool only after `StaticRuntime::deallocateOutputTensors` is called on the runtime instance.
- When `manage_output_tensors` is enabled, `PyTorchStaticRuntimePredictor::predict_managed_result` returns the prediction result, whose backing memory is managed by an instance of `StaticRuntime`. The lifetime of any value reachable from `PyTorchStaticRuntimePredictorManagedResult.get()` is expected to end before `PyTorchStaticRuntimePredictorManagedResult` gets destructed. As explained above, `PyTorchPredictorManagedResult`'s destruction pushes the runtime instance that returned the result back to `static_runtime_pool` to be reused again.
- The current API design of adding `predict_managed_result` instead of forcing `operator()` to return `PyTorchPredictorManagedResult` was motivated by the fact that `manage_output_tensors` will be selectively enabled just for a few models. In case `manage_output_tensors` becomes a commonly used feature we should revisit this API design to merge them together.
Reviewed By: hlu1
Differential Revision: D31149323
fbshipit-source-id: 5ca026188077232d6a49a46759124a978439d7b2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67639
Due to BC considerations, we cannot directly error out, as that
might break existing applications. Raise warnings first to improve
debuggability.
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D32075151
Pulled By: mrshenli
fbshipit-source-id: 5680d420f5f6cd3f74a36616c03350e8a976b363
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67245
Add support for fused modules in the new convert path, including linear-relu, conv{1-3}d-relu and their qat versions;
also tested with TRT (conv2d-relu and linear-relu).
Test Plan:
```
python test/fx2trt/test_quantize_fx.py TestQuantizeFxTRTOps.test_linear_relu_module
python test/fx2trt/test_quantize_fx.py TestQuantizeFxTRTOps.test_conv_relu_module
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D31919724
fbshipit-source-id: 7e5c96eba30706f7989da680aa3443159847bdfd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67668
This adds an env var to enable the NCCL health check, which, when left unspecified, results in the check not being run. Unit tests that need to test this functionality have the env variable set. Please see the internal diff for more details.
Test Plan: CI
Reviewed By: yuguo68, mrshenli
Differential Revision: D32089763
fbshipit-source-id: dff5664a5e607f711515cd1042089ca769914fbb
Summary:
Most of the failing tests are because the test doesn't work with Python functions (only builtins like `torch.add`).
I added a check for that and ported the remaining skips into the `skips` field.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67520
Reviewed By: ZolotukhinM
Differential Revision: D32046856
Pulled By: Chillee
fbshipit-source-id: 05fa3e3c40fa6cc4f776e0c24f667629b14afd25
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66620
This splits the Tensor-dependent code out into a cpp file.
A slight complicating factor is `scan_dim` using `copy_` to handle
non-contiguous out arguments. So, I've moved that code into the
caller which does introduce some duplication. Though it's only ~10
lines extra in total.
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D31856106
Pulled By: dagitses
fbshipit-source-id: 91bb4ce5e7c6487e3ea0d5ec4d9f7a625d8ef978
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66220
- Pass pointers rather than tensors to ```quantize_tensor_arm``` to allow for using ```__restrict__``` and to make parallelization easier (as in the next diff on this stack D31205883)
- Replace ```auto``` with actual types
- Replace raw cast with reinterpret_cast<...>
- All of these changes make the code structure similar to that of Dequantize
ghstack-source-id: 142166376
Test Plan: same as D31066997 (all tests pass)
Reviewed By: kimishpatel
Differential Revision: D31444248
fbshipit-source-id: 6a31d090082047263403f415911c199519987595
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65844
When run on [Partially Quantized Mobile Vision Transformer Model](https://www.internalfb.com/diff/D30648171), with config from rebasing onto v4 of D31869106
Before:
[AIBench Run (128ms)](https://www.internalfb.com/intern/aibench/details/309792316534505)
[Perf Report](https://interncache-all.fbcdn.net/manifold/aibench/tree/mobile/pt/profiling_reports/model_perf_1635881079420.html)
After:
[AIBench Run (117ms)](https://www.internalfb.com/intern/aibench/details/20433505461364)
[Perf Report](https://interncache-all.fbcdn.net/manifold/aibench/tree/mobile/pt/profiling_reports/model_perf_1635881527831.html)
Total events spent on at::native::dequantize_quantized reduced from 1.97 Billion to 0.97 Billion (~50% Reduction)
ghstack-source-id: 142166373
Test Plan:
To run quantized_test
- Clone open source repo
- Set ANDROID_NDK and ANDROID_SDK
- Build with ```BUILD_MOBILE_BENCHMARK=1 BUILD_MOBILE_TEST=1 ANDROID_DEBUG_SYMBOLS=1 BUILD_LITE_INTERPRETER=0 ANDROID_ABI=arm64-v8a ./scripts/build_android.sh -DANDROID_CCACHE=$(which ccache) -DBUILD_BINARY=ON```
- Move ```build_android/bin/quantized_test``` to devserver
- Use one world to connect to android device (ex. ```one_world android device pixel-3a```)
- In another terminal: Make quantized_test executable (```chmod +x quantized_test```), copy it to android device (```adb push quantized_test /data/local/tmp```), and run it (```adb shell /data/local/tmp/quantized_test```)
Results:
{F676102702}
Also ```buck test mode/dev //caffe2/aten:quantized_test``` passes
To test performance on [Partially Quantized Mobile Vision Transformer Model](https://www.internalfb.com/diff/D30648171) with AI Bench:
- Save this config file: P466124028 (for example: D31869106)
- Before or after the changes in this diff, run ```buck run aibench:run_bench -- -b benchmark_mobile_vision_transformer_model_config.json --platform android/arm64 --framework pytorch --remote --devices Pixel-3a-11-30 --force_profile```
Reviewed By: kimishpatel
Differential Revision: D31066997
fbshipit-source-id: 9067e683e0181aa13a2b636b68ac4fe5a4b2e618
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67636
Modify the decorator to denote whether an acc op converter is able to support explicit/implicit batch dim. This info will be used by trt_splitter when determining whether a node can be split into the acc graph.
This can prevent us from splitting a node into the acc module and later finding that no proper converter exists for the node, failing the lowering process.
Test Plan: unit test
Reviewed By: 842974287
Differential Revision: D31998477
fbshipit-source-id: 6789ebef4a76f9a0c1ab3edf8e846a5b6143326b
Summary:
It became a mandatory argument in PyYaml-6, but has been present since PyYaml-3.
Unblocks migration to a newer runtime.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67694
Reviewed By: seemethere
Differential Revision: D32106043
Pulled By: malfet
fbshipit-source-id: 35246b97a974b168c066396ea31987b267534c7f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67165
We previously skipped the optimization if `value_out->uses().size() > 1`. But it's possible that the number of uses is 0. In that case, it's not safe to access `value_out->uses()[0]`.
This is not causing any problems in production right now since we don't have any dead code before running this pass. But we should handle this case correctly to make the pass more robust.
Test Plan: CI
Reviewed By: hlu1
Differential Revision: D31887416
fbshipit-source-id: d30a5824e8bd1cda1debdc16524db3fb0da312f9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65951
Profiling shows that we do a bunch of heap allocations to copy Argument structs in append_operator. Capturing by reference here should be safe as long as the schema object the args come from outlives the operator function.
IMPORTANT: Reviewers (or automated tests if we're lucky) need to
confirm that the above is true or we're going to have fun
use-after-free bugs.
ghstack-source-id: 142065422
Test Plan:
AIBench run for speech model on MilanBoard
control: https://www.internalfb.com/intern/aibench/details/485570882988661 (mean 906 ms)
test: https://our.intern.facebook.com/intern/aibench/details/620835625995669 (mean 818 ms)
So almost a 10% improvement in the wall time metric?
Reviewed By: iseeyuan
Differential Revision: D31319988
fbshipit-source-id: 7da56357420df500df344f49007e070ebb1bc581
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66134
No reason to do the comparison the old way when we could do it this way and avoid copying into std::string.
ghstack-source-id: 142065423
Test Plan: AIBench Milan run shows neutral to slight regression, but I think we should probably just make this change anyway.
Reviewed By: dhruvbird
Differential Revision: D31319669
fbshipit-source-id: dde329a4f2c4054f275eb98fb6556f5341e7533a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67164
Migrated both the variadic and non-variadic versions.
This diff is part of the effort to migrate all ops used in `FuseListUnpack` to `FuseListUnpackV2`. The original version of `FuseListUnpack` is problematic for a few reasons:
* You have to complicate the op implementation with an `is_fused` check, resulting in messier code. It is easier to reason about two ops, fused (out variant) and unfused (native).
* The original version of `FuseListUnpack` is buggy. It assumes that the `ListUnpack` node occurs immediately after the fusion candidate, which is not necessarily true.
This diff finishes the migration, so the original version of `FuseListUnpack` is removed
Test Plan:
Unit tests: `buck test caffe2/benchmarks/static_runtime/...`
**Accuracy Test**
Done at the top of this diff stack.
Reviewed By: hlu1
Differential Revision: D31887386
fbshipit-source-id: 9d44c813667a75bce13dce62bf98e6109edea6ba
Summary:
In the scope of https://github.com/pytorch/pytorch/issues/67301. Main changes:
* pytorch_android_gradle_custom_build_single removed from the circle (however template is still there since it is used by another similar workflow: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit, which will be migrated next)
* new GHA workflow added: pytorch_android_gradle_custom_build_single
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67577
Reviewed By: malfet, mruberry
Differential Revision: D32087709
Pulled By: b0noI
fbshipit-source-id: f9581558ddc1453b63264bf19fe5a4c245b7c007
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65504
We should be able to borrow a Tuple from an IValue without incurring refcount bumps.
ghstack-source-id: 142065833
Test Plan:
Added test coverage.
Profiled static runtime on the local_ro net for ctr_mobile_feed. Inclusive time spent in VarTupleUnpack decreased about 0.3%, which roughly matches with the 0.36% of runtime that was previously spent in IValue::toTuple().
Reviewed By: hlu1
Differential Revision: D31130570
fbshipit-source-id: afa14f46445539e449068fd908d547b8da7f402c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65381
The previous diff adds a way to make Tuples of size 3 or less
more efficiently. This diff makes it easier to hit that path and
updates a bunch of callsites to hit it.
ghstack-source-id: 142065832
Test Plan: CI
Reviewed By: ezyang
Differential Revision: D31069538
fbshipit-source-id: d04da3709594ed68ab1c0a1471f8cffd8d001628
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67281
`qnnpack/operator.h` introduces a dependency on an external library fp16 via `qnnpack/requantization.h`.
Including `qnnpack/operator.h` in `pytorch_qnnpack.h` will make objects that really don't require fp16 depend on it indirectly because they include `pytorch_qnnpack.h`.
This was causing some test and bench targets to fail building for local and android/arm64 (only two tried) using cmake.
This diff moves `qnnpack/operator.h` from `pytorch_qnnpack.h` to `qnnpack_func.h`, and explicitly add `qnnpack/operator.h` in `src/conv-prepack.cc`.
Test Plan: Ran all the tests for local on my devserver, and arm64 on Pixel3a.
Reviewed By: kimishpatel
Differential Revision: D31861962
fbshipit-source-id: e1425c7dc3e6700cbe3e46b64898187792555bb7
Summary:
This PR addresses https://github.com/pytorch/pytorch/issues/54261.
This adds OpInfos for binary logical element-wise operators. This is my first OpInfos PR to PyTorch; looking forward to suggestions and any feedback.
cc: mruberry krshrimali
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67178
Reviewed By: jbschlosser
Differential Revision: D32057889
Pulled By: mruberry
fbshipit-source-id: 7e670260af6b478dba9d6e8d77de4df1b6d0b5d1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67134
This diff demos torch::deploy unity which builds the model, the dependencies and the runtime as a unity!
The end user only needs to use the build_unity rule in place of the python_binary rule to define the python application. Under the hood, we build the python application (an xar file), build the torch deploy runtime, and then embed the python application (the xar file) into the torch deploy runtime.
When starting the torch::deploy runtime, the xar will be written to the filesystem and extracted. We put the extracted path to python sys.path so all the model files and all the python dependencies can be found!
As a demo, the model here is just a simple python program using numpy and scipy. But theoretically, it can be as complex as we want.
I'll check how bento_kernel works. Maybe we can learn from bento_kernel to simplify things a bit.
ghstack-source-id: 142085742
Test Plan:
```
#build
buck build mode/opt unity:unity
# make sure the path exists before we start torch::deploy runtime
# Otherwise the dynamic loader will just skip this non-existing path
# even though we create it after the runtime starts.
mkdir -p /tmp/torch_deploy_python_app/python_app_root
#run
LD_LIBRARY_PATH=/tmp/torch_deploy_python_app/python_app_root ~/fbcode/buck-out/gen/caffe2/torch/csrc/deploy/unity/unity
```
Reviewed By: suo
Differential Revision: D31816526
fbshipit-source-id: 8eba97952aad10dcf1c86779fb3f7e500773d7ee
Summary:
Inserted a check on the return value of PyObject_IsInstance to capture the case in which it raises an exception and returns -1. When this happens, THPVariable_Check now throws a python_error to signal the exception.
Fixes https://github.com/pytorch/pytorch/issues/65084
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67588
Reviewed By: mruberry
Differential Revision: D32064776
Pulled By: albanD
fbshipit-source-id: 895c7682e0991ca257e27f9638a7462d83707320
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67397
Expand selectivity coverage to classes created outside of TORCH_LIBRARY.
ghstack-source-id: 142076940
Test Plan: Model unit tests, manually run some models on prod apps.
Reviewed By: dhruvbird, bdhirsh
Differential Revision: D31978965
fbshipit-source-id: 708901b47a9838ac54c78788028d0e18c1e378c0
Summary:
Inserted a check for the momentum option and print "None" in case it is not defined. See https://github.com/pytorch/pytorch/issues/65143
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67335
Test Plan:
The code below now prints `torch::nn::BatchNorm2d(128, eps=1e-05, momentum=None, affine=true, track_running_stats=true)` without generating errors.
```
torch::nn::BatchNorm2d m(torch::nn::BatchNormOptions(128).momentum(c10::nullopt));
std::cerr << *m << "\n";
```
Fixes https://github.com/pytorch/pytorch/issues/65143
Reviewed By: mruberry
Differential Revision: D32067820
Pulled By: ngimel
fbshipit-source-id: f40f9bbe090aa78e00f6c3a57deae393d946b88d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66879
This adds a quantized implementation for bilinear gridsample. Bicubic interpolation cannot be supported as easily since we rely on the linearity of quantization to operate on the raw values, i.e.
f(q(a), q(b)) = q(f(a, b)) where f is the linear interpolation function.
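A tiny, hedged illustration of that linearity property (this is not the new kernel, just the identity it relies on): linearly interpolating the raw int_repr values and then mapping them back through the affine quantization parameters gives the same result as interpolating the dequantized values.
```
import torch

scale, zero_point = 0.1, 10
a = torch.quantize_per_tensor(torch.tensor([1.0]), scale, zero_point, torch.quint8)
b = torch.quantize_per_tensor(torch.tensor([3.0]), scale, zero_point, torch.quint8)
w = 0.25  # interpolation weight

# Interpolate on the raw integer values, then map back through scale/zero_point.
raw = (1 - w) * a.int_repr().float() + w * b.int_repr().float()
via_ints = (raw - zero_point) * scale

# Interpolate on the dequantized values directly.
via_floats = (1 - w) * a.dequantize() + w * b.dequantize()

assert torch.allclose(via_ints, via_floats)  # identical up to float rounding
```
Bicubic interpolation is not affine in its inputs, which is why it cannot be supported the same way.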
ghstack-source-id: 141321116
Test Plan: test_quantization
Reviewed By: kimishpatel
Differential Revision: D31656893
fbshipit-source-id: d0bc31da8ce93daf031a142decebf4a155943f0f
Summary:
She no longer works on the ONNX exporter
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67631
Reviewed By: malfet
Differential Revision: D32070435
Pulled By: msaroufim
fbshipit-source-id: d741a15bd7a916745aa7f2f3d9bb1dc699553900
Summary:
It turns out my lint doesn't work on CI all the time because of shell differences. I'm working on a new more comprehensive lint in https://github.com/pytorch/pytorch/pull/66826 and it'd be nice if these could be cleared first.
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67583
Reviewed By: H-Huang, mruberry
Differential Revision: D32045155
Pulled By: janeyx99
fbshipit-source-id: ecfe9f008310c28e3b731e246c2b2ed0106d03b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66564
Mobile thinks that we are segfaulting in _convolution, and this
is the most recent substantive change to this function. I think
it's pretty unlikely to have caused the crash, but if we don't have
any better ideas we should try this.
ghstack-source-id: 141972758
Test Plan: ship it and see if it resolves the error report
Reviewed By: kimishpatel
Differential Revision: D31598633
fbshipit-source-id: c34f4b0b7b8529e21fd019c886ad8d68ffe286b0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67441
Native ops are faster than falling back to the JIT interpreter, sometimes significantly (we've previously seen this with ops like TupleUnpack). We should improve op coverage where possible.
Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: hlu1
Differential Revision: D31992093
fbshipit-source-id: 88191c13d229ffeac4e5b17b78e25f51d3f7f23e
Summary:
Add check to make sure we do not add new submodules without documenting them in an rst file.
This is especially important because our doc coverage only runs for modules that are properly listed.
temporarily removed "torch" from the list to make sure the failure in CI looks as expected. EDIT: fixed now
This is what a CI failure looks like for the top level torch module as an example:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/67440
Reviewed By: jbschlosser
Differential Revision: D32005310
Pulled By: albanD
fbshipit-source-id: 05cb2abc2472ea4f71f7dc5c55d021db32146928
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67550
`aten::__is__` and `aten::__isnot__` are extremely problematic for a large number of SR graph optimizations.
Some examples:
- Removing ops that are no-ops in the forward pass like `aten::detach`. This would normally be trivial, but `is` introduces corner cases like this:
```
def forward(x):
    y = x.detach()
    return x is y
```
We get `False` before optimizations. But after optimizations, the test becomes `x is x`, and we get `True`.
- `ReplaceWithCopy`: the pass that replaces ops like `aten::to` with an out variant that copies its input. The following graph returns `True` before optimizations, but `False` afterwards
```
def forward(x):
    y = x.to(x.dtype)
    return x is y
```
- And many more, `FuseListUnpack` can break too
Since the ops are not used by 99.99% of users, rejecting them so we don't have to think about this is not a big deal.
Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: d1jang
Differential Revision: D32022584
fbshipit-source-id: d135938edb2299c9b8f9511afac2bf568578879e
Summary:
This test is narrowly failing intermittently. See https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.3.1-py3.6-test1/7736//console for an example. Relevant snippet:
```
12:28:43 ======================================================================
12:28:43 FAIL [0.104s]: test_noncontiguous_samples_matmul_cuda_float32 (__main__.TestCommonCUDA)
12:28:43 ----------------------------------------------------------------------
12:28:43 Traceback (most recent call last):
12:28:43 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1422, in wrapper
12:28:43 method(*args, **kwargs)
12:28:43 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1422, in wrapper
12:28:43 method(*args, **kwargs)
12:28:43 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 371, in instantiated_test
12:28:43 result = test(self, **param_kwargs)
12:28:43 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 737, in test_wrapper
12:28:43 return test(*args, **kwargs)
12:28:43 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_device_type.py", line 920, in only_fn
12:28:43 return fn(self, *args, **kwargs)
12:28:43 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1041, in wrapper
12:28:43 fn(*args, **kwargs)
12:28:43 File "test_ops.py", line 262, in test_noncontiguous_samples
12:28:43 self.assertEqual(actual_grad, expected_grad)
12:28:43 File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1903, in assertEqual
12:28:43 super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
12:28:43 AssertionError: False is not true : Tensors failed to compare as equal!With rtol=1.3e-06 and atol=1e-05, found 1 element(s) (out of 10) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 1.2278556823730469e-05 (-1.458460807800293 vs. -1.4584730863571167), which occurred at index 7.
```
Setting an absolute tolerance of 1e-4, which is what this PR does, should make the test pass consistently.
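For illustration, a minimal sketch of the comparison involved (how exactly the PR wires the tolerance into the test machinery is not shown here; the numbers are taken from the failure above):
```
import torch
from torch.testing._internal.common_utils import TestCase, run_tests

class TestTolerance(TestCase):
    def test_matmul_grad_close(self):
        actual = torch.tensor(-1.458460807800293)
        expected = torch.tensor(-1.4584730863571167)
        # Fails narrowly with the default rtol=1.3e-06 / atol=1e-05,
        # passes once atol is relaxed to 1e-4.
        self.assertEqual(actual, expected, atol=1e-4, rtol=1.3e-6)

if __name__ == "__main__":
    run_tests()
```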
cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67593
Reviewed By: ngimel
Differential Revision: D32050986
Pulled By: mruberry
fbshipit-source-id: f15bc8c4516be0a859afcfa76d52334c0b2c58a5
Summary:
It appears that most NVIDIA architectures (well, at least there haven't been many reports of this issue) don't do reduced precision reductions (e.g., reducing in fp16 given fp16 inputs), but this change attempts to ensure that a reduced precision reduction is never done. The included test case currently fails on Volta but passes on Pascal and Ampere; setting this flag causes the test to pass on all three.
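A hedged usage sketch, assuming the flag lands as `torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction` (that exact name is an assumption here, not stated in this summary):
```
import torch

if torch.cuda.is_available():
    # Ask cuBLAS to accumulate matmul reductions in fp32 even for fp16 inputs.
    torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False

    a = torch.randn(256, 256, device="cuda", dtype=torch.half)
    b = torch.randn(256, 256, device="cuda", dtype=torch.half)
    ref = (a.float() @ b.float()).half()
    # With the flag off, the fp16 matmul should closely track the fp32 reference.
    assert torch.allclose(a @ b, ref, rtol=1e-3, atol=1e-3)
```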
CC stas00 ngimel ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67578
Reviewed By: mruberry
Differential Revision: D32046030
Pulled By: ngimel
fbshipit-source-id: ac9aa8489ad6835f34bd0300c5d6f4ea76f333d1
Summary:
Adds `torch.argwhere` as an alias to `torch.nonzero`
Currently, `torch.nonzero` actually provides functionality equivalent to `np.argwhere`.
From NumPy docs,
> np.argwhere(a) is almost the same as np.transpose(np.nonzero(a)), but produces a result of the correct shape for a 0D array.
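A quick example of the alias and the equivalence being described:
```
import torch

a = torch.tensor([[0.0, 1.0], [2.0, 0.0]])
# One row of indices per nonzero element, matching np.argwhere /
# np.transpose(np.nonzero(a)).
print(torch.argwhere(a))  # tensor([[0, 1], [1, 0]])
print(torch.nonzero(a))   # same result; argwhere is an alias
```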
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64257
Reviewed By: qihqi
Differential Revision: D32049884
Pulled By: saketh-are
fbshipit-source-id: 016e49884698daa53b83e384435c3f8f6b5bf6bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67567
- Fix an issue to allow it to work against modules that contains ScriptModule submodules.
- Fix a bug where `getattr(base_class, method_name)` could raise KeyError
Test Plan: linter; CI;
Reviewed By: 842974287
Differential Revision: D31956070
fbshipit-source-id: 1114937f380af437fd6d36cd811ef609d7faefe7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67564
moves the functionalize fallback out of aten/core and into aten, which should fix the issue described at https://fb.workplace.com/groups/163556484490704/permalink/1029416141238063/. I'm still not clear on why this didn't fail anything in CI / sandcastle on the initial diff: D31942093 (0032fa7725)
ghstack-source-id: 141959891
Test Plan: Locally, running `buck build mode/opt //sigrid/feed/prediction_replayer:fully_remote_replayer_main`
Reviewed By: zou3519
Differential Revision: D32027585
fbshipit-source-id: 2d86c4a6b3a73b00ee0ccee2f89a54704ed83e00
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67569
Splitter_base assumes that the first subgraph after the split must be a CPU subgraph if there exists a CPU node. This is wrong; the starting subgraph should be determined by which subgraph has the 0-dependency node.
Also adds a unit test for the splitter.
Reviewed By: yinghai
Differential Revision: D32012549
fbshipit-source-id: e2639ccd7774b4295ca05c2ddbefff9726702b3f
Summary:
Make `TORCH_CUDABLAS_CHECK` and `TORCH_CUSOLVER_CHECK` available in custom extensions by exporting the internal functions called by both macros.
Rel: https://github.com/pytorch/pytorch/issues/67073
cc xwang233 ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67161
Reviewed By: jbschlosser
Differential Revision: D31984694
Pulled By: ngimel
fbshipit-source-id: 0035ecd1398078cf7d3abc23aaefda57aaa31106
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67530
Currently `ptvsc2_predictor_bench` only uses the first input of a given recordio file even when the recordio file contains many inputs.
This change extends `StaticRuntime::benchmark` to accept multiple input entries so that we can benchmark more extensively and realistically using all the inputs in the recordio file.
Test Plan:
Tested `ptvsc2_predictor_bench` with / without this change executing the following command:
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 3 ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --scripted_model=/home/djang/ads/adfinder/ctr_mobilefeed/302008423/302008423_0.predictor.disagg.local --recordio_inputs=/home/djang/ads/adfinder/ctr_mobilefeed/302008423/302008423.local.inputs.recordio --pt_enable_static_runtime=1 --compare_results=0 --iters=1 --warmup_iters=1 --num_threads=1 --do_profile=1 --method_name=local.forward --set_compatibility --do_benchmark=1 --recordio_use_ivalue_format=1
```
Reviewed By: hlu1
Differential Revision: D31947382
fbshipit-source-id: 4188271613aad201f8cad5f566e0dfed26680968
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67163
Migrated both the variadic and non-variadic versions.
This diff is part of the effort to migrate all ops used in `FuseListUnpack` to `FuseListUnpackV2`. The original version of `FuseListUnpack` is problematic for a few reasons:
* You have to complicate the op implementation with an `is_fused` check, resulting in messier code. It is easier to reason about two ops, fused (out variant) and unfused (native).
* The original version of `FuseListUnpack` is buggy. It assumes that the `ListUnpack` node occurs immediately after the fusion candidate, which is not necessarily true.
Test Plan:
Unit tests: `buck test caffe2/benchmarks/static_runtime/...`
**Accuracy Test**
Done at the top of this diff stack.
**Performance**
Everything seems to be about the same plus or minus some noise.
* Baseline (D31947382 with some errors corrected locally; the version of the op here is fused and variadic): P464964343
* This diff, fused_variadic: P464960645
* Variadic transformation disabled, fused (caught and fixed a schema error here): P464961561
* List unpack fusion disabled, variadic: P464962661
* Both variadic and fusion passes disabled: P464963342
The predictions match with the JIT interpreter for all ops.
Reviewed By: hlu1
Differential Revision: D31887300
fbshipit-source-id: 25a7b4e35eed21ca8b2c98297513425cf17f461a
Summary: Original commit changeset: 6e97d95ffafd
Test Plan: unit test
Reviewed By: wanchaol
Differential Revision: D32023341
fbshipit-source-id: 2a9f7b637c0ff18700bcc3e44466fffcff861698
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67385
As part of the expanded operator versioning effort we are going to start looking at this variable and what's stored locally in the model file.
ghstack-source-id: 141782717
Test Plan: unit test
Reviewed By: cccclai
Differential Revision: D31976654
fbshipit-source-id: 255a23cff7c4f4039089de23b4da95772be48324
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65971
ghstack-source-id: 141842335
We should be able to load methods into their ClassTypes. Right now the mobile runtime only loads data members into ClassTypes but not methods. To support interface calls, we inject methods into ClassTypes when the methods are loaded.
Test Plan: existing tests should all pass.
Reviewed By: qihqi
Differential Revision: D31326146
fbshipit-source-id: fb1dbea619910ef1f8fa26146da3ebab348fe902
Summary:
Action following https://github.com/pytorch/pytorch/issues/66232
This change does require some context: there were several suggestions regarding what to do about this group of tests: tests that are core and crucial to all of PyTorch and are too broad to be owned by one team.
1. Let's add a "module: core" and put people behind it! This idea sounds appealing unless you are one of the people backing the label. From talking to albanD among others, this idea of putting all these core tests on the shoulders of a few people or one team isn't super fair, and I have not yet found anyone willing to take on this job.
2. Taking advantage of the fact that we already have a triaging oncall that takes turns triaging issues, we can leave these tests essentially unlabeled and allow the oncall to triage these tests. Since these tests are crucial to PyTorch, we'll add the "high priority" label to mark them different from other unowned tests (see https://github.com/pytorch/pytorch/issues/67552).
3. I _could_ still create an unbacked label "module: core" and attribute these tests there, but I don't like the idea of creating a facade that the tests are "triaged" to a label when no one is actually taking a look.
Now we could potentially break these tests down into smaller files so that each piece _could_ be owned by a team, but 1. I don't know if this is currently feasible and 2. This approach does not prevent that from happening in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67553
Reviewed By: albanD
Differential Revision: D32025004
Pulled By: janeyx99
fbshipit-source-id: 1fb1aa4c27e305695ab6e80ae3d02f90519939c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63948
This PR adds `torch.add(a, b, alpha=None, out=out)` variant with `a, b,
out` all being sparse CSR tensors.
The underlying cuSPARSE function works only with 32-bit indices, and in
the current implementation, the result tensor has 32-bit indices. Input
tensors can have both 64-bit and 32-bit indices tensors.
Fixes https://github.com/pytorch/pytorch/issues/59060
cc nikitaved pearu cpuhrsch IvanYashchuk ngimel
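A small sketch of the added variant (illustrative values; in the version this PR targets, the fused path is driven by cuSPARSE, so the tensors may need to live on CUDA):
```
import torch

crow = torch.tensor([0, 1, 2])
col = torch.tensor([0, 1])
a = torch.sparse_csr_tensor(crow, col, torch.tensor([1.0, 2.0]), size=(2, 2))
b = torch.sparse_csr_tensor(crow, col, torch.tensor([3.0, 4.0]), size=(2, 2))
out = torch.sparse_csr_tensor(crow, col, torch.tensor([0.0, 0.0]), size=(2, 2))

torch.add(a, b, alpha=2.0, out=out)  # out = a + 2.0 * b, stays sparse CSR
print(out)  # values tensor([ 7., 10.]) with the same crow/col indices
```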
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D31909731
Pulled By: cpuhrsch
fbshipit-source-id: 656f523e3947fec56b2f93c474fb6fd49f0360ca
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63341.
This PR adds a new test, `test_noncontiguous_samples`, that runs ops forward and backward and compares their outputs and grads between "normal" contiguous SampleInputs and noncontiguous SampleInputs. This test should preclude the need for noncontiguous SampleInputs going forward.
The test was added by generalizing the `.numpy()` transform on SampleInputs to support a new `.noncontiguous()` transform and copying forward/backward patterns from other tests in test_ops.py. It also discovered that many SampleInputs were incorrectly reusing tensors, so those have been revised. SampleInputs creating noncontiguous tensors for testing have also been altered to no longer do so.
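In spirit, the new test does something like the following sketch (the actual `.noncontiguous()` transform lives in the OpInfo machinery and may construct its inputs differently):
```
import torch

op = torch.sin  # stand-in for an OpInfo op
a = torch.randn(5, 5, requires_grad=True)

# Contiguous path.
out_c = op(a)

# Noncontiguous path: route the same values through a noncontiguous view.
nc = torch.repeat_interleave(a, 2, dim=-1)[..., ::2]
assert not nc.is_contiguous()
out_nc = op(nc)

assert torch.allclose(out_c, out_nc)

# Gradients must match between the two paths as well.
grad = torch.randn_like(out_c)
g_c, = torch.autograd.grad(out_c, a, grad, retain_graph=True)
g_nc, = torch.autograd.grad(out_nc, a, grad)
assert torch.allclose(g_c, g_nc)
```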
In addition, this test discovered the following high priority silent correctness issues:
- https://github.com/pytorch/pytorch/issues/67432
- https://github.com/pytorch/pytorch/issues/67517
- https://github.com/pytorch/pytorch/issues/67513
- https://github.com/pytorch/pytorch/issues/67512
- https://github.com/pytorch/pytorch/issues/67470
It also identified the following issues:
- https://github.com/pytorch/pytorch/issues/67539
The pow OpInfo also incorrectly specified that pow supported the bool datatype, and this has been fixed. Its SampleInputs were written in a way that made requests for boolean SampleInputs return type promoting inputs that never actually tried to compute pow in bool.
This PR suggests we should add the following guidance for writing SampleInputs:
- ensure that all SampleInputs are independent of each other (don't reuse tensors)
- ensure that all SampleInput tensors have no grad or backward functions (no autograd history) -- they should be leaves
- prefer keeping sample inputs simple where possible, a good set of handwritten samples that test interesting cases may be better than an exhaustive but hard to read and maintain programmatic enumeration
- keep code readable by using functools.partial and writing simple inline helpers; break up large statements into a more readable series of smaller statements; especially don't write complicated generator expressions with a `for` at the end!
fyi kshitij12345 krshrimali pmeier anjali411 saketh-are zou3519 dagitses
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67434
Reviewed By: ngimel
Differential Revision: D32014557
Pulled By: mruberry
fbshipit-source-id: b17e19adc1d41e24441f0765af13d381fef5e3c1
Summary:
Removes the 3D special case logic in `_convolution_double_backward()` that never worked.
The logic was never called previously since `convolution()` expands input / weight from 3D -> 4D before passing them to backends; backend-specific backward calls thus save the 4D version to pass to `_convolution_double_backward()`.
The new general `convolution_backward()` saves the original 3D input / weight, uncovering the bug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67283
Reviewed By: anjali411
Differential Revision: D32021100
Pulled By: jbschlosser
fbshipit-source-id: 0916bcaa77ef49545848b344d6385b33bacf473d
Summary:
This ensures deterministic output, allowing systems like ccache to be
more effective.
cc ezyang bhosmer bdhirsh
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67046
Reviewed By: VitalyFedyunin
Differential Revision: D31896114
Pulled By: bdhirsh
fbshipit-source-id: d29ef0cf6c7e3408b104c5239b620eaa24327088
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67234
Extends the dequantize fp16 function to also work on CUDA,
and adds a test.
Test Plan:
```
python test/test_quantization.py TestQuantizedTensor.test_dequantize_fp16_cuda
python test/test_quantization.py TestQuantizedTensor.test_dequantize_fp16_cpu
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D31915330
fbshipit-source-id: 622d47464fae26bf02f295ff56df63a3bf80b786
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67449
Adds a description of what the current custom module API does
and API examples for Eager mode and FX graph mode to the main
PyTorch quantization documentation page.
Test Plan:
```
cd docs
make html
python -m http.server
// check the docs page, it renders correctly
```
Reviewed By: jbschlosser
Differential Revision: D31994641
Pulled By: vkuzo
fbshipit-source-id: d35a62947dd06e71276eb6a0e37950d3cc5abfc1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67210
`OUTPUT_SHARE_OBSERVE_WITH_INPUT` is an observation type for operators whose output shares the same observer/fake_quant instance
as their input. When quantized, these ops can take a quantized Tensor as input and output a quantized Tensor with the same quantization parameters (scale/zero_point etc.) as the input.
Using cat as an example in this PR. Other ops can be added later gradually (together with tests).
Test Plan:
python test/fx2trt/test_quantize_fx.py TestQuantizeFxTRTOps.test_cat
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D31907243
fbshipit-source-id: 2c7af4a456deb5e6597b0b9cd4e32c5fcdec580b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67129
Since the tests depend on an experimental feature (fx2trt), we'll move them to the fx2trt folder.
Test Plan:
python test/fx2trt/test_quantize_fx.py
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D31877123
fbshipit-source-id: 5a98a257c4806c1911cfc2616d5ad98d715234c4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67524
We have some loose ends to tie to make timing cache really work. This diff fixes them.
Reviewed By: wushirong
Differential Revision: D32012021
fbshipit-source-id: 1e93c76d48a3740a02613e1f19222ed92cca9deb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67499
Since https://github.com/pytorch/pytorch/pull/62030 was landed, storages being produced when loading from a pickle are of type TypedStorage. We weren't catching this in our deploy serialization, leading tensors to actually get pickled instead of the storages getting shared across interpreters.
Since this is technically correct still, it wasn't caught by any of our tests, until someone tried to pass a really big tensor and started ooming.
ghstack-source-id: 141869521
Test Plan: added unit test
Reviewed By: shunting314
Differential Revision: D32004075
fbshipit-source-id: ef5a80cd3cb1dff0b6b4c1b6c95923e4faab7d50
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67275
Specifically targets the symbolic functions that directly return the input as output. The old logic overrides the value name with the output value name. But since the input is unmodified and unchanged, it is more logical to keep its original input name, especially for cases where inputs come directly from model parameters.
Test Plan: Imported from OSS
Reviewed By: msaroufim
Differential Revision: D31962517
Pulled By: malfet
fbshipit-source-id: 9cb4a2bb55aa08dd1ce8fdec24e7cfb11d7ea97c
Co-authored-by: BowenBao <bowbao@microsoft.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67327
Under certain conditions, TRT will create extra outputs, which seems like a bug. If we don't capture those hidden outputs, we won't allocate memory to host them, and TRT will end up writing to illegal memory. This diff addresses the issue by capturing the hidden outputs and allocating proper memory for them.
Reviewed By: jianyuh, wushirong, 842974287
Differential Revision: D31955379
fbshipit-source-id: c9faaf91ed45bec8e0bc4e0bea812a0a3f2abad0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67455
Migrates docker builds that don't have dependent jobs within the pytorch
repository to our new GHA docker build job
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: malfet, janeyx99
Differential Revision: D31997671
Pulled By: seemethere
fbshipit-source-id: 9d6f58fa8ea8731cf12457fe64dc65e70f3d9f25
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67346
Native ops are faster than falling back to the JIT interpreter, sometimes significantly (we've previously seen this with ops like TupleUnpack). We should improve op coverage where possible.
Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: d1jang
Differential Revision: D31965159
fbshipit-source-id: 86a69c395f401c4a4c55daa4c5fe80764383c8e5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67341
Native ops are faster than falling back to the JIT interpreter, sometimes significantly (we've previously seen this with ops like `TupleUnpack`). We should improve op coverage where possible.
Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: hlu1
Differential Revision: D31962589
fbshipit-source-id: 3107fb169c1b02fb2bafbb355c005669b5fa8435
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65970
ghstack-source-id: 141842338
mobile::Function should inherit from jit::Function because, for interface call support, we need an abstract jit::Function type stored in the corresponding ClassTypes, so that we can look up methods there. Previously mobile::Function was implemented separately, which prevented this. Since we got rid of all the unneeded virtual methods from jit::Function, we can inherit from torch::jit::Function without too much cost.
NOTE that torch::jit::Function is already in dependency because we need it to support custom class call. We should be able to use Function uniformly without looking into whether it's a builtin function or mobile::Function.
Test Plan: no behavior change.
Reviewed By: iseeyuan, mrshenli
Differential Revision: D31326148
fbshipit-source-id: 36caeaf3c8c5f54c23a1a7c8c9e2fd6e78b19622
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67436
This information is useful for comparing static runtime to c2
Reviewed By: d1jang
Differential Revision: D31991571
fbshipit-source-id: eb83bc4564b05d56fb9a550863eea3f6312f3f6c
Summary:
The frexp function has been enabled in ROCm code. Updating PyTorch
to enable this functionality.
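For context, the functionality being enabled (usage is unchanged; it simply runs on ROCm now as well):
```
import torch

x = torch.tensor([0.5, 4.0, -3.0])
mantissa, exponent = torch.frexp(x)
# x == mantissa * 2 ** exponent for each element
assert torch.allclose(x, torch.ldexp(mantissa, exponent))
```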
cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67226
Reviewed By: jbschlosser
Differential Revision: D31984606
Pulled By: ngimel
fbshipit-source-id: b58eb7f226f6eb3e17d8b1e2517a4ea7297dc1d5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65968
tryToGraphFunction() should cover all cases and is more composable than
ad hoc virtual methods.
ghstack-source-id: 141759214
Test Plan: no behavior change.
Reviewed By: gmagogsfm
Differential Revision: D31326154
fbshipit-source-id: 692a35df424f7d4f777a96489c4cbb24b3ae7807
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67373
mv is implemented by decomposing into addmv, so it should
be a CompositeExplicitAutograd op.
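For reference, the decomposition in question; registering mv as CompositeExplicitAutograd just reflects how it is already computed:
```
import torch

A = torch.randn(3, 4)
x = torch.randn(4)

# addmv(input, mat, vec, beta=b, alpha=a) = b * input + a * (mat @ vec),
# so mv is addmv with a zeroed-out accumulator.
expected = torch.addmv(torch.zeros(3), A, x, beta=0, alpha=1)
assert torch.allclose(torch.mv(A, x), expected)
```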
Test Plan: It shouldn't change any behaviors. So, CI.
Reviewed By: bdhirsh
Differential Revision: D31973265
Pulled By: alanwaketan
fbshipit-source-id: 3b6850f08e6f671b908a177f148cc6194baa8a13
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67102
Getting rid of top/bottom and left/right distinction, replacing with height and width. These parameters are widely used in qnnpack and always passed together but never different. Pytorch doesn't support asymmetrical paddings either so I see no potential use for this.
ghstack-source-id: 141334544
Test Plan: qnnpack unit tests
Reviewed By: kimishpatel
Differential Revision: D31863370
fbshipit-source-id: aa57490399e23d6139b2ad7b745139752acd7848
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66257
Used `clang-format -i` for these two files.
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D31762737
Pulled By: H-Huang
fbshipit-source-id: e94e301d0b013dbb8f2cef19ff140bac5811738f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64432
Original PR description + feedback here: https://github.com/pytorch/pytorch/pull/63048
I've addressed all of the feedback in the original PR and made some pretty large changes, listed below.
**Table of Contents**
- Starting points
- List of the main changes from the original PR
- Next Steps
- Example codegen output (for a view, mutation, and view+mutation op)
**Starting Points**
A good place to start when looking through the PR:
* Alban mentioned that this is a useful mental model (thanks Ed for originally making this clear to me). Semantically, the pass currently does THREE things, which are all needed by functorch - all fused together into one big pass.
* (a) alias removal, which replaces {view} calls with {view}_copy calls, and manually tracks aliasing information, so that when one tensor is mutated, we re-apply the same mutation to all of the aliases. This is the bulk of the work - once this is done, the next 2 things are trivial to implement.
* (b) mutation removal, which is easy to do once we know that there are no aliases. Every mutation `a.add_(b)` becomes `a.replace_(a.add(b))`
* (c) reapplying views: all of the `{view}_copy` calls are replaced with `{view}` calls again. This is an optimization that we can make specifically for functorch (and strided backends), that only care about mutation removal and not alias removal
* XLA and Vulkan only want (a), or (a) + (b). Later, we'll want to split this out so that you can actually opt into different versions of this logic.
* There is currently no {view}_copy replacement, because the <replace views with copies> and <replace copies with views> steps have just been combined in this pass. Later, we'll want to actually implement {view}_copy variants of each view operator, probably with codegen.
* documentation breadcrumb 1, in `FunctionalTensorWrapper.cpp`: https://github.com/pytorch/pytorch/pull/64432/files#diff-a0bac99bf205dba5b94cb64fc2466d3d55d991887572f9cd6a02e27b3a91dd60R59 (you might have to expand the `FunctionalTensorWrapper.cpp` file, which GitHub closes by default because it's large)
* documentation breadcrumb 2, in `FunctionalTensorWrapper.h`: https://github.com/pytorch/pytorch/pull/64432/files#diff-c945c71a4ccac65871f24a912e8904f9a5088b24a32e636727ea9c8fe920708aR12
* Reading through the codegen output at the bottom of this description.
**Main changes from the original PR**
(1) I use lambdas instead of a giant enum to handle all of the different views.
This results in less boilerplate per view op (and more stuff that can be codegen'd). Every `ViewMeta` object now contains a `forward` and `reverse` lambda, that knows how to replay the view and its inverse. This makes the actual code that executes the replaying logic a lot less boilerplate-y (see `Alias::sync_update_operations` and `FunctionalTensorWrapper::sync_`)
(2) Every tensor during the functionalization pass is always wrapped in a `FunctionalTensorWrapper`.
This is potentially unnecessary for Vulkan/XLA, and will have a mild perf impact, but for now this PR just targets the functorch use case. I previously had a complicated design (a `FunctionalTensorImplBase` class) to avoid needing the wrapper for XLA, but it had some subtleties that are gonna require more thought to fix, so I'm pushing that off for now.
(3) `FunctionalTensorWrapper` objects accurately report stride information.
It's a little annoying to do this though, because the logic that calculates stride info for each view isn't easily separated from the actual view kernels in core, `at::native::{view}`. I do this by adding logic in each `at::functionalization::{view}` kernel to call the reference implementation `at::native::{view}`. I don't do anything with the output aside from taking its size/stride/storage_offset to set the actual output tensor's size/stride/storage_offset correctly. There's another annoying part to this: I'm pretty sure that we want to pass the actual *wrapper* tensors directly into the native kernels, not their inner unwrapped values. But there are some `at::native::{view}` kernels that call other tensor methods, which re-invoke the dispatcher, calling functionalization/functorch kernels that try to do the unwrapping.
To do this, right now I have an `AutoDispatchDirectlyToNative` guard that basically ensures that any tensor methods called inside of the at::native::{view} op always redispatch straight to the CPU kernel (which will be another at::native:: kernel). This feels kind of heavy handed, but I'm not sure of a better way to do it.
(4) `FunctionalTensorWrapper` objects accurately report aliasing information.
There's a new `FunctionalStorageImpl` class (subclass of `StorageImpl`) that allows tensors in the functionalization pass to accurately alias storage. If two tensors `a` and `b` in a functionalized program are views of one another, then `a.storage.is_alias_of(b.storage)` should return true. I added this in a pretty similar way to how meta tensors allocate storage, although I don't pass in an actual allocator (I think this is fine because you should never resize a functional tensor's storage).
One thing I'm not sure about - should `FunctionalTensorWrapper` set `storage_access_should_throw_`: (a) always, (b) never, (c) only if its wrapped tensor has it set.
Right now I have it not set, mostly because calling the reference view functions (`at::native::{view}`) requires looking at the storage. But that means that if you try to access storage from python in a functionalized program, you'll get silent garbage instead of an error. Related question: are we planning on exposing meta tensor storage to python in the future (even though it contains garbage)?
(5) better docs :)
**View operator coverage**
(6) The functionalization pass now gets math-composite view ops for free.
I didn't add the `Functionalize` dispatch key to the composite set, because I don't want composite ops like `torch.ones` to get decomposed before hitting the functionalization pass. Instead, I added codegen to manually register the `at::native::` kernels of composite view ops. This is a little hairy, because the names of the `at::native::` kernels aren't easily accessible. They're stored in a `Dict[DispatchKey, BackendIndex]`. I made a best-effort attempt to get each view kernel's name, basically by assuming that every view op has either a composite or cpu implementation.
There's also a hardcoded list of composite view ops in `gen_inplace_or_view_type.py`, but it looks like it's wrong. This is probably worth rationalizing later, but instead I created a new list of the "complete" set of composite view ops, and preserved the old set by hardcoding the delta between the two sets.
(7) I've added codegen for ops that are both views AND mutations, like `transpose_()` (why do we even have these {emoji:1f622}).
From some light testing, it looks like they work correctly with one caveat: I had a hard time ensuring that functorch programs that mutate their inputs using ops like `transpose_()` preserve the input mutations after the program finishes running. For now (in my corresponding functorch branch) I emit a warning when this happens and just don't preserve the mutation.
(8) I added `{view}_inverse` implementations for every view op, in `FunctionalInverses.cpp`.
These are needed to take mutations made to views and replay them back onto the base. To reduce boilerplate, the codegen generates function declarations for each `{view}_inverse` function, so you get a nice compiler error when someone eventually adds a new view op.
The only view ops currently not supported are (a) as_strided, and (b) the sparse view ops (values()/indices()).
I can add support for as_strided, but it needs an `as_strided_inverse()` function. That will look really similar to the `as_strided_backward()` function in FunctionsManual.cpp, but it has some noticeable differences: we basically want an `as_strided_embed` for autograd and `as_strided_scatter` for functionalization. We also will probably need them to be primitives w.r.t. autograd, since the current implementation for autograd uses view().copy_() calls that XLA won't be able to handle. I'm wondering if anyone has any objections, but otherwise I can make those changes (which will require writing backward formulas for `as_strided_embed` and `as_strided_scatter`).
I did a bunch of manual testing that all looks pretty good, but it's definitely not fully tested. Ed pointed out that once XLA uses this pass (or at least once there's a POC), we can just run the existing xla view test suite. Hopefully that delay is okay - if it's not, maybe we can think about using OpInfos similar to how functorch uses them for testing.
Note: there's some duplication with autograd's view code. Every `{view}_inverse` implementation is really similar to the implementation for that view listed in `derivatives.yaml`. There are some major differences though:
* the autograd implementations of those backward functions (like `permute_backwards()`, in `FunctionsManual.cpp`) internally call other view ops. For functionalization, we want them to (eventually) call `{view}_copy` operators.
* For view ops that take a subset of the original storage, like `slice/select/diagonal/as_strided()`, the autograd backward functions fill the "spaces" in the inverse call with zeroes. For functionalization, we want to fill them with the value of `base` at those positions. It looks like this currently applies to 6 total ops (since we can ignore composites):
* select
* slice
* diagonal
* as_strided
* split
* split_with_sizes
A nice end state would probably be for the autograd + functionalization codegen to both look at the same yaml (either `derivatives.yaml`, or something else), and automatically generate the right thing. That is out of scope for this PR though.
**Current State + Next Steps**
There are a bunch of followups after this PR eventually lands. Roughly in order:
* Use the current pass to register problematic composite ops in functorch. Also, nested `functionalize()` calls aren't supported yet (I mostly just need to remove some debug asserts and test it).
* Work on freeing up dispatch key space by deduplicating the `{backend}`/`Autograd{backend}`/`Sparse{backend}`/`Quantized{backend}` keys
* Once we have more dispatch keys, split up this pass into 3 pieces - it's currently fused, and doesn't do the right thing for vulkan/XLA. Specifically, all of the `{view}` calls in the current pass's view-replay logic should turn into `{view}_copy` calls that vulkan/XLA know how to implement, and there will be separate passes for (a) removing mutations, and (b) turning `{view}_copy` calls back into `{view}` calls. For Vulkan, we eventually want a pass that ONLY removes aliasing and view calls, and doesn't remove mutations. We can also probably make the 2 new passes use user dispatch keys to save dispatch key space, if they'll only be used by functorch anyway.
* Do more of a dive on perf for the vulkan/xla use cases. There are several areas to improve perf with varying levels of effort required. The simplest one that I'll probably do regardless is to codegen the out-of-place kernels instead of using a boxed fallback. Getting a POC working for xla will also be useful to test the view operator coverage.
**Example Codegen Output**
View Op:
```
::std::vector<at::Tensor> split_Tensor(c10::DispatchKeySet ks, const at::Tensor & self, int64_t split_size, int64_t dim) {
  auto self_ = at::functionalization::impl::unwrapFunctionalTensor(self);
  ::std::vector<at::Tensor> out;
  {
    at::AutoDispatchBelowFunctionalize guard;
    auto tmp_output = at::redispatch::split(ks & c10::after_func_keyset, self_, split_size, dim);
    out = at::functionalization::impl::wrapFunctionalTensor(tmp_output);
    // I'm fusing the [alias removal], [mutation removal], [add views back] passes together.
    // Later, we'll want to turn them into separate passes (since e.g. vulkan only cares about alias removal).
  }
  at::functionalization::ViewMeta view_meta = at::functionalization::ViewMeta(
    [split_size, dim](const at::Tensor& base, int64_t mutated_view_idx) -> at::Tensor {
      return base.split(split_size, dim)[mutated_view_idx];
    },
    [split_size, dim](const at::Tensor& base, const at::Tensor& mutated_view, int64_t mutated_view_idx) -> at::Tensor {
      return at::functionalization::impl::split_inverse(base, mutated_view, mutated_view_idx, split_size, dim);
    }
  );
  at::functionalization::impl::set_view_meta(out, self, view_meta);
  at::AutoDispatchDirectlyToNative native_guard;
  ::std::vector<at::Tensor> reference_tensor_output = at::native::split(self, split_size, dim);
  at::functionalization::impl::set_strides(out, reference_tensor_output);
  return out;
}
```
Mutation Op:
```
at::Tensor & add__Tensor(c10::DispatchKeySet ks, at::Tensor & self, const at::Tensor & other, const at::Scalar & alpha) {
  at::functionalization::impl::sync(self);
  at::functionalization::impl::sync(other);
  auto self_ = at::functionalization::impl::unwrapFunctionalTensor(self);
  auto other_ = at::functionalization::impl::unwrapFunctionalTensor(other);
  at::Tensor tmp_output;
  {
    at::AutoDispatchBelowFunctionalize guard;
    // The functionalization pass explicitly doesn't pass out= parameters to the redispatch
    tmp_output = at::redispatch::add(
        ks & c10::after_func_keyset, self_, other_, alpha);
  }
  self.replace_(tmp_output);
  at::functionalization::impl::maybe_add_update(self);
  return self;
}
```
View + Mutation Op:
```
at::Tensor & transpose_(c10::DispatchKeySet ks, at::Tensor & self, int64_t dim0, int64_t dim1) {
  at::functionalization::ViewMeta view_meta = at::functionalization::ViewMeta(
    [dim0, dim1](const at::Tensor& base, int64_t mutated_view_idx) -> at::Tensor {
      return base.transpose(dim0, dim1);
    },
    [dim0, dim1](const at::Tensor& base, const at::Tensor& mutated_view, int64_t mutated_view_idx) -> at::Tensor {
      return at::functionalization::impl::transpose_inverse(base, mutated_view, dim0, dim1);
    }
  );
  at::functionalization::impl::mutate_view_meta(self, view_meta);
  // See Note [Propagating strides in the functionalization pass]
  // Directly update the sizes/strides/storage_offset fields on self using the inplace call.
  // I need the guard because I don't want the at::native kernel to end up calling more functionalization/functorch kernels.
  // Its only job is to directly compute the output size/stride/storage_offset metadata.
  at::AutoDispatchDirectlyToNative native_guard;
  at::native::transpose_(self, dim0, dim1);
  return self;
}
```
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31942093
Pulled By: bdhirsh
fbshipit-source-id: b95598dae35dd1842fa8b1d8d1448332f3afaadf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64430
The functionalization pass needs `{view}_scatter` versions of the slice/select/diagonal ops in order to correctly propagate mutations from a view to its base. On top of that, the implementations need to be primitive w.r.t. autograd, because they look something like `...slice().copy_()`, and the functionalization pass can't use views + mutations inside of its own alias-removal machinery!
I added some basic tests that I tried to base off of existing tests for views (particularly around testing the derivative formulas), but I'm wondering if I should add something more comprehensive.
Also, as_strided fits into this category - the functionalization pass will need an `as_strided_scatter` op that's primitive w.r.t. autograd. I didn't add it for now, because it'll involve duplicating a bunch of logic from the current `as_strided_backward()` function, and also writing a derivative formula that I wasn't sure how to write :)
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31942092
Pulled By: bdhirsh
fbshipit-source-id: c702a57c2748a7c771c14e4bcc3e996b48fcc4c8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63094
This PR:
- Moves `FileManager` and its dependencies (`assert_never` and other imports) to `utils.py`, and updates all of the call-sites with the fresh imports
- Passes the list of NativeFunction objects into `gen_trace_type` directly, instead of requiring the function to regenerate it (we already have it)
The purpose of the reshuffling is to avoid circular dependencies in the next PR, where I add codegen for the functionalization pass, which gets called from `gen.py` (but depends on some stuff from the autograd codegen - in particular, the list of view ops).
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31942096
Pulled By: bdhirsh
fbshipit-source-id: 36118facae61f25f8922bb43ad2818c80b53504e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67456
There are some compatibility issues; we need to back out before it gets to prod feed models.
Test Plan: CI
Reviewed By: pgarbacki
Differential Revision: D31997684
fbshipit-source-id: 8b2584cb5d43e487719c6530d4178988fd03c455
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65946
Add a new function in agent_utils to perform a synchronization of active call counts using the store. This is intended to replace the barrier and all_reduce used by the process group in RPC shutdown.
`test_ddp_comparison` and `test_ddp_comparison_uneven_inputs` fail with these changes. It seems like the RPC agents are not accessing the same store, so the total count of processes never reaches the world size to exit the barrier; still need to investigate why it is like this only for these test cases. Setting clean_shutdown to false ignores this code path, which allows the tests to pass.
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D31762736
Pulled By: H-Huang
fbshipit-source-id: cb5d0efe196f72726c63393c4293e97ec4f18548
Summary:
linux-xenial-cuda10.2 and linux-bionic-cuda10.2 are very similar, no
need to run both configs
Moved all auxiliary builds from xenial to bionic
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67344
Reviewed By: seemethere, janeyx99
Differential Revision: D31964850
Pulled By: malfet
fbshipit-source-id: d07ce266c843c7fd69b281e678c4247b0bf6da20
Summary:
Action following discussion with distributed and r2p team--the tests under elastic in distributed should be owned by oncall: r2p and not distributed.
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67293
Reviewed By: jbschlosser
Differential Revision: D31973779
Pulled By: janeyx99
fbshipit-source-id: 05875a7600c6eb1da1310a48e1e32a1a69461c55
Summary:
This reduces the chance of newly added functions being ignored by mistake.
The only test that this impacts is the coverage test that runs as part of the python doc build. So if that one works, it means that the update to the list here is correct.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67395
Reviewed By: jbschlosser
Differential Revision: D31991936
Pulled By: albanD
fbshipit-source-id: 5b4ce7764336720827501641311cc36f52d2e516
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67271
* [ONNX] Use Reciprocal operator instead of Div(1, x).
This is a more readable and perhaps more performant way to export
torch.reciprocal.
* Use Reciprocal in the caffe2 operator to import ONNX
Test Plan: Imported from OSS
Reviewed By: msaroufim
Differential Revision: D31962519
Pulled By: malfet
fbshipit-source-id: d926e75b1c8312b9a980c9a1207a1a93ba0c71e0
Co-authored-by: take-cheeze <takechi101010@gmail.com>
Summary:
Fixes [Issue#70](https://github.com/MLH-Fellowship/pyre-check/issues/70)
This PR fixes the type checking error that was found in fuse.py as follows:
torch/quantization/fx/fuse.py:34:13 Incompatible variable type [9]: fuse_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
Signed-off-by: Onyemowo Agbo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66799
Reviewed By: 0xedward
Differential Revision: D31961462
Pulled By: onionymous
fbshipit-source-id: 7481afc07152ba13f3224e4ad198fd8e2c34c880
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67255
Add an out variant for `aten::where`.
Since this op can be implemented quite trivially in NNC with `ifThenElse`, I added an NNC kernel as well.
Test Plan: Unit tests: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: navahgar
Differential Revision: D31923886
fbshipit-source-id: b4379ee3aaf31a000e626b4caeafd3e3f3d60837
Summary:
This PR introduces the new issue forms that replace issue templates.
This is similar to what was done in torchvision https://github.com/pytorch/vision/pull/4299 and torchaudio, you can see the end result here: https://github.com/pytorch/vision/issues/new/choose (click e.g. on the [bug report](https://github.com/pytorch/vision/issues/new?assignees=&labels=&template=bug-report.yml))
The main new thing is that we can enforce some of the fields to be filled, especially for bug reports. It's also a much cleaner GUI for users IMHO, and we can provide better examples and instructions.
There is still a "blank" template available.
I removed the "Questions" form: we say we close these issues anyway. I replaced it with a direct link to https://discuss.pytorch.org. Since we still have a "blank" template, I think this covers all previous use-cases properly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65917
Reviewed By: VitalyFedyunin
Differential Revision: D31894777
Pulled By: NicolasHug
fbshipit-source-id: fbd39f7ed4cadab732d106d3166c04c451c31f94
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66585
Add a new op `static_runtime::fused_variadic_grouped_accessor_op` that outputs many tensors rather than a single tensor list. Incorporated this new op into `FuseListUnpack`. This eliminates `ListUnpack` overhead and tensor refcount bumps.
Test Plan:
**Accuracy Test**
Model 294738512_40 (manually confirmed that fusion happens)
```
get 2861 prediction values
get 2861 prediction values
max_error: 0 total: 0
```
Accuracy test with model 296213501_65 (has V2 op): passes with 0 errors.
**Performance**
TW replayer test w/ 800 QPS (stacked with D31482816 (72e25c9f4e)) shows 5% CPU decrease for storage tier.
Results:
{F673610679}
Reviewed By: hlu1
Differential Revision: D31620408
fbshipit-source-id: f05c298bcbce61a491b63d575af4aca746881696
Summary:
Simply propagates the profile_none_ value through profile_ivalue nodes inserted by nvfuser.
Without the propagation, profile_ivalue nodes inserted by other passes would block the optimization on no-op sum_to_size.
cc gmagogsfm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63941
Reviewed By: shunting314, cpuhrsch
Differential Revision: D31972765
Pulled By: Krovatkin
fbshipit-source-id: 4fa571a758e269b486c584f47c2a933de82d463b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66351
This adds the ability for users to just provide shard_offsets and optionally rank to construct a local shard, instead of having to know about ShardedMetadata. Under the hood, we will construct the ShardedMetadata by inferring shard_lengths and device from the local tensor.
ghstack-source-id: 141742410
Test Plan: test_local_shards
Reviewed By: pritamdamania87
Differential Revision: D31519919
fbshipit-source-id: 8f3b4682ffc74b79b41076f3f4b832f4cacda49d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64481
This simplifies the `init_from_local_shards` API in sharded tensor to only require the user to pass in a list of `Shard` and the `overall_size`, instead of a ShardedTensorMetadata. We do an all_gather inside to form a valid ShardedTensorMetadata instead.
TODO: add more test cases to improve coverage.
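A hedged sketch of the simplified call from one rank (the module path, the `Shard`/`ShardMetadata` constructor arguments, and the placement string are assumptions about the API of that time, not taken from this diff; it also assumes an initialized process group):
```
import torch
# NOTE: import path and constructor fields below are assumptions for illustration.
from torch.distributed._sharded_tensor import (
    Shard,
    ShardMetadata,
    init_from_local_shards,
)

# Rank 0's local piece of a 4x4 tensor sharded by rows across 2 ranks.
local_tensor = torch.randn(2, 4, device="cuda:0")
local_shard = Shard(
    tensor=local_tensor,
    metadata=ShardMetadata(
        shard_offsets=[0, 0],
        shard_lengths=[2, 4],
        placement="rank:0/cuda:0",
    ),
)

# Only the local shards and the overall size are needed; the global
# ShardedTensorMetadata is assembled internally via an all_gather.
sharded = init_from_local_shards([local_shard], 4, 4)
```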
ghstack-source-id: 141742350
Test Plan: TestShardedTensorFromLocalShards
Reviewed By: pritamdamania87
Differential Revision: D30748504
fbshipit-source-id: 6e97d95ffafde6b5f3970e2c2ba33b76cabd8d8a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67067
We plan to gradually add features to backend_config_dict; this PR adds support
for specifying the dtype for the input and output of a given pattern.
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D31849074
fbshipit-source-id: ca2fbb873176fe72e08ea79ed1bc659bf27cbd8a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67412
For inputs, we'll be using the shape from PyTorch tensors. For outputs, we'll be using the shape from MLMultiArray. Thus, we can decouple from the symbolic shapes defined in the compile spec.
ghstack-source-id: 141746346
Test Plan:
- Sandcastle
- buck test pp-ios
Reviewed By: hanton
Differential Revision: D31299408
fbshipit-source-id: 337d5bb9efc2ff51409586c288d607399b937212
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66724
Forwarding fix from previous diff through the ClassType getters & moving Types in where possible.
ghstack-source-id: 141594741
Test Plan: CI
Reviewed By: suo
Differential Revision: D31697995
fbshipit-source-id: 05d6af7c23e3b7a94db75b20d06338bc9ade3e20
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66723
Missing move in constructor and forced copy in getter.
ghstack-source-id: 141594742
Test Plan: CI
Reviewed By: suo
Differential Revision: D31697702
fbshipit-source-id: c2018531e7ec4a4853cd003ea3753273a5fae7fb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67118
Fixes a bug in the reference pattern support for nn.Linear when the same quantized input is shared across multiple Linear nodes.
This PR adds a pass to duplicate the dequant nodes for each use so that for a case like
```
x -> quant -> dequant -> linear1 - quant1
                 |
                 +-> linear2 - quant2
```
We duplicate the dequant nodes
```
x -> quant -> dequant1 -> linear1 - quant1
          |
          +-> dequant2 -> linear2 - quant2
```
So that we can match each pattern in the lowering step.
We also add a pass to remove the extra/duplicate dequant nodes that may be left over from the above pass if we don't lower them based on a pattern match.
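For illustration, a minimal eager-mode module that produces the shared-dequant pattern above might look like the sketch below (the module and shapes are made up, not taken from the PR):
```
import torch
import torch.nn as nn

class TwoLinears(nn.Module):
    """Toy module: one activation feeds two Linear layers, so after
    prepare/convert a single dequant sits in front of both of them and
    must be duplicated per use for pattern matching to work."""
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(4, 4)
        self.linear2 = nn.Linear(4, 4)

    def forward(self, x):
        return self.linear1(x) + self.linear2(x)
```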
Test Plan:
python test/test_quantization.py test_ref_pattern_multi_use
Imported from OSS
Reviewed By: mrshenli
Differential Revision: D31873511
fbshipit-source-id: aea0819222f084635157426743a50e065e6503c3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67396
frozen_numpy did not work on GPU since we didn't add register_frozennumpy to the :builtin_registry_cuda target.
This was not found earlier because the unit test we added to test_deploy.cpp is only run on CPU. On GPU, we run test_deploy_gpu.cpp, which does not contain the added unit tests for numpy.
In this diff, I just duplicate the unit tests for numpy (and pyyaml) across test_deploy.cpp and test_deploy_gpu.cpp.
I think ideally we should consolidate these 2 files into a single one, so we can add unit tests in a single place while running them on both hardware platforms.
Tracking task: T104399180
ghstack-source-id: 141750276
Test Plan: buck test mode/opt :test_deploy_gpu
Reviewed By: suo
Differential Revision: D31978156
fbshipit-source-id: 2f5cd55ca33107cc7d230b72f1353df81f0a3bda
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67340
Currently Torchbind classes aren't selective. This makes a rough-granularity pass that will remove entire classes if they aren't selected. If we need finer granularity in the future we can make individual methods within classes selective, though instrumenting that will be significantly more involved, I think. On a linux build only __torch__.torch.classes._nnapi.Compilation remains unselective. I can't find where it's registered :P (there are a couple of Android-only ones and presumably some Metal-only ones as well).
Many of the classes registered in functions returned a reference to the class that was created. I talked with dreiss about it and we decided that this seemingly didn't serve any purpose, and leaving it like that would make the return value difficult (but possible) to create with selectivity. Since it seems useless anyway, I just changed them to return an int so that they can still be called from a global scope, but not have any issues with the return type.
ghstack-source-id: 141690776
Test Plan: CI, model unit tests, test models in prod apps
Reviewed By: dhruvbird
Differential Revision: D31092564
fbshipit-source-id: 657f7eb83490292436c15cf134ceca9b72fb9e1a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67323
Applied patch proposed by Jeff https://github.com/pytorch/pytorch/pull/63948#issuecomment-952166982.
In PyTorch, we map cuBLAS->rocBLAS and cuSPARSE->hipSPARSE. Note the prefix, roc versus hip.
The 'hip' APIs offer a more direct CUDA-friendly mapping, but calling rocBLAS directly has better performance.
Unfortunately, the `roc*` types and `hip*` types differ, i.e., `rocblas_float_complex` versus `hipComplex`.
In the case of SPARSE, we must use the hip types for complex instead of the roc types,
but the pytorch mappings assume roc. Therefore, we create a new SPARSE mapping that has a higher priority.
Its mappings will trigger first, and only when a miss occurs will the lower-priority pytorch mapping take place.
When a file contains "sparse" in the filename, a mapping marked with API_SPARSE is preferred over other choices.
cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D31969246
Pulled By: cpuhrsch
fbshipit-source-id: 4ce1b35eaf9ef0d146a0955ce70c354ddd8f4669
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67270
* Add a dim argument to the `all` symbolic
* The `all` symbolic depends on the `any` symbolic
Test Plan: Imported from OSS
Reviewed By: msaroufim
Differential Revision: D31962518
Pulled By: malfet
fbshipit-source-id: f7ee05cf4eff5880fc508154267e060952b5b42d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66753
Fixes these Wextra compilation errors:
```
stderr: caffe2/aten/src/ATen/native/cuda/UnarySignKernels.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/UnarySignKernels.cu:49:72: error: comparison is always false due to limited range of data type [-Werror=type-limits]
49 | AT_DISPATCH_ALL_TYPES_AND2 (44fd312604)(kBFloat16, ScalarType::Half, iter.input_dtype(), "signbit_cuda", [&]() {
| ~~^~~
stderr: caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:86: error: comparison is always false due to limited range of data type [-Werror=type-limits]
99 | AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
| ^
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:97: error: comparison is always false due to limited range of data type [-Werror=type-limits]
99 | AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
| ^
stderr: caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu: In lambda function:
caffe2/aten/src/ATen/native/cuda/BinaryMulDivKernel.cu:99:86: error: comparison is always false due to limited range of data type [-Werror=type-limits]
99 | AT_DISPATCH_INTEGRAL_TYPES(dtype, "div_floor_cuda", [&]() {
| ^
```
And also these warnings:
```
caffe2/c10/util/Half.h(461): warning: pointless comparison of unsigned integer with zero
detected during instantiation of "std::enable_if<<expression>, __nv_bool>::type c10::overflows<To,From>(From) [with To=size_t, From=unsigned long]"
caffe2/aten/src/ATen/native/Resize.h(45): here
caffe2/c10/util/Half.h(459): warning: pointless comparison of unsigned integer with zero
detected during instantiation of "std::enable_if<<expression>, __nv_bool>::type c10::overflows<To,From>(From) [with To=size_t, From=unsigned long]"
caffe2/aten/src/ATen/native/Resize.h(45): here
```
I thought I'd fixed this previously using `std::is_unsigned` in D25256251 (cff1ff7fb6), but apparently that was insufficient.
Test Plan: Sandcastle
Reviewed By: malfet, ngimel
Differential Revision: D31708173
fbshipit-source-id: 7714f6bbf109d2f2164630d3fc46bad18046c06c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67307
Wrap TRTInterpreter result so that any future change to output params is less likely to break existing use cases.
Test Plan: Run test with all touched file
Reviewed By: 842974287
Differential Revision: D31945634
fbshipit-source-id: 7cf73a1ef0098bff2013815f2f1692233ef7ec14
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67229
Right now, assembly code generated for a given method from the model is named wrapper or func by default. The function name is then replaced with a proper kernel_func_name after target-specific assembly is generated.
This PR propagates a desired kernel_func_name right from aotCompiler API so that the generated function has the needed name that doesn't need to be replaced later.
Note: Most of this change was landed in https://github.com/pytorch/pytorch/pull/66337 which had to be reverted as it was breaking `test_profiler` in `test_jit_fuser_te` as it replaced the name generated for graph with the default kernel_func_name value. This PR fixes that as well.
```
(pytorch) ~/local/pytorch kname
└─ $ python3 test/test_jit_fuser_te.py
CUDA not available, skipping tests
monkeytype is not installed. Skipping tests for Profile-Directed Typing
........................................<string>:3: UserWarning: torch.cholesky is deprecated in favor of torch.linalg.cholesky and will be removed in a future PyTorch release.
L = torch.cholesky(A)
should be replaced with
L = torch.linalg.cholesky(A)
and
.
.
.
......................<string>:3: UserWarning: torch.symeig is deprecated in favor of torch.linalg.eigh and will be removed in a future PyTorch release.
The default behavior has changed from using the upper triangular portion of the matrix by default to using the lower triangular portion.
L, _ = torch.symeig(A, upper=upper)
should be replaced with
L = torch.linalg.eigvalsh(A, UPLO='U' if upper else 'L')
and
L, V = torch.symeig(A, eigenvectors=True)
should be replaced with
L, V = torch.linalg.eigh(A, UPLO='U' if upper else 'L') (Triggered internally at ../aten/src/ATen/native/BatchLinearAlgebra.cpp:2492.)
......[W pybind_utils.cpp:35] Warning: Using sparse tensors in TorchScript is experimental. Many optimization pathways have not been thoroughly tested with sparse tensors. Please include the fact that the network is running sparse tensors in any bug reports submitted. (function operator())
/data/users/priyaramani/pytorch/torch/testing/_internal/common_utils.py:403: UserWarning: Using sparse tensors in TorchScript is experimental. Many optimization pathways have not been thoroughly tested with sparse tensors. Please include the fact that the network is running sparse tensors in any bug reports submitted. (Triggered internally at ../torch/csrc/jit/python/pybind_utils.h:691.)
return callable(*args, **kwargs)
.....................................................................[W Resize.cpp:23] Warning: An output with one or more elements was resized since it had shape [1], which does not match the required output shape [].This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (function resize_output_check)
[W Resize.cpp:23] Warning: An output with one or more elements was resized since it had shape [1, 5], which does not match the required output shape [5].This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (function resize_output_check)
........................................................................s.......s...s.s....s......s..sss............................
----------------------------------------------------------------------
Ran 503 tests in 37.536s
OK (skipped=10)
```
Test Plan: Imported from OSS
Reviewed By: navahgar, pbelevich
Differential Revision: D31945713
Pulled By: priyaramani
fbshipit-source-id: f2246946f0fd51afba5cb6186d9743051e3b096b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66289
Add a variadic version of `grouped_accessor_op` to eliminate list construction overhead and associated refcount bumps in static runtime.
Test Plan:
Accuracy test with model 294738512_40: passes with 0 errors.
Accuracy test with model 296213501_65 (has V2 op): passes with 0 errors.
**Perf impact**
TW replayer test w/ 800 QPS (stacked with D31620408) shows ~5% CPU decrease for storage tier.
Results:
{F673610665}
Reviewed By: hlu1
Differential Revision: D31482816
fbshipit-source-id: 14393da122cefd094c3e4f423beb897c1d17b32c
Summary:
Adds mixed precision autocasting support between fp32/fp16 to TorchScript/JIT. A more in-depth description can be found at [torch/csrc/jit/JIT-AUTOCAST.md](https://github.com/pytorch/pytorch/pull/63939/files#diff-1f1772aaa508841c5bb58b74ab98f49a1e577612cd9ea5c386c8714a75db830b)
This PR implements an autocast optimization pass that inserts casting ops per the AMP rules (torch/csrc/jit/passes/autocast.cpp), mimicking the behavior of eager autocast. The pass also takes into consideration the context of `torch.cuda.amp.autocast` and only inserts casting ops within the enabled context manager, giving feature parity with eager amp autocast.
We currently provide JIT AMP autocast as a prototyping feature, so it is default off and could be turned on via `torch._C._jit_set_autocast_mode(True)`
The JIT support for autocast is subject to different constraints compared to the eager mode implementation (mostly related to the fact that TorchScript is statically typed), restriction on the user facing python code is described in doc torch/csrc/jit/JIT-AUTOCAST.md
This is a prototype; there are also implementation limitations that were necessary to keep this PR small and get something functioning quickly upstream, so we can iterate on designs.
A few limitations/challenges that are not properly resolved in this PR:
1. Autocast inserts cast operations, which affect the scalar type of output tensors feeding downstream operations. We are not currently propagating the updated scalar types; this could give wrong results for operations covered by promotion rules.
2. Backward for autodiff in JIT misses the cast of dgrad to the input scalar type that autograd does in eager mode. This forces us to explicitly mark the casting operation for certain operations (e.g. binary ops); otherwise, we might feed a dgrad whose scalar type does not match the input. This could potentially break gradient functions consuming dgrad (e.g. gemm backward, which assumes grad_output has the same scalar type as the input).
3. The `torch.autocast` API has an optional argument `dtype` which is not currently supported in JIT autocast; we require a static value.
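A minimal usage sketch based on the description above (the function and shapes are made up; `torch._C._jit_set_autocast_mode` and `torch.cuda.amp.autocast` are the APIs named in this PR):
```
import torch
from torch.cuda.amp import autocast

# The prototype feature is off by default; turn it on explicitly.
torch._C._jit_set_autocast_mode(True)

@torch.jit.script
def mm_autocast(a, b):
    # Casting ops are inserted only inside the enabled autocast context.
    with autocast():
        return torch.mm(a, b)

if torch.cuda.is_available():
    x = torch.rand(4, 4, device="cuda")
    y = torch.rand(4, 4, device="cuda")
    print(mm_autocast(x, y).dtype)  # expected: torch.float16
```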
Credit goes mostly to:
tlemo
kevinstephano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63939
Reviewed By: navahgar
Differential Revision: D31093381
Pulled By: eellison
fbshipit-source-id: da6e26c668c38b01e296f304507048d6c1794314
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67345
Was hitting capacity issues, setting these to non-ephemeral would mean
keeping the current capacity at the expense of "unclean" nodes
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D31965477
Pulled By: seemethere
fbshipit-source-id: 6d45fb34d07d55c5112db065af2aa0a8b1fd8d1f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65967
Graph is an implementation detail. If user wants to get access to the
underlying graph, they should be able to explicitly dynamic cast instead.
ghstack-source-id: 141659819
Test Plan: no behavior change.
Reviewed By: gmagogsfm
Differential Revision: D31326153
fbshipit-source-id: a0e984f57c6013494b92a7095bf5bb660035eb84
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61858
This PR adds `triangular_solve_out_sparse_csr_cuda`. The operation is
used to compute the solution to a linear system where the coefficient
matrix is triangular.
Structured kernels are used, and the meta function needed some changes to
support the sparse CSR layout. With sparse matrix input, the `cloned_coefficient`
tensor is a 0-sized tensor.
cc nikitaved pearu cpuhrsch IvanYashchuk ngimel
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D31948435
Pulled By: cpuhrsch
fbshipit-source-id: 7775fece83ca705a26d75f82aead10b956b14bfd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67068
Prepending a node to itself results in the node getting removed from the graph.
Usually people won't prepend a node to itself. But people may accidentally try to append a node that is already directly after the `self` node, which amounts to prepending `self` to `self`.
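A small sketch of the pitfall (hypothetical function, not taken from the added unit test):
```
import torch
import torch.fx as fx

def f(x):
    y = x + 1
    z = y * 2
    return z

gm = fx.symbolic_trace(f)
add_node, mul_node = list(gm.graph.nodes)[1:3]
# mul_node already directly follows add_node, so appending it again amounts to
# prepending `mul_node` to itself; before this fix, that dropped the node from
# the graph instead of leaving it in place.
add_node.append(mul_node)
gm.graph.lint()  # with the fix, the graph stays intact
```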
Test Plan: Added a unit test
Reviewed By: jamesr66a
Differential Revision: D31849030
fbshipit-source-id: b0fdfbb893f785f268595acd823b426d57c15e61
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66291
In this PR:
- Trivial batching rules for `make_dual` and `is_same_size` that enable forward ad + vmap functionality
- Adds a check in gradcheck that is performed when both `check_batched_grad` and `check_forward_ad` are `True` (an OpInfo using this is added later in the stack); a usage sketch follows this list.
- Tests for the gradcheck functionality
- Tests that basic out-of-place op works
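A usage sketch of the flag combination exercised by the new check (the function and shapes are made up):
```
import torch
from torch.autograd import gradcheck

def fn(x):
    return x.sin()

x = torch.randn(3, dtype=torch.double, requires_grad=True)
# With both flags True, gradcheck also exercises the forward AD + vmap path
# enabled by the batching rules added in this PR.
assert gradcheck(fn, (x,), check_forward_ad=True, check_batched_grad=True)
```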
Test Plan: Imported from OSS
Reviewed By: albanD, saketh-are
Differential Revision: D31842018
Pulled By: soulitzer
fbshipit-source-id: 84b18d9a77eeb19897757e37555581f2a9dc43d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66738
Added a field `max_batch_size` to TRTModule, which will later be used to determine how much the engine holder needs to pad the input.
Reviewed By: 842974287
Differential Revision: D31286509
fbshipit-source-id: be5c6d4ad9c87ca0842679dc507b187275d4e8dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67230
Added a new test `vulkan_perf_test` for measuring performance with google benchmark.
**Summary:**
* `vulkan_perf_test` can be used to perform a quick benchmark test for Vulkan features to compare before and after performance when applying a new method and/or optimizing the existing implementation on your local machine.
* The **google benchmark** 3rd party library (https://github.com/google/benchmark) is already in the repo (`fbsource/third-party/benchmark`).
* The number of threads is set to 1 since Vulkan backend is not thread-safe.
* Added a new API `Context::wait()` to allow benchmark tests to wait for all GPU operations to be done before calling `Context::flush()`
* Call `Context::wait()` for each output Vulkan tensor and then `Context::flush()` to avoid out-of-memory issues while running a number of iterations in the benchmark test code
* Use `Time` column (wall clock) as a total execution time for each iteration (instead of `CPU` column = CPU execution time only) from the benchmark result table
* The more iterations, the more reliable data. But, it will take much longer. 100-1,000 iterations for bigger tensors and 5,000-10,000 iterations for smaller ones would be sufficient.
* The benchmark data on MacOS is not reliable since there is an extra layer [MoltenVk](https://github.com/KhronosGroup/MoltenVK) that is running on top of `Metal`. And also running Vulkan models on MacOS instead of Metal ones is generally not a good idea.
**Next steps:**
* Add more benchmark tests as we optimize more Vulkan operators
* Consider using Vulkan own performance counter such as [uVkCompute](https://github.com/google/uVkCompute) in the near future. Each iteration time can be manually set by `benchmark::State::SetIterationTime()` and `Benchmark::UseManualTime()` APIs (see [UseManualTime API](365670e432/include/benchmark/benchmark.h (L1013)))
* Consider this `vulkan_perf_test` as a performance BAT (Build Acceptance Test) on the CI pipeline. `gtest` and `google benchmark` can be written in the same place ([see](https://stackoverflow.com/questions/8565666/benchmarking-with-googletest)). And [swiftshader](https://github.com/google/swiftshader) can be used for Sandcastle devservers that don't support Vulkan. We may come up with a reasonable performance criterion for each test, and it will fail if there is any significant performance degradation.
Test Plan:
**Test build on Android**
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_perf_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_perf_test
adb shell "/data/local/tmp/vulkan_perf_test"
```
**Test build on MacOS**
```
cd ~/fbsource
buck build //xplat/caffe2:pt_vulkan_perf_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAppleMac\#macosx-x86_64
```
**Test result on Google Pixel 5**
```
Running /data/local/tmp/vulkan_perf_test
Run on (8 X 1804.8 MHz CPU s)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-------------------------------------------------------------------------------------------------------------
Benchmark (Without optimization for 4x channels) Time CPU Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1 60.4 ms 14.1 ms 1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1 24.1 ms 0.947 ms 1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1 59.6 ms 14.0 ms 1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1 5.98 ms 0.844 ms 5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1 6.02 ms 0.845 ms 5000
-------------------------------------------------------------------------------------------------------------
Benchmark (With optimization for 4x channels) Time CPU Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1 39.3 ms 13.3 ms 1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1 16.4 ms 3.49 ms 1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1 59.7 ms 14.1 ms 1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1 3.93 ms 0.855 ms 5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1 6.14 ms 0.852 ms 5000
```
Note that the smaller tensors (`3.93 ms` vs `6.14 ms` when comparing `{3,4,221,193}` with `{3,3,221,193}`) receive a significant improvement on the Android builds, because the `vkCmdCopyImage` API is used for the `{3,4,221,193}` tensor instead of shader operations.
* `{3,40,221,193}`: 60.4 ms -> 39.3 ms (34.93% faster)
* `{3,20,221,193}`: 24.1 ms -> 16.4 ms (31.95% faster)
* `{3,4,221,193}`: 5.98 ms -> 3.93 ms (34.28% faster)
{F674052834}
**Test result on MacOS**
```
Running ./buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAppleMac#macosx-x86_64
Run on (16 X 2400 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x8)
L1 Instruction 32 KiB (x8)
L2 Unified 256 KiB (x8)
L3 Unified 16384 KiB (x1)
Load Average: 5.95, 5.02, 5.15
***WARNING*** Library was built as DEBUG. Timings may be affected.
-------------------------------------------------------------------------------------------------------------
Benchmark (Without optimization for 4x channels) Time CPU Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1 51.2 ms 35.5 ms 1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1 11.4 ms 4.76 ms 1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1 51.9 ms 35.0 ms 1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1 2.84 ms 1.36 ms 5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1 2.30 ms 1.13 ms 5000
-------------------------------------------------------------------------------------------------------------
Benchmark (With optimization for 4x channels) Time CPU Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1 70.1 ms 36.9 ms 1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1 11.8 ms 5.00 ms 1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1 69.3 ms 36.8 ms 1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1 4.60 ms 1.48 ms 5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1 3.65 ms 1.41 ms 5000
```
Note that the `{3,40,221,193}` input tensors don't receive any performance improvement when we use the `vkCmdCopyImage` API to directly copy textures when the number of channels is a multiple of 4 on MacOS. This may be due to the extra [MoltenVK](https://github.com/KhronosGroup/MoltenVK) layer that runs on top of `Metal`.
Reviewed By: SS-JIA
Differential Revision: D31906379
fbshipit-source-id: 0addc766502dba1a915b08840b3a4dc786a9cd9d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67207
Improved performance for the `cat` operator over the channel dimension:
* Improved when the input tensor's channel size is a multiple of 4.
* Added new test cases to cover this scenario.
* Limitation: We can't mix shader-based copies and `vkCmdCopyImage` at the same time. The way we create the output texture differs between the two, so we can't cross over unless we create the output texture every time. We only consider using `vkCmdCopyImage` if all input tensors' channel counts are multiples of 4 (a shape sketch follows below).
{F673815905}
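For reference, the benchmarked concat (shapes taken from the test plan below) looks like this in eager code; on the Vulkan backend the `vkCmdCopyImage` path is taken only when every input's channel count is a multiple of 4:
```
import torch

a = torch.rand(3, 40, 221, 193)
b = torch.rand(3, 40, 221, 193)
c = torch.rand(3, 40, 221, 193)
# Channel-dimension concat; 40 is a multiple of 4, so the optimized copy path
# applies when these tensors live on the Vulkan backend.
out = torch.cat([a, b, c], dim=1)   # shape [3, 120, 221, 193]
```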
Test Plan:
**Test Conditions**
* 3 input tensors with size `{3, 40, 221, 193}`
* Number of iteration: `1,000`
* Compare `Time` column (`CPU` column is only for CPU execution time)
* Flushes resources every 1 iteration since the input tensor size is big
* running vulkan_perf_test requires a separate diff (D31906379)
**Test build on Android**
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_perf_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_perf_test
adb shell "/data/local/tmp/vulkan_perf_test"
```
**Test build on Mac**
```
cd ~/fbsource
buck build //xplat/caffe2:pt_vulkan_perf_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_perf_test_binAppleMac\#macosx-x86_64
```
**Test result on Google Pixel 5**
a) Without using `vkCmdCopyImage` for multiples of 4 in channel dimension
```
Run on (8 X 1804.8 MHz CPU s)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-------------------------------------------------------------------------------------------------------------
Benchmark (Without optimization for 4x channels) Time CPU Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1 60.4 ms 14.1 ms 1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1 24.1 ms 0.947 ms 1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1 59.6 ms 14.0 ms 1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1 5.98 ms 0.844 ms 5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1 6.02 ms 0.845 ms 5000
```
b) With using `vkCmdCopyImage` for multiples of 4 in channel dimension
```
Run on (8 X 1804.8 MHz CPU s)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-------------------------------------------------------------------------------------------------------------
Benchmark (With optimization for 4x channels) Time CPU Iterations
-------------------------------------------------------------------------------------------------------------
cat_op_channel_perf/N:3/C:40/H:221/W:193/iterations:1000/threads:1 39.3 ms 13.3 ms 1000
cat_op_channel_perf/N:3/C:20/H:221/W:193/iterations:1000/threads:1 16.4 ms 3.49 ms 1000
cat_op_channel_perf/N:3/C:39/H:221/W:193/iterations:1000/threads:1 59.7 ms 14.1 ms 1000
cat_op_channel_perf/N:3/C:4/H:221/W:193/iterations:5000/threads:1 3.93 ms 0.855 ms 5000
cat_op_channel_perf/N:3/C:3/H:221/W:193/iterations:5000/threads:1 6.14 ms 0.852 ms 5000
```
* `{3,40,221,193}`: 60.4 ms -> 39.3 ms (34.93% faster)
* `{3,20,221,193}`: 24.1 ms -> 16.4 ms (31.95% faster)
* `{3,4,221,193}`: 5.98 ms -> 3.93 ms (34.28% faster)
{F674052795}
Reviewed By: SS-JIA
Differential Revision: D31781390
fbshipit-source-id: 42179d28ae461a9e247053bc9718f6b8c6c819e5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66587
Made some changes in the step function of the non-vectorized Adadelta optimizer to handle complex numbers as two real numbers, as per #65711 on GitHub.
ghstack-source-id: 141484731
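A sketch of the "complex as two real numbers" idea referenced above (illustrative only, not the optimizer code):
```
import torch

p = torch.randn(4, dtype=torch.complex64)
p_real = torch.view_as_real(p)   # shape [4, 2]: real and imaginary parts
# Optimizer math (squares, averages, element-wise updates) can then run on
# p_real as ordinary real-valued data, which is the behavior requested in
# GH issue 65711.
print(p_real.shape, p_real.dtype)  # torch.Size([4, 2]) torch.float32
```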
Test Plan:
buck test mode/dev caffe2/test:optim -- 'test_adadelta_complex'
https://pxl.cl/1R7kJ
Reviewed By: albanD
Differential Revision: D31630069
fbshipit-source-id: 2741177b837960538ce39772897af36bbce7b7d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67299
Switches the linux.8xlarge.nvidia.gpu to the 4xlarge instance type to
help with queueing / capacity issues. This change is only meant to be a
bridge until everyone updates their PRs to use the new
linux.4xlarge.nvidia.gpu node type
NOTE: This node type will be removed so do not depend on it for any new
workflows.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D31945507
Pulled By: seemethere
fbshipit-source-id: fb8587de7f31da72e968d46eeecc573d3f5b440f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66584
This will help avoid "-1"s in different places in our codebase and backends' codebases when
the debug handle is not known.
Test Plan: CI
Reviewed By: sxu
Differential Revision: D31614478
fbshipit-source-id: 97fceb04e3e78f52feda7b1ba1da08fa4480dd77
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65966
ghstack-source-id: 141594521
Support exportation of "interface methods" from submodules to a mobile module. "Interface methods" are defined as methods which might be dynamically called in a module and therefore need to be exported anyway, like virtual functions in C++.
Before this change the exportation algorithm was a simple iteration through all toplevel methods. Now that we have indirect calls, we need to recursively walk the call graph to find all potentially used methods. This means the order in which we export methods might break old runtimes; to guarantee forward compatibility we export toplevel methods first, then extra methods, so in this order toplevel methods will always be found first.
NOTE that interface method exportation is disabled by default in this diff. We need to call torch._C._enable_mobile_interface_call_export to actually enable it.
Test Plan: buck test mode/dev //caffe2/test:jit -- --exact 'caffe2/test:jit - test_export_opnames_interface (jit.test_misc.TestMisc)'
Reviewed By: qihqi, iseeyuan
Differential Revision: D31326155
fbshipit-source-id: 5be7234cca07691f62648a85133b6db65e427b53
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67264
Downgrades linux gpu instances from 8xlarge -> 4xlarge
We were seeing capacity issues in terms of scaling 8xlarge instances;
downgrading to 4xlarge (which only has a single gpu) to see if
that helps resolve some of the capacity issues we were seeing
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: janeyx99
Differential Revision: D31933488
Pulled By: seemethere
fbshipit-source-id: b41922ebb675e663cb035cd3795bc9bae94dcac7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66996
We do this conversion a few times, and further diffs (which I'm trying to keep as small as possible) will do it more.
ghstack-source-id: 141496817
Test Plan: CI
Reviewed By: mikeiovine
Differential Revision: D31821037
fbshipit-source-id: 1d3b54cadaedd53189aec6a35ed1a126c6fe4824
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66965
external aliases aren't defined to be outputs (though output aliases may end up in there as the following sentence clarifies).
ghstack-source-id: 141473794
Test Plan: review
Reviewed By: mikeiovine
Differential Revision: D31809715
fbshipit-source-id: 82d1391b04e22559932f82270669a7ff94a1c90f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67204
Fixes #66422
Fixes #66423
In the original test, all collectives are dummy local ones. As a
result, rank 0 could exit earlier than other ranks. However, the
`TCPStore` lives on rank 0, and other ranks might need to talk to
that store after rank 0 exits. This commit explicitly makes rank 0
wait for all other ranks to finish.
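A hedged sketch (not the exact test code) of the pattern described above:
```
import torch.distributed as dist

def worker(rank, world_size, init_method):
    # Rank 0 hosts the TCPStore, so it must stay alive until every other rank
    # is done talking to the store; a final barrier accomplishes that.
    dist.init_process_group("gloo", init_method=init_method,
                            rank=rank, world_size=world_size)
    # ... dummy local collectives from the test would run here ...
    dist.barrier()
    dist.destroy_process_group()
```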
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31906802
Pulled By: mrshenli
fbshipit-source-id: 82745f5497d784ea3cea9df6bda537ec71380867
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67059
Debugging some workflows: sometimes the training does not finish,
but I want to know whether or not the graph was static. Also, log 0 for unused
parameter size if no unused params were found.
ghstack-source-id: 141428950
Test Plan: Ci
Reviewed By: mrshenli
Differential Revision: D31846669
fbshipit-source-id: 21763fcdc1b244ba829117da1f15b2271d966983
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67123
Makes `sigrid_hash_compute_multipler_shift` return a tuple instead of a tensor and modifies the functions that depend on it.
Test Plan:
```
buck test //caffe2/benchmarks/static_runtime/fb:test_fb_operators
```
Benchmarks:
`local`:
```
I1022 13:56:34.529495 2866038 PyTorchPredictorBenchLib.cpp:266] Mean milliseconds per iter: 5.67114, standard deviation: 0.336918
I1022 15:29:45.248790 3292725 PyTorchPredictorBenchLib.cpp:266] Mean milliseconds per iter: 5.66678, standard deviation: 0.403032
```
`local_ro`:
```
I1022 13:59:24.262511 2882599 PyTorchPredictorBenchLib.cpp:266] Mean milliseconds per iter: 1.56012, standard deviation: 0.0537101
I1022 15:34:53.941890 3328358 PyTorchPredictorBenchLib.cpp:266] Mean milliseconds per iter: 1.5525, standard deviation: 0.0280267
```
FB: local - P463676888, local_ro - P463676984, master local - P463686094, master local_ro - P463686470
Reviewed By: mikeiovine
Differential Revision: D31867186
fbshipit-source-id: 0f640487b74d1cd0d5f714f2258e056a2f0c2c07
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67195
Now that `is_nonzero` is part of `at::native` (see https://github.com/pytorch/pytorch/pull/66663), replace `TensorCompare::is_nonzero` with `at::native::is_nonzero`.
ghstack-source-id: 141514416
Test Plan: CI
Reviewed By: larryliu0820
Differential Revision: D31704041
fbshipit-source-id: 36813e5411d0aa2eb2d0442e2a195bbed417b33d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67004
New version because the other one was impossible to rebase
Trace custom classes
Test Plan: CI.
Reviewed By: dhruvbird
Differential Revision: D31818978
fbshipit-source-id: daa22ccb153e32685bcca43a303ba9e21042d052
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67215
We were regularly seeing gaps in our docker image builds due to specific
workflows not being run when docker builds occurred on PRs; this should
remove that ambiguity and ensure that all docker images are re-built if a
rebuild is deemed necessary
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31910422
Pulled By: seemethere
fbshipit-source-id: f346e64f1857e35a995c49bf30521a3acd8af0b1
Summary:
Fixes https://github.com/pytorch/pytorch/issues/67027
`torch.Tensor` is considered a Mapping, but not a Sequence in Python
because it uses `tp_as_mapping` instead of defining `__getitem__` in
Python. However, if you try to overwrite `__getitem__` from Python
it is considered a `Sequence` and so the tensor is treated like a
tuple for indexing purposes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67202
Reviewed By: VitalyFedyunin
Differential Revision: D31908515
Pulled By: albanD
fbshipit-source-id: 0ca55a36be3421f96428a8eacf5d195646252b38
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67106
Test Plan: Recloned cpuinfo, rebuilt, and ran all the tests locally
Reviewed By: kimishpatel
Differential Revision: D31782317
fbshipit-source-id: 4a71be91f02bb6278db7e0124366d8009e7c7a60
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67203
This commit uses `dist` for `torch.distributed` and `c10d` for
`torch.distributed.distributed_c10d`. The former is for public APIs
and the latter is for private ones.
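Concretely, the convention corresponds to import aliases like these:
```
import torch.distributed as dist                         # public APIs
from torch.distributed import distributed_c10d as c10d   # private APIs
```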
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D31906801
Pulled By: mrshenli
fbshipit-source-id: c3a01f33962b01a03dbd565ed119dcdac594bcf2
Summary:
Some minor changes are needed to the .circleci docker scripts to support ubuntu 20.04. One edit updates the packages needed for all images (.circleci/docker/common/install_base.sh), while the other edit is specific to ROCm support.
cc jeffdaily sunway513 jithunnair-amd ROCmSupport KyleCZH seemethere malfet pytorch/pytorch-dev-infra
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66942
Reviewed By: albanD
Differential Revision: D31899271
Pulled By: janeyx99
fbshipit-source-id: f7677ddc063a4504da9f39a756dc181ac55f200a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67162
It's a bit annoying/ugly to type `c10::Symbol::fromQualString` everywhere, and we can't do `using c10::Symbol::fromQualString` since it's a static class function.
Test Plan: CI
Reviewed By: d1jang
Differential Revision: D31887042
fbshipit-source-id: 073a56c72281c20284a9feef741aed96b58a921d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67206
The memory overlap check still checks the memory overlap for alias ops. It only skips the check for inplace ops. This needs to be fixed if we want to use the memory overlap check in prod.
This diff only adds more debug info. It doesn't fix the aforementioned problem.
Reviewed By: d1jang
Differential Revision: D31889866
fbshipit-source-id: 05a80ace3d404f66f21a8bbdc9678485ff76c8d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67199
This PR refactors the _sharded_tensor package so that it is split out from api.py, and adds different components to make it more modular. This will also help us resolve circular dependencies due to the increasing code size and better organize the package:
* api.py: sharded tensor APIs
* metadata.py: Metadata definition for ShardedTensors
* shard.py: Shard definition for ShardedTensor
* utils.py: utility functions for validation, etc.
ghstack-source-id: 141533618
Test Plan: test_sharded_tensor.py
Reviewed By: pritamdamania87
Differential Revision: D31904249
fbshipit-source-id: c747d96e131a1d4731991ec4ac090f639dcb369b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67228
We added `AccOpProperty` for easy enablement of graph opts for new acc ops based on general properties. This diff adds
1. `AccOpProperty.unary`
2. Automated testing for acc ops with both `AccOpProperty.unary` and `AccOpProperty.pointwise` with `sink_reshape_ops` graph opt. [Adds coverage for 30 more acc_ops]
3. Refactors `graph_opts/TARGETS` to collect all graph optimizations into a common library
4. replaces `def foo(*, input, acc_out_ty=None): assert acc_out_ty is not None` with just `def foo(*, input, acc_out_ty)`. Let me know if there is some hidden purpose to the other implementation.
5. adds `AccOpProperty.*` flags to appropriate ops.
Test Plan:
`buck test mode/dev glow/fb/fx/graph_opts:test_fx_sink`
```
...
Summary
Pass: 31
ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/4222124724581304
```
Also ran
```
`buck test mode/dev glow/fb/fx/acc_tracer:`
```
```
...
Summary
Pass: 136
ListingSuccess: 4
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/5910974582823618
```
Reviewed By: jfix71
Differential Revision: D31671833
fbshipit-source-id: aa16d1008f18f7c8626058361efff33843de3505
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch-canary/pull/4
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67211
Record the algorithm selection, dump it in json format and replay it. This has the potential to
1. consistently repro the issue (algo selection could be sensitive to local benchmark timing)
2. allow manually editing the dumped json file to control algorithm selection.
Reviewed By: wushirong, 842974287
Differential Revision: D31888836
fbshipit-source-id: 4611fda548f7391776f1ad61572b1f59fa4665b6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67209
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67198
Fixing a couple instances where parameters were named method_compile_spec when they were actually compile_specs that could have multiple method_compile_specs inside.
Also use output dtype from buffer.
Test Plan:
Mobilenetv3 compiles and runs fine
```
(pytorch) ~/fbsource/fbcode/caffe2/fb/nnc
└─ $ PYTORCH_JIT_LOG_LEVEL="aot_compiler" buck run //caffe2/binaries:aot_model_compiler -- --model mobilenetv3.pt --model_name=pytorch_dev_mobilenetv3 --model_version=v1 --input_dims="1,3,224,224
"
Downloaded 4501/6195 artifacts, 433.89 Mbytes, 14.3% cache miss (for updated rules)
Building: finished in 06:34.6 min (100%) 20233/20233 jobs, 5467/20233 updated
Total time: 06:35.0 min
BUILD SUCCEEDED
The compiled llvm assembly code was saved to mobilenetv3.compiled.ll
The compiled model was saved to mobilenetv3.compiled.pt
└─ $ ./compile_model.sh -m pytorch_dev_mobilenetv3 -p /data/users/priyaramani/fbsource/fbcode/caffe2/fb/nnc/mobilenetv3.pt -v v1 -i "1,3,224,224"
+ VERSION=v1
+ getopts m:p:v:i:h opt
+ case $opt in
+ MODEL=pytorch_dev_mobilenetv3
.
.
Columns 961 to 9701e-11 *
-4.2304 -3.9674 2.4473 -0.8664 -0.7513 1.2140 0.0010 3.8675 1.2714 2.2989
Columns 971 to 9801e-11 *
-2.7203 1.6772 -0.7460 -0.6936 4.4421 -0.9865 -0.5186 -1.4441 1.3047 -1.6112
Columns 981 to 9901e-11 *
0.1275 -1.8815 2.5105 -0.4871 -2.2342 0.8520 0.8658 1.6180 3.8901 -0.2454
Columns 991 to 10001e-11 *
-1.4896 4.1337 -2.6640 0.8226 0.2441 -1.4830 -1.7430 1.8758 0.5481 0.5093
[ CPUFloatType{1,1000} ]
Starting benchmark.
Running warmup runs.
Main runs.
Main run finished. Milliseconds per iter: 276.255. Iters per second: 3.61984
Memory usage before main runs: 104366080 bytes
Memory usage after main runs: 343441408 bytes
Average memory increase per iter: 2.39075e+07 bytes
0 value means "not available" in above
```
Reviewed By: ljk53
Differential Revision: D31698338
fbshipit-source-id: da6c74c1321ec02e0652f3afe6f97bf789d3361b
Summary:
Add type support for namedtuple custom classes. The namedtuple type will deserialize to the following string format
```
"qualified_named[
NamedTuple, [
[filed_name_1, field_type_1],
[filed_name_2, field_type_2]
]
]"
```
If it's nested, it will be
```
"__torch__.A[
NamedTuple, [
[field_name_a, __torch__.B [
NamedTuple, [
[field_name_b, __torch__.C [
NamedTuple, [
[field_name_c_1, Tensor],
[field_name_c_2, Tuple[Tensor, Tensor]],
]
]
]
]
]
]
]
]
"
```
The namedtuple type can come from both `collections` and `typing`.
```
from typing import NamedTuple
from collections import namedtuple
```
It will be a forward-incompatible change. However, this type was never supported or exported before, and we don't have a proper way to backport it. The optimal way to ship this change is probably:
1. Update the import side without changing export, so the runtime can read the new format but no new format is exported yet.
2. Update export to emit the new type, so the runtime can export the new format.
For the following example:
```
class Foo(NamedTuple):
id: torch.Tensor
class Bar(torch.nn.Module):
def __init__(self):
super(Bar, self).__init__()
self.foo = Foo(torch.tensor(1))
def forward(self, a: torch.Tensor):
self.foo = Foo(a)
return self.foo
```
The new bytecode.pkl will be
```
(6,
('__torch__.mobile.test_lite_script_type.MyTestModule.forward',
(('instructions',
(('STOREN', 1, 2),
('DROPR', 1, 0),
('MOVE', 2, 0),
('LIST_CONSTRUCT', 0, 1),
('NAMED_TUPLE_CONSTRUCT', 1, 1),
('RET', 0, 0))),
('operators', ()),
('constants', ()),
('types',
('List[Tensor]',
'__torch__.mobile.test_lite_script_type.myNamedTuple[NamedTuple, [[a, '
'List[Tensor]]]]')),
('register_size', 2)),
(('arguments',
((('name', 'self'),
('type', '__torch__.mobile.test_lite_script_type.MyTestModule'),
('default_value', None)),
(('name', 'a'), ('type', 'Tensor'), ('default_value', None)))),
('returns',
((('name', ''),
('type', '__torch__.mobile.test_lite_script_type.myNamedTuple'),
('default_value', None)),)))))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62612
ghstack-source-id: 141485500
Test Plan:
fb:
1. Add a simple unittest to test NamedTuple custom class
2. Use following cpp code (D30271153)
```
TEST(LiteTrainerTest, CustomOp) {
std::string jit_model =
"/home/chenlai/local/notebooks/ads_dper_fl_model_282250609.pt";
Module jit_m = load(jit_model);
jit_m.eval();
torch::jit::Module module_freeze = freeze(jit_m);
IValue tuple =
c10::ivalue::Tuple::create({1 * torch::ones({10, 1034}), 3 * torch::ones({10, 1034})});
std::vector<IValue> inputs_1{tuple};
auto jit_output = jit_m.forward(inputs_1);
jit_output.dump();
std::stringstream ss;
jit_m._save_for_mobile(ss);
jit_m._save_for_mobile("/home/chenlai/local/notebooks/tmp/tmp.ptl");
torch::jit::mobile::Module mobile_m = _load_for_mobile(ss);
auto mobile_output = mobile_m.forward(inputs_1);
std::cout << "mobile output: " << std::endl;
mobile_output.dump();
}
```
And output from both mobile and jit are
```
{prediction: ([ CPUFloatType{0} ], [ CPUFloatType{0} ])}
```
3. N1033894 with model inspection, also compare the result between jit and mobile with the dper model.
Reviewed By: iseeyuan
Differential Revision: D30004716
fbshipit-source-id: cfd30955e66a604af8f9633b1b608feddc13d7d7
Summary:
**Summary**: This commit addresses the first part of https://github.com/pytorch/pytorch/issues/52306 by disallowing type annotations on instance attributes inside any method other than the constructor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67051
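An illustration (made up, not taken from the added test) of the pattern that is now rejected:
```
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.x = 0

    def forward(self, a: int) -> int:
        # Annotating an instance attribute outside __init__ is now disallowed.
        self.x: int = a
        return self.x

# torch.jit.script(M())  # expected to raise an error after this change
```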
Test Plan:
Added test to test_types.py.
**Reviewers**: Zhengxu Chen
**Subscribers**: Zhengxu Chen, Yanan Cao, Peng Wu, Yining Lu
**Tasks**: T103941984
**Tags**: pytorch
**Fixes** https://github.com/pytorch/pytorch/issues/52306
Reviewed By: zhxchen17
Differential Revision: D31843527
Pulled By: andrewor14
fbshipit-source-id: 624879ae801621e367c59228be8b0581ecd30ef4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67200
We want to put more information in the TensorRT layer name. Mainly, we want to be able to tell the original op that a TensorRT layer is mapped from.
The layer format is `[TensorRT Layer Type]-[Original Op Code]-[FX Node Name]`
```
Reformatting CopyNode for Input Tensor 0 to [FULLY_CONNECTED]-[acc_ops.linear]-[linear_1]: 0.0328ms
[FULLY_CONNECTED]-[acc_ops.linear]-[linear_1]: 0.027712ms
PWN([RELU]-[acc_ops.relu]-[relu_1]): 0.008672ms
```
Test Plan:
CI
```
buck run mode/dev-nosan -c python.package_style=inplace caffe2:fx2trt_example
```
Reviewed By: wushirong
Differential Revision: D31627274
fbshipit-source-id: 3dbb576caa63b922274541d2a306b4bd37e707c5
Summary:
This PR is to update PyTorch with the following cub changes:
- Starting cub 1.13.1, cub requires users to define `CUB_NS_QUALIFIER` if `CUB_NS_PREFIX` is also defined. Besides that, a new mechanism `CUB_WRAPPED_NAMESPACE` is added.
And I do the following change to PyTorch:
- Starting CUDA 11.5, define `CUB_WRAPPED_NAMESPACE` globally as an nvcc flag.
- Fix caffe2 failures caused by the above change.
- Add a `aten/src/ATen/cuda/cub_definitions.cuh` that defines helper macros about feature availability.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66219
Reviewed By: bdhirsh
Differential Revision: D31626931
Pulled By: ngimel
fbshipit-source-id: 97ebf5ef671ade8bf46d0860edc317f22660f26d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63511
This PR adds `torch.addmm(c, a, b)` variant with `c, a, b` all being CSR tensors.
The underlying cuSPARSE function works only with 32-bit indices, and in
the current implementation the result tensor has 32-bit indices. Input
tensors can have both 64-bit and 32-bit indices tensors.
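A minimal sketch (made-up 3x3 identity matrices) of the new variant, on CUDA where cuSPARSE is used:
```
import torch

def eye_csr(n, device):
    # CSR representation of an n x n identity matrix.
    crow = torch.arange(n + 1, dtype=torch.int64)
    col = torch.arange(n, dtype=torch.int64)
    val = torch.ones(n)
    return torch.sparse_csr_tensor(crow, col, val, (n, n), device=device)

if torch.cuda.is_available():
    a, b, c = (eye_csr(3, "cuda") for _ in range(3))
    out = torch.addmm(c, a, b)   # c, a, b are all sparse CSR tensors
    print(out.to_dense())
```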
cc nikitaved pearu cpuhrsch IvanYashchuk ngimel
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D31809838
Pulled By: cpuhrsch
fbshipit-source-id: 97005dba27d8adcae445eb756bcbd7271061e9b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66972
Add api to view how many custom classes we have and what their names are
Test Plan: unit test
Reviewed By: cccclai
Differential Revision: D31811337
fbshipit-source-id: 9f8ca1fc578a0a5360c9cd8f95475acc33f250e4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62734
Following https://github.com/pytorch/pytorch/pull/62715#discussion_r682610788
- squareCheckInputs takes a string with the name of the function
- We reuse more functions when checking the inputs
The state of the errors in torch.linalg is far from great though. We
leave a more comprehensive clean-up for the future.
cc jianyuh nikitaved pearu mruberry walterddr IvanYashchuk xwang233 Lezcano
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision: D31823230
Pulled By: mruberry
fbshipit-source-id: eccd531f10d590eb5f9d04a957b7cdcb31c72ea4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67085
Leverages BuiltinRegistry to register the CPython standard C modules. The standard C modules being moved are listed in the FOR_EACH macro.
Test Plan:
buck test mode/opt //caffe2/torch/csrc/deploy/interpreter:test_builtin_registry
buck test mode/opt //caffe2/torch/csrc/deploy:test_deploy
Reviewed By: shunting314
Differential Revision: D31848547
fbshipit-source-id: 7eb49d222eaaccb2b8ca5c984b05bf54cc233f25
Summary:
Follow-up to https://github.com/pytorch/pytorch/issues/58653.
It does not matter whether one compiles locally or cross-compiles -
attempts to use SVE on M1 result in a compiler crash as the SVE ABI is not
defined on macOS.
Fixes #{issue number}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67114
Reviewed By: VitalyFedyunin
Differential Revision: D31869356
Pulled By: malfet
fbshipit-source-id: 184e26ae40edc7ef7b703200b53ea7a15da74818
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66659
Original message: We added and registered a new operator, static_runtime::fused_sigrid_transforms, and modified the original sigrid_transforms to handle the non-fused case only.
Note: this diff was commandeered from a bootcamper. Some final touches were needed.
Test Plan: `buck test caffe2/benchmarks/static_runtime/...`
Reviewed By: swolchok
Differential Revision: D31550307
fbshipit-source-id: 287380be0cca20ee6e145bcc7217547bd58cf6d0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67097
All delegated models have `is_nonzero` ops by default; making the op native and consumable without dispatch eases the portability of such models.
ghstack-source-id: 141375082
Test Plan:
`buck test caffe2/test/cpp/jit:jit -- BackendTest.TestComposite`
```
~/fbsource/fbcode] cd ~/fbsource/fbcode/ && buck test caffe2/test:jit -- test_trace_arange
Parsing buck files: finished in 0.5 sec
Building: finished in 9.4 sec (100%) 16035/16035 jobs, 0/16035 updated
Total time: 10.0 sec
More details at https://www.internalfb.com/intern/buck/build/1e55eea5-2adb-41d1-96ae-cbf4b446d6c6
BUILD SUCCEEDED
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 46eedba2-ae17-4e88-b205-93bd1332665d
Trace available for this run at /tmp/tpx-20211015-113905.235421/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/1970324912349177
✓ ListingSuccess: caffe2/test:jit - main (12.372)
✓ Pass: caffe2/test:jit - test_trace_arange (jit.test_tracer.TestTracer) (13.748)
✓ Pass: caffe2/test:jit - test_trace_arange_with_grad (jit.test_tracer.TestTracer) (13.892)
Summary
Pass: 2
ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/1970324912349177
```
Reviewed By: iseeyuan
Differential Revision: D31656842
fbshipit-source-id: c0e6c798478a2783c0e17e6e9100ba5ce044da78
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66671
Made changes in the step function of the vectorized and non-vectorized Adagrad optimizers to handle complex numbers as two real numbers, as per #65711 on GitHub.
ghstack-source-id: 141442350
Test Plan:
buck test mode/dev caffe2/test:optim -- 'test_adagrad_complex'
https://pxl.cl/1Rd44
Reviewed By: albanD
Differential Revision: D31673503
fbshipit-source-id: 90a0d0c69b556716e2d17c59ce80f09c750fc464
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67001
The overload of `operator()` taking `std::vector<at::Tensor>` was only used for testing. In a diff following this one, I will add a new overload that takes `std::vector<c10::IValue> args` and no `kwargs` so we can avoid default-constructing `kwargs` everywhere.
This new overload will probably take a forwarding reference, so to avoid problems with overloading on forwarding reference and simplify the interface, it's best to remove this unused one.
Test Plan:
`buck test caffe2/benchmarks/static_runtime/...`
`buck test caffe2/test:static_runtime`
Reviewed By: hlu1
Differential Revision: D31821990
fbshipit-source-id: 6d2e4a75ca4abe6e262651532eb96c3b274c6f4a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67125
Using explicit template instantiations in D31659973 (f2582a59d0) was a bad idea. The problem is that the lvalue instantiation was for a `const` vector of `IValue`, meaning that if you tried to pass SR a non-const vector of arguments, the linker would fail to find the symbol.
The reason we didn't catch this in D31659973 (f2582a59d0) was that predictor always passes a `const` reference anyway. But we should fix this to prevent unexpected problems in the future.
Test Plan: `buck test caffe2/benchmarks/static_runtime/...`
Reviewed By: hlu1
Differential Revision: D31873406
fbshipit-source-id: 5ab5a03334bed925cec11facadcedf9bec9b90ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67169
Looks like the doc error only appears after it's landed
Test Plan: Imported from OSS
Reviewed By: seemethere
Differential Revision: D31890431
fbshipit-source-id: d40cba082712c4b35704ea15d82fbc4749f85aec
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/67152
Test Plan:
```
cd docs
make html
```
Imported from OSS
Reviewed By: supriyar
Differential Revision: D31884570
fbshipit-source-id: 2b521f617c93f6fa08da3387df2d25497293eee6
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/kineto](https://github.com/pytorch/kineto).
New submodule commit: 879a203d9b
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67133
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: mrshenli
Differential Revision: D31877172
fbshipit-source-id: 224a499607d1f3bf7c00d8d8dd1fdac47cd33a3b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66957
chunk appears to return a tuple which is enough given that we just
index to the right chunk and discard the rest.
ghstack-source-id: 141391149
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D31780799
fbshipit-source-id: fdb1b77fffa916328e14a4cd692b5241ae46a514
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66956
Adds some comments I found helpful while ramping up on FSDP code.
ghstack-source-id: 141391150
Test Plan: n/a
Reviewed By: mrshenli
Differential Revision: D31780798
fbshipit-source-id: e2d38a9801b4548b202a73615774d5f0f7f5e3ed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66337
Right now, assembly code generated for a given method from the model is named wrapper or func by default. The function name is then replaced with a proper kernel_func_name after target-specific assembly is generated.
This PR propagates a desired kernel_func_name right from aotCompiler API so that the generated function has the needed name that doesn't need to be replaced later.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D31514095
Pulled By: priyaramani
fbshipit-source-id: b70c8e2c733600a435cd4e8b32092d37b7bf7de5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67066
We'll add it later when the api is ready
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D31849079
fbshipit-source-id: 0c00d08510166b2d897cf1562c7276527319b05c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66878
Currently convert_fx quantizes all layers that have been prepared, depending on the prepare qconfig_dict
This PR adds support for accepting a variation of qconfig_dict in convert_fx that can be used to specify which layers should skip quantization.
This makes it possible to prepare/observe all operators once, quantize only a subset of them (e.g. based on quantization error), and avoid preparing multiple times.
The qconfig_dict passed to convert_fx can only have the values set to `None`, with the keys being the same as what is allowed in the prepare qconfig_dict
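As a rough sketch of what such a convert-time qconfig_dict could look like (the key names follow the prepare-time qconfig_dict convention; the specific entries and module names are illustrative only):
```
import torch

# Values may only be None at convert time, meaning "leave this layer unquantized";
# keys mirror what the prepare-time qconfig_dict accepts.
convert_qconfig_dict = {
    "object_type": [
        (torch.nn.ReLU, None),   # skip quantizing all ReLU modules
    ],
    "module_name": [
        ("sub.linear2", None),   # skip quantizing one specific (hypothetical) layer
    ],
}
```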
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_convert_qconfig_dict
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D31808247
fbshipit-source-id: a4f5dca1090f0083fc3fea14aff56924033eb24f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66767
Make observer fqn in prepare step independent of input_node/observed_node name.
This change names the observers as `{input/output}_activation_post_process_{idx}` where idx will be incremented for each new observer instance and is guaranteed to be unique.
Test Plan:
python test/test_quantization.py test_observer_fqn
Imported from OSS
Reviewed By: anjali411
Differential Revision: D31752052
fbshipit-source-id: e0995b1ef33a99d5b012133fe92d303d55a73b7d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66245
Fixes #66053
This PR splits `declare_static_dtype_and_device` into two new methods for
`TensorIteratorBase`: `declare_static_dtype` and `declare_static_device`.
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D31503849
Pulled By: ngimel
fbshipit-source-id: 4b131b691d29ceb5f3709f5d6503997ea0875c54
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67065
Switching to use _convert_fx_do_not_use in the tests
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D31849077
fbshipit-source-id: 3688fc09ac538b6abc16ce87c600b8ee04acfcd1
Summary:
There are multiple improvements to depthwise convolution speed in cudnn between 7.6 and 8.2, since https://github.com/pytorch/pytorch/pull/22302.
This PR aims to harvest all of those improvements by enabling more cudnn kernels. The workload checking logic can also be simplified now.
To keep the change simple, I kept things before cudnn 8.2 unchanged.
Similar to https://github.com/pytorch/pytorch/pull/22302, I used a script [here](https://gist.github.com/FDecaYed/e8ba98a95cd33697df2ace86fdb44897) to benchmark. Both runs use cudnn 8.2.
One enhancement I made to the script is switching to event-based timing. With warmup kernels to fill the launch queue ahead of time, this should give us accurate kernel timing even in CPU launch bound cases.
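A minimal sketch of that event-based timing approach (assumes a CUDA device is available; the layer configuration below is arbitrary and not one of the benchmarked shapes):
```
import torch

x = torch.randn(32, 64, 56, 56, device="cuda")
conv = torch.nn.Conv2d(64, 64, kernel_size=5, padding=2, groups=64).cuda()  # depthwise 5x5

for _ in range(10):  # warmup fills the launch queue so timing is not CPU launch bound
    conv(x)

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
for _ in range(100):
    conv(x)
end.record()
torch.cuda.synchronize()
print(start.elapsed_time(end) / 100, "ms per iteration")
```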
Here is A100 and V100 result sorted by speedup.
[Book1.xlsx](https://github.com/pytorch/pytorch/files/6530371/Book1.xlsx)
Result highlights:
Newly enabled 5x5 cudnn kernels show up to 6x speedup.
Close to half of the tested sizes show >10% speedup.
Fixed some corner cases that previously caused 15-20x slowdowns.
Only a handful of cases slow down (~10 out of >1000).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58749
Reviewed By: bdhirsh
Differential Revision: D31613199
Pulled By: ngimel
fbshipit-source-id: 883b58facad67ccd51dc9ab539368b4738d40398
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67050
This PR moves init_multi_gpu_helper to common_distributed so that it could be shared by different distributed tests.
ghstack-source-id: 141370119
Test Plan: wait for ci.
Reviewed By: mrshenli
Differential Revision: D31842644
fbshipit-source-id: c7bad25d6cef9bdce7ad1fb6c60c1cad4b765702
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66904
Add more FSDP unit tests to cover core logic, freezing weights, and the flatten parameter wrapper. These unit tests are refactored to align with PyTorch's commonly used test classes.
ghstack-source-id: 141335614
Test Plan: unit tests
Reviewed By: mrshenli
Differential Revision: D31779565
fbshipit-source-id: c727110d1d7570c0ec49e42cadfc9e9a5e440073
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66649
Some minor changes to dist quantization: mainly change the namespace and add some notes for future code dedup.
ghstack-source-id: 141336191
Test Plan: wait for ci
Reviewed By: cbalioglu
Differential Revision: D31663043
fbshipit-source-id: 2f96b7346e9c90df5ab2536767f8301eb86a9c79
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66149
The updated logic will be able to infer the rank of the slice output when only the rank is known for the slice input. This enables cases where `ConstantValueMap::HasRank(input)` is `True` while `ConstantValueMap::HasShape(input)` is `False`.
Test Plan: Imported from OSS
Reviewed By: jansel
Differential Revision: D31423840
Pulled By: malfet
fbshipit-source-id: 17b2b24aa63435d5212ebe6bdf66ae3c348c4e3b
Co-authored-by: BowenBao <bowbao@microsoft.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66143
Delete test_list_remove. There's no point in testing conversion of
this model since TorchScript doesn't support it.
Add a link to an issue tracking test_embedding_bag_dynamic_input.
[ONNX] fix docs (#65379)
Mainly fix the sphinx build by inserting empty lines before
bulleted lists.
Also some minor improvements:
Remove superfluous descriptions of deprecated and ignored args.
The user doesn't need to know anything other than that they are
deprecated and ignored.
Fix custom_opsets description.
Make indentation of Raises section consistent with Args section.
[ONNX] publicize func for discovering unconvertible ops (#65285)
* [ONNX] Provide public function to discover all unconvertible ATen ops
This can be more productive than finding and fixing a single issue at a
time.
* [ONNX] Reorganize test_utility_funs
Move common functionality into a base class that doesn't define any
tests.
Add a new test for opset-independent tests. This lets us avoid running
the tests repeatedly for each opset.
Use simple inheritance rather than the `type()` built-in. It's more
readable.
* [ONNX] Use TestCase assertions rather than `assert`
This provides better error messages.
* [ONNX] Use double quotes consistently.
[ONNX] Fix code block formatting in doc (#65421)
Test Plan: Imported from OSS
Reviewed By: jansel
Differential Revision: D31424093
fbshipit-source-id: 4ced841cc546db8548dede60b54b07df9bb4e36e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66140
* Add new argument to export api to enable users specifying `nn.Module` classes that they wish to be exported as local function in ONNX model.
* Refactor `torch/csrc/jit/serialization/export.cpp`, and remove redundant `EncoderBase` class.
* ~~Contains changes from #63268~~
* Depends on #63716 to update onnx submodule.
Test Plan: Imported from OSS
Reviewed By: jansel
Differential Revision: D31424098
fbshipit-source-id: c949d0b01c206c30b4182c2dd1a5b90e32b7a0d3
Co-authored-by: BowenBao <bowbao@microsoft.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66581
c10d/frontend.cpp was originally proposed to introduce a pure C++ API and use TorchBind to share the Python-level API with TorchScript. This is no longer needed, so delete it to reduce code redundancy.
ghstack-source-id: 141336190
Test Plan: wait for ci
Reviewed By: rohan-varma
Differential Revision: D31627107
fbshipit-source-id: 07d30d280c25502a222a74c2c65dfa4069ed8713
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66648
Currently, SR shallow-copies its `IValue` inputs when running inferences. We can avoid refcount bumps by `std::move`-ing the inputs into their slots. To achieve this, I've made the following changes:
1. Add an overload for `set_inputs` that takes a `std::vector<IValue>&&`.
2. Change the signatures of `StaticModule::operator()` and `StaticRuntime::operator()`.
Old:
```
operator()(const std::vector<IValue>& args, const std::unordered_map<std::string, IValue>& kwargs)
```
New:
```
template <class IValueList>
operator()(IValueList&& args, const std::unordered_map<std::string, IValue>& kwargs)
```
The implementations use perfect forwarding to invoke the correct overload of `set_inputs`.
Test Plan: Added a short new unit test to exercise the new code path. All other unit tests still pass.
Reviewed By: hlu1
Differential Revision: D31659973
fbshipit-source-id: b8c194405b54a5af1b418f8edaa1dd29a061deed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66288
This change makes it so `UseVariadicOp` can transform ops with many Tensor list inputs.
Input pattern:
```
%output : Type = op(%list_1, %arg_1, %list_2, %list_3)
```
Output pattern:
```
%output : Type = variadic_op(%list_11, ..., %list_1N, %arg_1, %list_21, ..., %list_2M, %list_31, ..., %list_3K, N, M, K)
```
The length of each list is passed at the end of the variadic op so that the op implementation can process the inputs appropriately. This also frees us from needing to update `hasVarArgs` in static runtime each time we add a variadic op.
This diff also makes `UseVariadicOp` more robust. Before, `list_idx` was passed as an argument. Now, `VariadicUpdater` determines `list_idx` from the node's schema.
Test Plan:
Existing variadic ops do not break:
`buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: d1jang
Differential Revision: D31450811
fbshipit-source-id: 808fcc3ae8940b9e602586f38f8cf9154c9a6462
Summary:
Similar to pytorch/text#1416
malfet, brianjo
The previous code failed when tags changed from `v0.9.0` to `v0.10.0`. I tested this offline; it would be nice to somehow actually tag the repo and see that this adds the correct documentation directory to the pytorch/pytorch.github.io repo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67026
Reviewed By: saketh-are
Differential Revision: D31843381
Pulled By: malfet
fbshipit-source-id: 21526ad9ed4c1751c2d7f6d621da305f166a7f55
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65860
Re-enable peepholes like `x + 0 == x`. These were at one point enabled, then disabled because they did not properly account for aliasing, and then re-enabled by reconstructing the alias db every time, which is slow - O(n^2). I've added correctness conditions, and I've also made it so that we avoid using stale aliasing properties for either the input or output of nodes we optimize.
Some of the other code that we have written to avoid re-instantiating the alias db involves internally mutating it; however, this is tricky to reason about and we probably have to add some extra invariants...
cc navahgar relevant to graph opts and d1jang alias analysis relevant here
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D31352382
Pulled By: eellison
fbshipit-source-id: 441a27f17dc623d6c24538d1d43cba0412c3c482
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66974
`D31591785 (67e003f09b)` started carrying a function object to be executed and `FunctionKind` for the type of the function *separately*, and this caused a bug fixed by D31783028 (79803b199f).
This change bundles them again, as was done before by swolchok, to reduce the chances of such a mistake in the future. They need to be carried together always since `FunctionKind` identifies the type of the function object.
Note that `struct Function` is a POD type, so accessing its fields (first, second) shouldn't cause extra overhead in `ProcessedNode::run()`.
Test Plan:
Confirmed that the managed memory metrics remain the same before/after this diff on inline_cvr:
```
#AFTER
# inline_cvr/local
Total number of managed tensors: 2660
Total number of managed output tensors: 0
Total number of unmanaged values: 3041
Total memory managed: 1496896 bytes
Total number of reused tensors: 1183
Total number of 'out' variant nodes/total number of nodes: 2452/2469 (99.3115%)
# inline_cvr/local_ro
Total number of managed tensors: 1412
Total number of managed output tensors: 0
Total number of unmanaged values: 2679
Total memory managed: 39040 bytes
Total number of reused tensors: 959
Total number of 'out' variant nodes/total number of nodes: 1928/1939 (99.4327%)
# inline_cvr/remote_ro
First iter time: 12.0344 ms
Total number of managed tensors: 1293
Total number of managed output tensors: 0
Total number of unmanaged values: 14
Total memory managed: 5293824 bytes
Total number of reused tensors: 771
Total number of 'out' variant nodes/total number of nodes: 1298/1298 (100%)
```
```
#BEFORE
# inline_cvr/local
Total number of managed tensors: 2660
Total number of managed output tensors: 0
Total number of unmanaged values: 3041
Total memory managed: 1496896 bytes
Total number of reused tensors: 1183
Total number of 'out' variant nodes/total number of nodes: 2452/2469 (99.3115%)
#inline_cvr/local_ro
Total number of managed tensors: 1412
Total number of managed output tensors: 0
Total number of unmanaged values: 2679
Total memory managed: 39040 bytes
Total number of reused tensors: 959
Total number of 'out' variant nodes/total number of nodes: 1928/1939 (99.4327%)
#inline_cvr_remote_ro
Total number of managed tensors: 1293
Total number of managed output tensors: 0
Total number of unmanaged values: 14
Total memory managed: 5293824 bytes
Total number of reused tensors: 771
Total number of 'out' variant nodes/total number of nodes: 1298/1298 (100%)
```
Reviewed By: mikeiovine
Differential Revision: D31798419
fbshipit-source-id: fd4301b6731e402be0820729654735c791511aba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66854
diff tool and script to test correctness of flatbuffer format
Test Plan:
`./verify_flatbuffer.sh | pastry`
P463163180
Reviewed By: zhxchen17
Differential Revision: D31752696
fbshipit-source-id: bea00102b21e62c02367853c8bec2742b483fbda
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66955
The new convert function is not meant to be used by users; it's a temporary function that
we use to build up the new convert path. We will bring feature parity with the old path
and deprecate the old path after that.
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision: D31810488
fbshipit-source-id: 2f65a110506683123350e619c48df090a15570fc
Summary:
CAFFE2 has been deprecated for a while, but still included in every PyTorch build.
We should stop building it by default, although CI should still validate that caffe2 code is buildable.
Build even fewer dependencies when compiling mobile builds without Caffe2
Introduce `TEST_CAFFE2` in torch.common.utils
Skip `TestQuantizedEmbeddingOps` and `TestJit.test_old_models_bc` if code is compiled without Caffe2
Should be landed after https://github.com/pytorch/builder/pull/864
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66658
Reviewed By: driazati, seemethere, janeyx99
Differential Revision: D31669156
Pulled By: malfet
fbshipit-source-id: 1cc45e2d402daf913a4685eb9f841cc3863e458d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67062
For cc and potential reviews
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D31849050
fbshipit-source-id: d3899c2ca857b8f22bdc88b4e83cdd20bbf0b1d6
Summary:
### BUG
If a PyTorch binary is built with a compiler that doesn't support all the AVX512 intrinsics in the codebase, then it won't have ATen AVX512 kernels, but at runtime, CPU capability would still be incorrectly returned as AVX512 on a machine that supports AVX512. It seems that PyTorch Linux releases are done on CentOS with `gcc 7.3`, so this bug would manifest in the 1.10 release, unless a fix such as this one is added. gcc versions below 9.0 don't support all the AVX512 intrinsics in the codebase, such as `_mm512_set_epi16`.
### FIX
CPU Capability would be returned as AVX512 at runtime only if the binary was built with a compiler that supports all the AVX512 intrinsics in the codebase, and if the hardware the binary is being run on supports all the required AVX512 instruction sets.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66703
Reviewed By: gchanan
Differential Revision: D31732625
Pulled By: malfet
fbshipit-source-id: e52d06b87fbe2af9b303a2e9c264189c8512d5ec
Summary:
Adds `torch.argwhere` as an alias to `torch.nonzero`
Currently, `torch.nonzero` actually provides equivalent functionality to `np.argwhere`.
From NumPy docs,
> np.argwhere(a) is almost the same as np.transpose(np.nonzero(a)), but produces a result of the correct shape for a 0D array.
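A small sketch of the equivalence (assuming the alias behaves as described above):
```
import torch

x = torch.tensor([[0, 1], [2, 0]])
print(torch.argwhere(x))  # tensor([[0, 1], [1, 0]]) -- one row of indices per nonzero element
print(torch.nonzero(x))   # same result, since nonzero already behaves like np.argwhere
```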
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64257
Reviewed By: dagitses
Differential Revision: D31474901
Pulled By: saketh-are
fbshipit-source-id: 335327a4986fa327da74e1fb8624cc1e56959c70
Summary:
Fixes https://github.com/pytorch/pytorch/issues/61926
1. Update the `if` to just use requires_derivative, since that should reflect when the function is not differentiable
2. If `requires_derivative=True` but no outputs have forward derivatives, we should error as usual
3. ~In the future we may also want to handle the case~ when `len(fw_derivatives) > 0 and len(fw_derivatives) < num_diff_outputs`, we should add an assert in codegen that this does not happen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66926
Reviewed By: anjali411
Differential Revision: D31810736
Pulled By: soulitzer
fbshipit-source-id: 11a14477cc7554f576cff2ed1711a448a8c6a66a
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).
New submodule commit: 183172ba8c
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65353
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: lw
Differential Revision: D31059779
fbshipit-source-id: 7bddff5139f8168750e22e1cc8c0d49931db542e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66991
Currently, c10d extensions use Backend.NAME to store the creator
function. However, builtin ones use that same field to store the
name. This commit makes c10d extensions comply with builtin ones,
and uses a dedicated `_plugins` field to store creator functions.
Thanks bryanmr for pointing this out.
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D31820307
Pulled By: mrshenli
fbshipit-source-id: 259769ebfc80c0c9fc44d25498c8d19a3a09d1bc
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45255
Mostly straightforward. The only downside in this PR is the lack of a more scalable way to check for all newly-created nodes in `callPySymbolicFunction`. The other options were:
* Create a scope within the node's scope and loop through all nodes that correspond to the scope. The code would still need to loop through all nodes.
* Add extra state to the graph (no good reason to do so).
* Add extra state to the ONNX exporter, since python calls go back to `g.op(...)` (no good reason to do so, also not very pythonic).
cc BowenBao neginraoof
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45256
Reviewed By: malfet, houseroad
Differential Revision: D31744281
Pulled By: msaroufim
fbshipit-source-id: 1b63f6e7f02ed61b3a9b7ac3d0be0a3a203c8ff6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67021
When applying the equally split optimization, we still need to delete the list unpack node.
I did an accuracy test yesterday but didn't catch this issue because my diffs were not properly synced between devservers (I use hlu1's devbig for testing and it had an old version of "Add FuseListUnpackV2"). But I did another test this morning and realized that there was an issue.
This is not affecting anything in prod right now since D31742293 has not landed.
Reviewed By: hlu1
Differential Revision: D31827278
fbshipit-source-id: c7b05e3d8ec942632adcff4bdfebb8c27c1a7a39
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66952
Added splitter to lower parts of the transformer model
Program now supports arg input
Test Plan:
Performance on non-lowered model:
0.19662559509277344
Performance on semi-lowered model:
0.19131642150878905
Reviewed By: 842974287
Differential Revision: D31541325
fbshipit-source-id: 194aba97afc794dbeada4bbc4777d0a7b02e3635
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66889
Added support for negative dims and modified unit test.
Test Plan: buck test mode/dev-nosan caffe2/test/fx2trt/converters:test_unsqueeze
Reviewed By: 842974287
Differential Revision: D31769393
fbshipit-source-id: 854335ead2ffad5f466ad66b9be36ba20a0fea67
Summary:
This completes the removal of conv_utils and redistributes its dependencies
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66605
ghstack-source-id: 140565820
Test Plan: ci tests
Reviewed By: kimishpatel
Differential Revision: D31637731
fbshipit-source-id: 48d3a423e4ff0eb6ab21bb13bda44da16996423b
Summary:
Fixes https://github.com/pytorch/pytorch/issues/62533.
In very rare cases, the decorator for detecting memory leaks throws an assertion even when the test is passing and the memory is being freed with a tiny delay. The issue is not reproducible in internal testing, but shows up sometimes in the CI environment.
Reducing the severity of such detection to a warning, so as not to fail the CI tests, since the actual test is not failing; only the check inside the decorator is failing.
Limiting the change to ROCM only for now.
cc jeffdaily sunway513 jithunnair-amd ROCmSupport
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65973
Reviewed By: anjali411
Differential Revision: D31776154
Pulled By: malfet
fbshipit-source-id: 432199fca17669648463c4177c62adb553cacefd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63997
Use torch_function to extend torch.nn.init.uniform_
The init is done in SPMD fashion. Note that ideally we would aggregate sharded tensors into a global tensor, init it, and reshard. It's fine to run it SPMD since uniform is i.i.d. (independent and identically distributed).
Also enable the unit test in test_linear.py for OSS testing.
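A minimal sketch of the `__torch_function__` mechanism relied on here (this is not the ShardedTensor implementation; the subclass and behavior are purely illustrative):
```
import torch

class LoggingTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if func is torch.nn.init.uniform_:
            # A sharded tensor could instead init each local shard here (SPMD).
            print("intercepted nn.init.uniform_")
        return super().__torch_function__(func, types, args, kwargs)

t = torch.zeros(4).as_subclass(LoggingTensor)
torch.nn.init.uniform_(t)  # prints the message, then fills t in place
```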
Test Plan:
a) Unit Test
(pytorch) ... $ python test/distributed/_sharded_tensor/ops/test_init.py TestShardedTensorNNInit --v
(pytorch) ... $ python test/distributed/_sharded_tensor/ops/test_linear.py --v (before runs this command is no-op)
or b) Manual run: Instruction here: https://docs.google.com/document/d/1_m1Hdo5w51-hhPlZ_F8Y6PIWrN7UgJZqiSpARYvhsaE/edit#
Imported from OSS
Reviewed By: pritamdamania87, anjali411
Differential Revision: D30563017
fbshipit-source-id: d1859f7682235bcb44515efc69ca92bc5e34fce1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66990
NNC fusion groups currently show up as "TensorExpr" in the profiler,
which is true but not super useful since it obscures what's actually happening
in the fusion group. This change will log them as `fused_XXX` where XXX is a
(length-limited) series of ops describing the subgraph, for instance
`fused_mul_add` to represent a group containing `aten::mul`, `aten::add`.
Test Plan: New unit test to check the output of autograd profiler.
Reviewed By: dzhulgakov
Differential Revision: D31762087
fbshipit-source-id: 3fadbdc67b054faa01aa42e5b6ea2c4a6bc3481f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66950
Just to show that it works for weighted operations as well; qat/fused ops are not supported yet.
We can start developing the backend_config_dict and work towards making the support more complete afterwards.
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D31801782
fbshipit-source-id: 8491bab7939a7a1c23ffa87c351844b82e390027
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66925
Current convert_fx implementation is using "The Interpreter Pattern" in https://pytorch.org/docs/stable/fx.html
There are two things that have changed which make the approach in this PR possible and needed:
1) The original convert implementation was developed at the initial prototype stage, when fx did not allow mutations; now fx supports mutations.
2) The original convert needs to work with a lot of fbgemm/qnnpack-specific logic, which is not needed for reference patterns.
Therefore it makes sense for us to write a new convert function just for reference patterns; the implementation is significantly easier to understand than the original convert implementation.
Current support:
* we should be able to support all non-weighted ops like relu, add etc.
Missing:
* linear and conv
* some advanced features like standalone modules, input_quantized_idxs etc.
We will add linear and conv support and start defining the backend_config_dict based on this version of convert.
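A tiny sketch of the fx graph mutation this relies on (illustrative only, unrelated to the quantization specifics):
```
import torch
import torch.fx as fx

class M(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x)

gm = fx.symbolic_trace(M())
# fx graphs are now mutable, so a convert pass can rewrite nodes in place
# instead of rebuilding the module via the interpreter pattern.
for node in gm.graph.nodes:
    if node.op == "call_function" and node.target is torch.relu:
        node.target = torch.sigmoid
gm.recompile()
print(gm(torch.tensor([-1.0, 1.0])))  # sigmoid applied instead of relu
```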
Test Plan:
python test/test_quantization.py TestQuantizeFxOpsNew
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D31786241
fbshipit-source-id: 2a32156eb6d3c5271cb44906cd863055785fb5d4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66509
Like `FuseListUnpack`, but instead of adding arguments to the fused node's outputs, inserts a new fused op.
By using a new fused op, we can avoid runtime `is_fused` checks. This will make the op implementations significantly cleaner. Eventually, we will migrate all ops to `V2` and delete the old pass.
`FuseListUnpackV2` also fixes the bug described in T103159043.
Test Plan: I've made some changes to D31550307 locally and verified that everything works.
Reviewed By: hlu1
Differential Revision: D31492017
fbshipit-source-id: 4f90fcbc17e4c70a3d65985bee836fabf868a22c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66098
`cat` is somewhat special-cased right now because currently we only have list of Tensor inputs where the list is constructed in the JIT IR graph. While that is generally true for Fusion (e.g. why we have ConstantChunk) that may not be true for shape analysis generally, so I'm waiting a bit to generalize.
Test Plan: Imported from OSS
Reviewed By: navahgar, anjali411
Differential Revision: D31797467
Pulled By: eellison
fbshipit-source-id: ca761e214dfd7f3bba8d189f3b3f42ffec064f63
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66097
Adding logic to generate runtime shapes for nodes with multi-outputs. It is generalizing existing flow of looking at a node, getting its shape graph, inlining it, and adding a mapping from the output to the new value in the stitched shape compute graph to loop over multiple outputs.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D31797468
Pulled By: eellison
fbshipit-source-id: 2c182b71a46b36d33f23ad35b89790a4a5d4471c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65575
This is needed for lowering an NNC model to mobile. It is also the last class of unhandled ops which NNC fuses, and we need to integrate this for computing output symbolic shapes.
The graph with two dynamic shape inputs produces:
```
graph(%x.1 : Tensor(SS(-2), 2, 3),
%y.1 : Tensor(SS(-3), 2, 3)):
%5 : int = prim::Constant[value=0]()
%4 : Tensor[] = prim::ListConstruct(%x.1, %y.1)
%6 : Tensor(SS(-4), 2, 3) = aten::cat(%4, %5) # /private/home/eellison/pytorch/test/jit/test_symbolic_shape_analysis.py:290:19
return (%6)
```
With a partial eval graph of
```
Done with partial evaluation
graph(%129 : int[],
%130 : int[],
%dim.14 : int):
%738 : int = prim::Constant[value=3]()
%737 : int = prim::Constant[value=2]()
%132 : int = prim::Constant[value=0]()
%392 : int = aten::__getitem__(%129, %132) # <string>:339:44
%417 : int = aten::__getitem__(%130, %132) # <string>:339:44
%cat_dim_size.48 : int = aten::add(%392, %417) # <string>:339:29
%result_size.5 : int[] = prim::ListConstruct(%cat_dim_size.48, %737, %738)
return (%result_size.5)
```
To handle cat, I essentially make the cat shape op variadic,
replacing
```
torch.cat([x, y])
...
def cat_shape_op(tensors: List[List[int]], dim: int):
...
op(tensors)
```
with
```
def cat_shape_op(x: List[int], y: List[int], dim: int):
tensors = [x, y]
op(tensors)
```
This reuses the existing input Tensor properties partial evaluation path and avoids having to add special handling to optimize out `len(tensors)` calls in the IR.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D31797471
Pulled By: eellison
fbshipit-source-id: 62c794533d5fabfd3fad056d7e5fe3e8781b22c5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65573
When we remove mutation on
```
x = [0, 1, 3, 4]
x[-2] = 4
```
we have a safety check that the new index will be in bounds of the old index. In practice, this should always be the case; otherwise you would have a runtime error. Within that check (not within the actual adjustment) we were using the wrong length of inputs, preventing the optimization from firing.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D31797469
Pulled By: eellison
fbshipit-source-id: 02a1686b9f6016eb5aeb87ed342c043c203dcd0e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65148
No functional changes, factoring out optimizations and renaming the `graph` in symbolic shape analysis to `shape_compute_graph` as ZolotukhinM suggested
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D31797447
Pulled By: eellison
fbshipit-source-id: 60d322da040245dd7b47ee7c8996239572fd11c2
Summary:
**Summary:** Move the error reporting part to the cpp file to avoid callers inlining it, which inflates the generated code size. See https://github.com/pytorch/pytorch/issues/65830.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66721
Test Plan:
Compiling the simple program below now generates ~150 lines of assembly, compared to 700+ lines before.
```
#include <c10/core/Scalar.h>
void g(float) {}
void f(const c10::Scalar& scalar) {
auto x = scalar.to<float>();
g(x);
}
```
**Reviewers:** Brian Hirsh
**Subscribers:** Brian Hirsh, Edward Yang, Yining Lu
**Tasks:** T103384490
**Tags:** pytorch
Fixes https://github.com/pytorch/pytorch/issues/65830
Reviewed By: zou3519, bdhirsh
Differential Revision: D31737607
Pulled By: andrewor14
fbshipit-source-id: 3d493c4d8e51d8f8a19d00f59b8ea28176c8a9e3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66940
`aten::index`'s schema is as follows:
```
"aten::index.Tensor(Tensor self, Tensor?[] indices) -> Tensor
```
The current implementation assumes `indices`' elements are all tensors by doing `elem.toTensor()`, which is incorrect. This change creates an empty optional value if an element from `indices` is not a tensor.
Test Plan: Fixed `StaticRuntime, IndividualOps_Index` to correctly test `aten::index` with `indices` that contains `None`.
Reviewed By: hlu1
Differential Revision: D31712145
fbshipit-source-id: be1c29674bcd55b67b0dcc2a988bc37fd43745f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66604
This diff/PR implements ShardedEmbedding using the ShardedTensor.
Several caveats:
1. We support limited input params for the op. Support for more params is on the way.
2. We only support chunk sharding for now.
3. We only support a single local shard per rank for now.
ghstack-source-id: 141056130
Test Plan: Unit test and CI
Reviewed By: pritamdamania87
Differential Revision: D31544556
fbshipit-source-id: cc867dcba8c11e6f4c7c3722488908f5108cc67f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63510
Sparse CSR matrix resizing behavior:
If we _increase the number of rows_ the number of specified elements in the matrix remains the same -> the size of col_indices, values doesn't change, the size of crow_indices becomes `rows+1`.
If we _decrease the number of rows_ the number of specified elements will be `min(nnz, rows*cols)` -> need to resize `crow_indices` to `rows+1` and set the last element to `min(nnz, rows*cols)`; decrease the size of col_indices and values to `min(nnz, rows*cols)`.
If we _increase the number of columns_ the number of specified elements in the matrix remains the same, the number of rows remains the same -> no need to resize anything, just set new sizes.
We _cannot decrease the number of columns_ because it would require recomputing `crow_indices`.
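A small sketch of the invariants involved (just constructing a CSR tensor, not the resize implementation itself):
```
import torch

# 2x3 CSR matrix with nnz == 4 specified elements.
crow_indices = torch.tensor([0, 2, 4])    # length == rows + 1
col_indices = torch.tensor([0, 2, 1, 2])  # length == nnz
values = torch.tensor([1., 2., 3., 4.])   # length == nnz
csr = torch.sparse_csr_tensor(crow_indices, col_indices, values, size=(2, 3))

assert csr.crow_indices().numel() == csr.size(0) + 1
assert csr.col_indices().numel() == csr.values().numel()
```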
cc nikitaved pearu cpuhrsch IvanYashchuk
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision: D31796680
Pulled By: cpuhrsch
fbshipit-source-id: 7d8a9701ce06d30a1841f94bba0a057cacea9401
Summary:
Fixes https://github.com/pytorch/pytorch/issues/65154, tests for backwards compatibility of torch.package by checking if packages that were created before can still be loaded.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66739
Reviewed By: suo
Differential Revision: D31771526
Pulled By: PaliC
fbshipit-source-id: ba8c652c647b94114a058e4c7d7f1c7ce6033d84
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66554
In native_functions.yaml, the schemas for batch_norm and instance_norm
are incorrect: the inputs `running_mean` and `running_var` are mutated,
but are not marked as such in the function schema. Since `(a!)?`
annotations are currently not working (see #65760), this instead adds a
special case to `alias_anaysis.cpp`. If the value of `training` or
`use_input_stats` is known to be `false`, then `alias_analysis` will
mark the input as _not_ being written to.
Test Plan:
Removed the `skip` annotation on the following test, and added a special
exception in `check_alias_annotations`:
```
python test/test_ops.py -k test_variant_consistency_jit_nn_functional_batch_norm
```
Also:
```
./build/bin/test_jit --gtest_filter="*BatchAndInstanceNormFixture*"
```
Imported from OSS
Reviewed By: eellison
Differential Revision: D31612339
fbshipit-source-id: 12ca61b782b9e41e06883ba080a276209dc435bb
Summary:
On the HUD, the test tools job is failing as the runners now install Python 3.10, which is not compatible with numpy 1.20
See https://github.com/pytorch/pytorch/runs/3952169950?check_suite_focus=true Install dependencies step:
```
ERROR: Command errored out with exit status 1:
command: /opt/hostedtoolcache/Python/3.10.0/x64/bin/python /opt/hostedtoolcache/Python/3.10.0/x64/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /tmp/tmptq8aay7m
cwd: /tmp/pip-install-dk_6t98q/numpy_e9431bf106b746148c0e7c36e46551b4
Complete output (1169 lines):
setup.py:66: RuntimeWarning: NumPy 1.20.0 may not yet support Python 3.10.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66947
Reviewed By: suo, malfet
Differential Revision: D31799205
Pulled By: janeyx99
fbshipit-source-id: 64bf10c37c0aa4f5837c48e92d56e81d920722bd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66917
The total number of 'out' variant nodes/total number of nodes is now reported as 100% for all the models, which obviously isn't true.
Reviewed By: swolchok, mikeiovine
Differential Revision: D31783028
fbshipit-source-id: e0bc2c6614aa3c3a235283c9125de1b339f42585
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66603
Found the issue here: https://github.com/pytorch/pytorch/issues/66281 by making the test cases more complicated.
By closely reading the code again, it turns out my original understanding was also wrong. Let's use the example mentioned in the issue to explain:
If the placement is like:
```
"rank:3/cuda:3",
"rank:0/cuda:0",
"rank:1/cuda:1",
"rank:2/cuda:2",
```
First, we split the column or row by the order of [3, 0, 1, 2].
In the case of column-wise sharding:
We need to rearrange the result from rank0-4.
Step 1: we split the output based on the original sharding strategy, aka, rank3 gets the 1st shard, rank0 get the 2nd shard, etc.
Step 2: we need to rearrange the result from rank0-4 by ordering them following the order of [3, 0, 1, 2], aka, the result from rank3 needs to be put in the front, and so forth.
In the case of row-wise sharding:
We need to rearrange the input being sent to rank0-4.
Step 1: we reorder the input and follow the map of [3, 0, 1, 2]. For example, the first shard goes to rank 3 so we need to put in the 3rd part, the second shard goes to rank 0, so we put it in the 2nd part, and so on.
Step 2: the size of the sharding for each rank is decided by the original placement: [3, 0, 1, 2], aka, rank 3 gets the first shard and its size, etc.
Update the unit test to reflect this change.
Also, correct some format and comments in the sharded linear.
ghstack-source-id: 141055689
Test Plan: unit test and wait for CI.
Reviewed By: pritamdamania87, bowangbj
Differential Revision: D31634590
fbshipit-source-id: 677a9c2b42da1e2c63220523ed2c004565bbecc7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66098
`cat` is somewhat special-cased right now because currently we only have list of Tensor inputs where the list is constructed in the JIT IR graph. While that is generally true for Fusion (e.g. why we have ConstantChunk) that may not be true for shape analysis generally, so I'm waiting a bit to generalize.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D31732415
Pulled By: eellison
fbshipit-source-id: 7f513cea355f1e4c1d2ca7c32c06690a9bdcb050
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66097
Adding logic to generate runtime shapes for nodes with multi-outputs. It is generalizing existing flow of looking at a node, getting its shape graph, inlining it, and adding a mapping from the output to the new value in the stitched shape compute graph to loop over multiple outputs.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D31732418
Pulled By: eellison
fbshipit-source-id: 767698d031b1daf002678a025b270e0ede429061
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65575
This is needed for lowering an NNC model to mobile. It is also the last class of unhandled ops which NNC fuses, and we need to integrate this for computing output symbolic shapes.
The graph with two dynamic shape inputs produces:
```
graph(%x.1 : Tensor(SS(-2), 2, 3),
%y.1 : Tensor(SS(-3), 2, 3)):
%5 : int = prim::Constant[value=0]()
%4 : Tensor[] = prim::ListConstruct(%x.1, %y.1)
%6 : Tensor(SS(-4), 2, 3) = aten::cat(%4, %5) # /private/home/eellison/pytorch/test/jit/test_symbolic_shape_analysis.py:290:19
return (%6)
```
With a partial eval graph of
```
Done with partial evaluation
graph(%129 : int[],
%130 : int[],
%dim.14 : int):
%738 : int = prim::Constant[value=3]()
%737 : int = prim::Constant[value=2]()
%132 : int = prim::Constant[value=0]()
%392 : int = aten::__getitem__(%129, %132) # <string>:339:44
%417 : int = aten::__getitem__(%130, %132) # <string>:339:44
%cat_dim_size.48 : int = aten::add(%392, %417) # <string>:339:29
%result_size.5 : int[] = prim::ListConstruct(%cat_dim_size.48, %737, %738)
return (%result_size.5)
```
To handle cat, I essentially make the cat shape op variadic,
replacing
```
torch.cat([x, y])
...
def cat_shape_op(tensors: List[List[int]], dim: int):
...
op(tensors)
```
with
```
def cat_shape_op(x: List[int], y: List[int], dim: int):
tensors = [x, y]
op(tensors)
```
This reuses the existing input Tensor properties partial evaluation path and avoids having to add special handling to optimize out `len(tensors)` calls in the IR.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D31732416
Pulled By: eellison
fbshipit-source-id: 6d93ddf62c34846ec238159f75229632515530b7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65573
When we remove mutation on
```
x = [0, 1, 3, 4]
x[-2] = 4
```
we have a safety check that the new index will be in bounds of the old index. In practice, this should always be the case; otherwise you would have a runtime error. Within that check (not within the actual adjustment) we were using the wrong length of inputs, preventing the optimization from firing.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D31732417
Pulled By: eellison
fbshipit-source-id: dd734254c0212ca459c1c135da262974de5299be
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65148
No functional changes, factoring out optimizations and renaming the `graph` in symbolic shape analysis to `shape_compute_graph` as ZolotukhinM suggested
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D31732421
Pulled By: eellison
fbshipit-source-id: e934507d1795e0bc4d98a3bfe6cb792e2f08b119
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63509
The primary use of `torch.empty` is to reserve memory for tensor and set the type, device, size information. The same is done here for SparseCSR.
`crow_indices` is initialized as an empty tensor of size `num_rows + 1`. `col_indices` and `values` are initialized as empty tensors of size 0.
cc nikitaved pearu cpuhrsch IvanYashchuk
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision: D31770359
Pulled By: cpuhrsch
fbshipit-source-id: c83f2a2e0d7514ba24780add1086e1bccf541dd9
Summary:
This changes the link for installing binaries to the page on pytorch.org that is entirely the download command selector (which isn't visible on a normal aspect ratio screen when the main website page first loads anymore).
This also includes some other random fixes:
* Update HUD link
* Clean ups
Fixes #{issue number}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66828
Reviewed By: malfet
Differential Revision: D31750654
Pulled By: driazati
fbshipit-source-id: aef9ceba71418f6f7648eab9a8c8a78d6c60518b
Summary:
`linux-xenial-py3-clang5-mobile-build`, `linux-xenial-py3-clang5-mobile-custom-build-dynamic`, `linux-xenial-py3-clang5-mobile-custom-build-dynamic` and `linux-xenial-py3-clang5-mobile-code-analysis` are just the flavors of regular linux build job with no tests.
`linux-xenial-py3-clang5-mobile-code-analysis` is the master-only job.
The `code-analysis` job is dispatched to `.jenkins/pytorch/build-mobile-code-analysis.sh` in
583217fe37/.jenkins/pytorch/build.sh (L23-L25)
and all `mobile-build` jobs are dispatched to `.jenkins/pytorch/build-mobile.sh` in
583217fe37/.jenkins/pytorch/build.sh (L19-L21)
Rename the `is_libtorch` `CIWorkflow` property to `build_generates_artifacts` and change the default from False to True.
Neither libtorch nor mobile build jobs generate build artifacts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66673
Reviewed By: janeyx99
Differential Revision: D31674434
Pulled By: malfet
fbshipit-source-id: 24d05d55366202cd4d9c25ecab429cb8f670ded0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66877
Fixes (hopefully):
```
program_source:516:27: error: use of undeclared identifier 'c10'
for (const auto idx : c10::irange(4)) {
^
program_source:590:27: error: use of undeclared identifier 'c10'
for (const auto idx : c10::irange(4)) {
^
program_source:810:26: error: use of undeclared identifier 'c10'
for (const auto iy : c10::irange(roi_bin_grid_h)) {
^
program_source:811:30: error: use of undeclared identifier 'c10'
for (const auto ix : c10::irange(roi_bin_grid_w)) {
^
DeviceName: AMD Radeon Pro 5500M, LanguageVersion: 131075
Exception raised from -[MetalContext available] at xplat/caffe2/aten/src/ATen/native/metal/MetalContext.mm:66 (most recent call first):
(no backtrace available)
```
Test Plan: Sandcastle
Reviewed By: benb, xta0
Differential Revision: D31763270
fbshipit-source-id: cfe4364b14c5fe6dbd39893788919769c9a9eb00
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66717
No need to require a refcount bump for this function.
ghstack-source-id: 140921170
Test Plan: CI
Reviewed By: suo
Differential Revision: D31696898
fbshipit-source-id: a3732a04ccbddc32207ce90836030f3020154a77
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66352
Add cmake rules for interactive_embedded_interpreter.cpp.
The builtin_registry.cpp has already been handled in https://github.com/pytorch/pytorch/pull/66347 . I'll remove the change in this PR once that one is merged.
Test Plan: Imported from OSS
Reviewed By: suo
Differential Revision: D31521249
Pulled By: shunting314
fbshipit-source-id: bb9d340e5a6aad7d76078ca03a82b5ae7494a124
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66485
The errors for incorrectly sized inputs should match the dense variants
of functions.
Moved addmm_out_sparse_csr_dense_cuda from SparseCsrTensorMath.cu and
removed unnecessary device check.
cc nikitaved pearu cpuhrsch IvanYashchuk
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D31764036
Pulled By: cpuhrsch
fbshipit-source-id: 76900fe9e4a49474695a01f34bad41cb3422321c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66697
We own this vector, so we can move from it.
ghstack-source-id: 140742640
Test Plan: CI
Reviewed By: suo
Differential Revision: D31693230
fbshipit-source-id: 3f33ca6e47e29b0e3d6c8fad59c234c55e1e159f
Summary:
- [x] Fix the Pyre type checking errors in `torch/ao/quantization/quantize_fx.py`
```
torch/quantization/quantize_fx.py:41:8 Incompatible variable type [9]: fuse_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:143:16 Incompatible variable type [9]: prepare_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:144:16 Incompatible variable type [9]: equalization_qconfig_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:206:8 Incompatible variable type [9]: prepare_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:230:12 Incompatible variable type [9]: fuse_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:268:8 Incompatible variable type [9]: prepare_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:269:8 Incompatible variable type [9]: equalization_qconfig_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:427:8 Incompatible variable type [9]: prepare_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:464:8 Incompatible variable type [9]: convert_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:486:8 Incompatible variable type [9]: convert_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/quantize_fx.py:547:8 Incompatible variable type [9]: convert_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
```
Fixes the issue: [MLH-Fellowship/pyre-check/issues/76](https://github.com/MLH-Fellowship/pyre-check/issues/76)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66804
Reviewed By: onionymous
Differential Revision: D31738171
Pulled By: 0xedward
fbshipit-source-id: 00d4c5749c469aff39a1531365461ced747e52fc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66733
Fix the test for BatchMatMul to compare glow/caffe2 outputs and fix its shape inference function, since it made simplifying assumptions for broadcasting and failed on some of the shapes in the test. The previous inference failed for any case where the first n - 2 output dimensions of A x B were not simply those of whichever one of A or B had higher rank (e.g. for A: [2, 2, 2, 3, 4] and B: [3, 1, 2, 2, 4, 5] we expect output dimensions [3, 2, 2, 2, 3, 5] rather than [3, 1, 2, 2, 3, 5]).
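A quick check of the expected broadcasting behavior using torch (not the glow/caffe2 test itself):
```
import torch

A = torch.randn(2, 2, 2, 3, 4)
B = torch.randn(3, 1, 2, 2, 4, 5)
# Batch dims broadcast: (2, 2, 2) vs (3, 1, 2, 2) -> (3, 2, 2, 2);
# matrix dims: (3, 4) x (4, 5) -> (3, 5).
print(torch.matmul(A, B).shape)  # torch.Size([3, 2, 2, 2, 3, 5])
```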
Test Plan:
```
buck test glow/fb/test/numerics:test_operator_onnxifinnpi -- -r .*test_batch_matmul_manydims.* --env USE_INF_API=1
```
Reviewed By: khabinov
Differential Revision: D31701184
fbshipit-source-id: 31d0fb17409a399b90fb8042385e000ed81c3581
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66741
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for(TYPE var=x0;var<x_max;x++)`
to the format
`for(const auto var: irange(xmax))`
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D31705360
fbshipit-source-id: 7115f76e381ad2d98584eb534961c3cbb957ebaa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66757
`InterpreterStateImpl::run()` gets the number of outputs from the current frame, but by the time the continuation completes, the frame is gone, so we're calling `front()` on an empty vector. This works out in practice (data is still there) but it is technically undefined behavior and could break in the future.
Also, `std::polar()` expects its argument to be non-negative, but `c10::polar()` does not, so implement it explicitly (implementation is the same as libstdc++).
Test Plan: JIT tests pass.
Reviewed By: zhxchen17
Differential Revision: D31715587
fbshipit-source-id: 98abcc10c2742887af866d8e70169a0187c41d33
Summary:
This would save the cost of copying text from stack to heap in some cases (like
parsing function schemas during the loading phase of libtorch.so).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65309
Reviewed By: swolchok
Differential Revision: D31060315
Pulled By: gmagogsfm
fbshipit-source-id: 0caf7a688b40df52bb4388c5191d1a42351d6f1a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66680
Closes https://github.com/pytorch/pytorch/issues/66215. Tracks models with sync BN so we can find workflows that use them and target them for perf optimization.
ghstack-source-id: 140875182
Test Plan: CI
Reviewed By: pritamdamania87
Differential Revision: D31679477
fbshipit-source-id: 0e68cd1a7aabbc5b26227895c53d33b8e98bfb8e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66744
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for(TYPE var=x0;var<x_max;x++)`
to the format
`for(const auto var: irange(xmax))`
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D31705358
fbshipit-source-id: d6ea350cbaa8f452fc78f238160e5374be637a48
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66747
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for(TYPE var=x0;var<x_max;x++)`
to the format
`for(const auto var: irange(xmax))`
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D31705365
fbshipit-source-id: 5c3af2184766b063eed2f4e8feb69f1fedd3503e
Summary:
Skip failing tests in `test_linalg.py` and `test_ops.py` when LAPACK and MAGMA are not available.
Note that there's no CI without LAPACK or MAGMA. I verified locally that it now works as expected, but in the future we have no guards against tests failing again in this situation.
<details>
<summary> test_ops.py failures that are fixed</summary>
```
FAILED test/test_ops.py::TestCommonCPU::test_out_linalg_tensorinv_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_reference_testing_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_reference_testing_linalg_tensorinv_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_linalg_tensorinv_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_triangular_solve_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_triangular_solve_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_linalg_tensorinv_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_triangular_solve_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_linalg_tensorinv_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_triangular_solve_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_linalg_tensorinv_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_triangular_solve_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestJitCPU::test_variant_consistency_jit_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestJitCPU::test_variant_consistency_jit_triangular_solve_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestJitCPU::test_variant_consistency_jit_triangular_solve_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestMathBitsCPU::test_conj_view_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestMathBitsCPU::test_conj_view_triangular_solve_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestMathBitsCPU::test_neg_view_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestMathBitsCPU::test_neg_view_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
```
</details>
<details>
<summary> test_linalg.py failures that are fixed</summary>
```
FAILED test/test_linalg.py::TestLinalgCPU::test_norm_dtype_cpu - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCPU::test_norm_matrix_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCPU::test_norm_matrix_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCPU::test_nuclear_norm_axes_small_brute_force_old_cpu - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_eigh_hermitian_grad_meta_complex128 - RuntimeError: Calling torch.linalg.eigh or eigvalsh on a CPU tensor requires compiling PyTorch with LAPACK. Please use PyTorch built with LAPACK support.
FAILED test/test_linalg.py::TestLinalgMETA::test_eigh_hermitian_grad_meta_float64 - RuntimeError: Calling torch.linalg.eigh or eigvalsh on a CPU tensor requires compiling PyTorch with LAPACK. Please use PyTorch built with LAPACK support.
FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_col_maj_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_col_maj_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_col_maj_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_col_maj_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_col_maj_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_col_maj_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_lowrank_cuda_float64 - RuntimeError: Calling torch.lu on a CUDA tensor requires compiling PyTorch with MAGMA. Please rebuild with MAGMA.
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_col_maj_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_col_maj_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_col_maj_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_col_maj_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_col_maj_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_col_maj_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
```
</details>
Fixes https://github.com/pytorch/pytorch/issues/59662
cc mruberry jianyuh nikitaved pearu walterddr IvanYashchuk xwang233 Lezcano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64930
Reviewed By: zou3519
Differential Revision: D31739416
Pulled By: mruberry
fbshipit-source-id: 153c40d8eeeb094b06816882a7cbb28c681509a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66698
This type should fit in a register; no need to pass by reference.
ghstack-source-id: 140742830
Test Plan: CI
Reviewed By: suo
Differential Revision: D31693291
fbshipit-source-id: 299fb3d1830a059b59268487c22e030446c3496e
Summary:
- Adds Node base class and unit tests
- Also adds metadata utils to enable source code annotation and scope tracking
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66601
Test Plan: Add new unit tests
Reviewed By: desertfire
Differential Revision: D31634044
fbshipit-source-id: a042d54f06fbc480acfc63c18d43cb6fceb6fea5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66691
Does what it says on the tin.
ghstack-source-id: 140736047
Test Plan: CI
Reviewed By: suo
Differential Revision: D31691627
fbshipit-source-id: 21a5d0248bf3412f5af36260597a5f663ab34361
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66701
We own the argument vector.
ghstack-source-id: 140760983
Test Plan: CI
Reviewed By: suo
Differential Revision: D31693645
fbshipit-source-id: 02829bc3c728f6d1d07be08b0d977eee1efee38f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66699
std::string::operator+ will copy the string an extra time even if the argument is `""`. See https://godbolt.org/z/3sM5h1qTo
ghstack-source-id: 140743822
Test Plan: CI
Reviewed By: suo
Differential Revision: D31693522
fbshipit-source-id: 6a8033c90366904b9aff44214b600cfb255a0809
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66693
Passing a `TypePtr` by value causes an unnecessary refcount
bump. We don't need to take ownership, so `const Type&` is all we
need.
I considered providing a compatibility shim that takes `const
TypePtr&`, but doing so is dangerous because a
copy is required to convert from a more specific pointer like
`NoneTypePtr`.
ghstack-source-id: 140737081
Test Plan: CI
Reviewed By: suo
Differential Revision: D31691869
fbshipit-source-id: f766ce3234a28771c2a9ca4c284eb3f96993a3d0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66798
get_cycles_per_ms is copied and used in a few places; move it to common_utils so that it can be used as a shared util function
ghstack-source-id: 140790599
Test Plan: unit tests
Reviewed By: pritamdamania87
Differential Revision: D31706870
fbshipit-source-id: e8dccecb13862646a19aaadd7bad7c8f414fd4ab
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66815
Was seeing 403s when attempting to wget from GitHub; re-hosting the binary on S3 so we shouldn't see those issues anymore
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D31740656
Pulled By: seemethere
fbshipit-source-id: 4462678d51a52b63020f8da18d7cdc80fb8dbc5d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65723
Example lowering reference linear module to fbgemm/qnnpack quantized linear module
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D31567461
fbshipit-source-id: 0b8fffaf8e742ec15cb07bf6a4672cf3e856db2d
Summary:
The documentation of torch.nn.Upsample stated that `align_corners` only affects `linear`, `bilinear` and `trilinear`.
This PR updates the documentation for the Python `Upsample` module and the C++ `UpsampleOptions` struct to reflect that `bicubic` is also affected by `align_corners`.
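As a quick illustration of the documented behavior (a minimal sketch, not part of this PR), the bicubic output does change with `align_corners`:
```
import torch
import torch.nn as nn

x = torch.arange(16.).reshape(1, 1, 4, 4)
up_false = nn.Upsample(scale_factor=2, mode="bicubic", align_corners=False)
up_true = nn.Upsample(scale_factor=2, mode="bicubic", align_corners=True)
# The two sampling grids differ, so the outputs differ as well.
print(torch.allclose(up_false(x), up_true(x)))  # False
```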
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66756
Reviewed By: zou3519
Differential Revision: D31731148
Pulled By: jbschlosser
fbshipit-source-id: 3ec277fc3fbdf8414d0de327d8c57ba07342a5b9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64181
This PR replaces all the calls to:
- `transpose(-2, -1)` or `transpose(-1, -2)` by `mT()` in C++ and `mT` in Python
- `conj().transpose(-2, -1)` or `transpose(-2, -1).conj()` or `conj().transpose(-1, -2)` or `transpose(-1, -2).conj()` by `mH()` in C++ and `mH` in Python.
It also simplifies two pieces of code, and fixes one bug where a pair
of parentheses was missing in the function `make_symmetric_matrices`.
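A minimal sanity check of the equivalences this PR relies on (illustrative only):
```
import torch

A = torch.randn(2, 3, 4, dtype=torch.complex64)
assert torch.allclose(A.mT, A.transpose(-2, -1))
assert torch.allclose(A.mH, A.conj().transpose(-2, -1))
```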
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D31692896
Pulled By: anjali411
fbshipit-source-id: e9112c42343663d442dc5bd53ff2b492094b434a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65618
This saves 8 bytes per KernelFunction, which should help in resource-constrained environments.
ghstack-source-id: 140731069
Test Plan: CI
Reviewed By: ezyang
Differential Revision: D25405736
fbshipit-source-id: 757c0f1387da9147e46ac69af2aa9fffd2998e35
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66716
No need to require a refcount bump for this function.
ghstack-source-id: 140754065
Test Plan: CI
Reviewed By: suo
Differential Revision: D31696639
fbshipit-source-id: bf8aa3f542d52e82e0f6a444b8898330f3d16a31
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66728
Two extra refcount bumps.
ghstack-source-id: 140760872
Test Plan: CI
Reviewed By: suo
Differential Revision: D31698577
fbshipit-source-id: 1f50195a99f98f857abc9b03b4254519c316fefe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66765
This guts `THCState` to simply be an empty struct, as well as:
- moving `THCState_getPeerToPeerAccess` and its cache into `ATen`.
- cleaning up dead code in `THCGeneral.cpp`
- moving `THCudaInit` and `THCMagma_init` into `CUDAHooks::initCUDA`
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D31721648
Pulled By: ngimel
fbshipit-source-id: 772b24787656a95f9e3fcb287d912b1c3400f32d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66722
Missing move, s/cast/castRaw/, and take TypePtr arg by const ref because we only sometimes need to take ownership.
ghstack-source-id: 140757141
Test Plan: CI
Reviewed By: suo
Differential Revision: D31697631
fbshipit-source-id: 04afe13688c6e2aaf79157400c0a44021cb8179d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66706
Missing moves in the construction path.
ghstack-source-id: 140746585
Test Plan: CI
Reviewed By: suo
Differential Revision: D31694356
fbshipit-source-id: 8e2bf2dd41f3f65fc06e30ffd5fddd487d01aaa8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66714
Forced copy in getValueType and unnecessary use of cast over castRaw.
ghstack-source-id: 140752791
Test Plan: CI
Reviewed By: suo
Differential Revision: D31696164
fbshipit-source-id: fc2316617a61ca32f1fb952fb0af18b8784a606b
Summary:
Apex O2 hooks state_dict to return fp16 weights as fp32, so the exporter cannot identify them as the same tensors.
Since this hook is only used by the optimizer, it is safe to remove it while exporting.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66700
Reviewed By: zou3519
Differential Revision: D31695132
Pulled By: malfet
fbshipit-source-id: 977bdf57240002498f3ad0f1a8046c352e9860e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66719
Some casts that could be castRaw. Parameters did not need to force a refcount bump.
ghstack-source-id: 140756356
Test Plan: CI
Reviewed By: suo
Differential Revision: D31697455
fbshipit-source-id: 87a8cba221a7ae53f2a485acafd31622e9328ff0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66718
Some missing moves and use of cast instead of castRaw (due to a previous automated fixup only being a partial fix).
ghstack-source-id: 140755229
Test Plan: CI
Reviewed By: suo
Differential Revision: D31697115
fbshipit-source-id: 86743f8982951a58638ba244b3a92d3737dde58b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66647
Missed in the last round.
This adds reference patterns for general shape ops like view when is_reference is True.
bc-breaking:
This basically disables getitem from supporting quantized ops here; we may support it later in fbgemm.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestQuantizeFxModels
Imported from OSS
Reviewed By: H-Huang
Differential Revision: D31680379
fbshipit-source-id: 6a3a7128514baf6d92b1607308c40339469d0066
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66702
Missing moves in the construction path and forced copies of the key & value type on access.
ghstack-source-id: 140744707
Test Plan: CI
Reviewed By: suo
Differential Revision: D31693818
fbshipit-source-id: 4c5d2359f58148744621abe81429e56e7889f754
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66704
Missing moves in the construction path.
ghstack-source-id: 140746391
Test Plan: CI
Reviewed By: suo
Differential Revision: D31694296
fbshipit-source-id: 3bed477c811069248611efdb57ad27c6ca233442
Summary:
This PR fixes a typo in the `torch/autograd/function.py` doc
-----------------------
Additionally, the example at https://pytorch.org/docs/master/autograd.html#torch.autograd.Function doesn't quite compile:
```
'builtin_function_or_method' object has no attribute 'exp'
```
even though `i.exp()` is a valid function if `i` is a tensor.
I changed it to:
```
result = torch.exp(i)
```
but python doesn't like it either:
```
TypeError: exp(): argument 'input' (position 1) must be Tensor, not builtin_function_or_method
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66754
Reviewed By: albanD
Differential Revision: D31729400
Pulled By: soulitzer
fbshipit-source-id: eef783bcdc8d4693a8b7f1ab581e948abc0f9b94
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65855
This adjusted our test base to support non-NCCL backends like gloo/mpi, so that we could test sharding on CPU with the gloo/mpi backend.
ghstack-source-id: 140840866
Test Plan: wait for the CI for existing tests, also adding tests in the stacked diff above.
Reviewed By: pritamdamania87, bowangbj
Differential Revision: D31287162
fbshipit-source-id: d48dfc8ef886a4d34b1de42f3ce6b600b5c9a617
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66484
https://github.com/pytorch/pytorch/pull/50748 added linear - bn1d fusion
in Eager mode, for PTQ only. This PR also enables this in FX graph mode.
We reuse the existing conv-bn-relu fusion handler, renaming `conv` to
`conv_or_linear` for readability.
The QAT version is saved for a future PR, for both eager and FX graph.
Test Plan:
```
python test/test_quantization.py TestFuseFx.test_fuse_linear_bn_eval
```
Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D31575392
fbshipit-source-id: f69d80ef37c98cbc070099170e335e250bcdf913
Summary:
There were 2 versions of the same code which were slightly different although functionally equivalent.
When adding support for another CUDA / device version, both would need to be changed and kept in sync, so it is better to have only one version as the single source of truth.
I chose the implementation which looks cleaner and easier to read and added some minor enhancements and comments to further increase readability.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55901
Reviewed By: H-Huang
Differential Revision: D31636917
Pulled By: bertmaher
fbshipit-source-id: 622e1fabc39de4f3f1b1aa9a1544cfbd35a5cfd9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66715
Adding StreamWrapper to streams produced by DataPipes within PyTorch Core and TorchData
Test Plan: OSS CI and Internal Tests
Reviewed By: ejguan
Differential Revision: D31695248
fbshipit-source-id: c26fa1bc1688d5597851ad265f667fafdcd64c59
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66540
Currently the macro `HAS_DEMANGLE` is determined by compiler predefined macros. Here I'm adding an option to allow `HAS_DEMANGLE` to be defined in build files.
Test Plan: Rely on CI
Reviewed By: poweic
Differential Revision: D31600007
fbshipit-source-id: 76cf088b0f5ee940e977d3b213f1446ea64be036
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66297
Linking register_numpy.cpp with the embedded interpreter will register numpy as a builtin library.
Test Plan: Add unit test to test basic numpy functionality in torch::deploy, such as creating random matrices and matrix multiplication.
Reviewed By: suo
Differential Revision: D31490434
fbshipit-source-id: b052ce01fc64fb0efee846feb0acc1f107ba13e0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66694
`lhs.equal(rhs)` would throw if the device doesn't match. To avoid that we return early if the device doesn't match.
Test Plan: CI
Reviewed By: houseroad
Differential Revision: D31691608
fbshipit-source-id: 513c3e0743a65d9778c7ef9b79ececfeaccc0017
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66480
This guts `THCState` to simply be an empty struct, as well as:
- moving `THCState_getPeerToPeerAccess` and its cache into `ATen`.
- cleaning up dead code in `THCGeneral.cpp`
- moving `THCudaInit` and `THCMagma_init` into `CUDAHooks::initCUDA`
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D31577488
Pulled By: ngimel
fbshipit-source-id: 90604f30854fe766675baa3863707ac09995bc9e
Summary:
These stable sorts currently use a combination of `at::arange`, view ops and `tensor.copy_` to fill in the initial values for the indices before calling into `CUB` to do the actual sort. This is somewhat inefficient because it requires 2 to 4 kernel launches, and the copies all use strided kernels instead of the more efficient contiguous kernels. Instead, a fairly straight-forward custom kernel is more efficient in terms of both CUDA and CPU runtime.
In a simple benchmark I profiled `a.sort(stable=True, dim=1)` for different shapes and singled out the kernel invocations for initializing the index tensors (i.e. the non-`cub` kernels). Note that when the batch dim is `<128` we call `segmented_sort_pairs_by_full_sort` instead of `segmented_sort_pairs`:
| shape | Master (us) | This PR (us) |
|--------------|:-----------:|:------------:|
| (100, 1000) | 5.000 | 2.300 |
| (1000, 100) | 2.070 | 1.090 |
| (100, 10000) | 87.34 | 26.47 |
| (1000, 1000) | 28.63 | 20.27 |
Of course for sufficiently large inputs, the overall runtime is dominated by the actual sort. But I have another motive: I want to remove the operator calls from the middle of this kernel launch code. This change makes it easier to split the kernel code that needs to be compiled with `nvcc` into its own file that doesn't include `Tensor.h`, similar to what I'm doing in https://github.com/pytorch/pytorch/issues/66620.
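For reference, a minimal way to exercise the profiled call (shapes as in the first row of the table above; assumes a CUDA device is available):
```
import torch

a = torch.randn(100, 1000, device="cuda")
values, indices = a.sort(stable=True, dim=1)  # the index tensor is what the custom kernel now initializes
```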
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66668
Reviewed By: H-Huang
Differential Revision: D31693722
Pulled By: ngimel
fbshipit-source-id: 5765926e4dbbc7a20d2940c098ed093b3de2204e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65508
This has some misc cleanups for the code that happens before `run_test.py`:
* remove hardcoding of 2 shards
* add `set -eux` in some places
Test Plan: Imported from OSS
Reviewed By: seemethere
Differential Revision: D31296509
Pulled By: driazati
fbshipit-source-id: 2df1463432846d8a4d8a579812a4e9c3b7c2b957
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66669
Implemented `cat` operator for channel dimension
**Facts:**
* texture coordinate: x(width), y(height), z(depth)
* input x, y, z -> no change
* out x, y -> no change
* out z and index i, j only matter
**Equations:**
batch_size = bt0 (or bt1 or bt2 or ...) = # of batch for tensor i
ch_size = ch0 (or ch1 or ch2 or ...) = # of channels for tensor i
ch_interval = ch0 + ch1 + ch2 + ... = total # of channels for all tensors
ch_size_allprior = ch0 (or ch0+ch1 or ch0+ch1+ch2 or ...) = # of channels for tensor 0 to i-1 where pos.z = d (input)
i = index of input texel = vec4[i] of texel at posIn(x,y,z) on input texture
j = index of output texel = vec4[j] of texel at posOut(x',y',z') on input texture
posIn[i] = {x,y,z} at ith index of vec4
src_index = posIn.z * 4 + i
dst_index = int(src_index / ch_size) * ch_interval + (src_index % ch_size) + ch_size_allprior
d = posOut.z = int(dst_index / 4)
j = (dst_index % 4)
posOut[j] = {posIn.x, posIn.y, d} at jth index of vec4
**Shader pseudo code:**
posOut = posIn;
for (i = 0; i < 4; ++i) {
src_index = posIn.z * 4 + i;
if (src_index >= ch_size * batch_size) break; // out of range
dst_index = int(src_index / ch_size) * ch_interval + (src_index % ch_size) + ch_size_allprior;
posOut.z = int(dst_index / 4);
j = (dst_index % 4);
uOutput[j] = uInput[i]
}
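A small Python check of the index mapping above (hypothetical helper name, only to sanity-check the equations, not the shader itself):
```
def dst_index(src_index, ch_size, ch_interval, ch_size_allprior):
    return (src_index // ch_size) * ch_interval + (src_index % ch_size) + ch_size_allprior

# Concatenate two tensors with ch0 = 2 and ch1 = 3 channels (ch_interval = 5).
# For the second tensor (ch_size = 3, ch_size_allprior = 2), its channels land
# after the first tensor's channels within every batch slot:
print([dst_index(s, 3, 5, 2) for s in range(6)])  # [2, 3, 4, 7, 8, 9]
```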
Test Plan:
Test build on Android:
```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```
Test result:
```
[ RUN ] VulkanAPITest.cat_dim1_samefeature_success
[ OK ] VulkanAPITest.cat_dim1_samefeature_success (101 ms)
[ RUN ] VulkanAPITest.cat_dim1_difffeature_success
[ OK ] VulkanAPITest.cat_dim1_difffeature_success (81 ms)
[ RUN ] VulkanAPITest.cat_dim1_texture2d_success
[ OK ] VulkanAPITest.cat_dim1_texture2d_success (2 ms)
[ RUN ] VulkanAPITest.cat_dim1_singledepth_success
[ OK ] VulkanAPITest.cat_dim1_singledepth_success (6 ms)
[ RUN ] VulkanAPITest.cat_dim1_singletensor_success
[ OK ] VulkanAPITest.cat_dim1_singletensor_success (21 ms)
[ RUN ] VulkanAPITest.cat_dim1_twotensors_success
[ OK ] VulkanAPITest.cat_dim1_twotensors_success (53 ms)
[ RUN ] VulkanAPITest.cat_dim1_bat1_ch4multiple_success
[ OK ] VulkanAPITest.cat_dim1_bat1_ch4multiple_success (17 ms)
[ RUN ] VulkanAPITest.cat_dim2_sameheight_success
[ OK ] VulkanAPITest.cat_dim2_sameheight_success (83 ms)
[ RUN ] VulkanAPITest.cat_dim2_diffheight_success
[ OK ] VulkanAPITest.cat_dim2_diffheight_success (86 ms)
[ RUN ] VulkanAPITest.cat_dim2_singledepth_success
[ OK ] VulkanAPITest.cat_dim2_singledepth_success (5 ms)
[ RUN ] VulkanAPITest.cat_dim2_invalidinputs_exceptions
[ OK ] VulkanAPITest.cat_dim2_invalidinputs_exceptions (82 ms)
```
Reviewed By: SS-JIA
Differential Revision: D31593623
fbshipit-source-id: e52dc57985e3f0bb9b20313d4fcc7248a436e863
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66692
Currently `ProcessedNode::run()` performs 2 dynamic dispatches to decide which function implementation to execute, depending on whether the function is an out variant, a native function, or an interpreter fallback. Note that this happens every time an operation is executed by Static Runtime.
This change makes *that* same decision during module loading time once so that we can remove 1 dynamic dispatch cost at runtime.
**size reduction**
Saving 4 bytes per `ProcessedNode`.
- Before: sizeof(c10::variant<OutVariant, NativeFunction, Operation>):40
- After: sizeof(std::function<void(ProcessedNode*)>): 32 + sizeof(FunctionKind):4 = 36
**latency optimization**
Expected to remove 2 memory loads & 1 conditional jump per `ProcessedNode::run()` execution (needs to be confirmed from compiled binary code).
Ran `ptvsc2_predictor_bench` with `inline_cvr` with 1000 iterations:
- local : 7.56026 -> 7.24794
- local_ro: 1.5799 -> 1.55504
- remote_ro: 10.6464 -> 10.3017
Test Plan: Ran existing unittests
Reviewed By: swolchok
Differential Revision: D31591785
fbshipit-source-id: 5de83ca386af509381e08ecedf071ee4e9f0f0b0
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64883
Adds a `warn_only` kwarg to `use_deterministic_algorithms`. When enabled, calling an operation that does not have a deterministic implementation will raise a warning, rather than an error.
`torch.testing._internal.common_device_type.expectedAlertNondeterministic` is also refactored and documented in this PR to make it easier to use and understand.
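A minimal usage sketch of the new kwarg:
```
import torch

torch.use_deterministic_algorithms(True, warn_only=True)
# Operations without a deterministic implementation now emit a UserWarning
# instead of raising a RuntimeError.
```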
cc mruberry kurtamohler
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66233
Reviewed By: bdhirsh
Differential Revision: D31616481
Pulled By: mruberry
fbshipit-source-id: 059634a82d54407492b1d8df08f059c758d0a420
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66234
Modified loops in files under fbsource/fbcode/caffe2/ from the format
`for(TYPE var=x0;var<x_max;x++)`
to the format
`for(const auto var: irange(xmax))`
This was achieved by running r-barnes's loop upgrader script (D28874212) with some modification to exclude all files under /torch/jit and a number of reversions or unused variable suppression warnings added by hand.
bypass_size_limit
allow-large-files
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D30652629
fbshipit-source-id: 0ae6c4bbbb554bad42e372792a6430e1acf15e3e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66645
Fixes:
```
test_cholesky_solve_batched_broadcasting_cpu_complex128 (__main__.TestLinalgCPU) ... test_linalg.py:3099: UserWarning: torch.cholesky is deprecated in favor of torch.linalg.cholesky and will be removed in a future PyTorch release.
```
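For context, a minimal sketch of the non-deprecated call the warning points to (illustrative input, not the test's actual data):
```
import torch

A = torch.randn(3, 3, dtype=torch.complex128)
A = A @ A.mH + torch.eye(3)   # Hermitian positive-definite input
L = torch.linalg.cholesky(A)  # replacement for the deprecated torch.cholesky(A)
```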
Test Plan: Sandcastle
Reviewed By: mruberry
Differential Revision: D31635851
fbshipit-source-id: c377eb88d753fb573b3947f0c6ff5df055cb13d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66696
D31511082 (9918fd8305) moved a unit test but didn't add the proper target to the build file; fix it in this diff.
Test Plan: buck test mode/opt caffe2/test/fx2trt/converters/...
Reviewed By: 842974287
Differential Revision: D31667697
fbshipit-source-id: 49e04afa323b27a1408c9bc2b5061b6529ced985
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64066
I noticed a bunch of time being spent heap-allocating Tuples
in the unpickler. 1-, 2-, and 3-element Tuples are apparently common
enough that they get their own bytecode instructions, so I decided to
try also giving them their own representation. We store up to 3
IValues inline in `Tuple` rather than doing a second heap allocation
for a `std::vector<IValue>`.
ghstack-source-id: 140695395
Test Plan:
Added automated tests for TupleElements.
Pixel 3 before: https://www.internalfb.com/intern/aibench/details/761596366576284
Pixel 3 after: https://www.internalfb.com/intern/aibench/details/591414145082422
We went from 347 ms to 302 ms.
Reviewed By: dhruvbird
Differential Revision: D30592622
fbshipit-source-id: 93625c54c9dca5f765ef6d5c191944179cb281a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66630
This PR adds meta backend support to the `range`, `arange`, `linspace`, and `logspace` operators.
ghstack-source-id: 140618055
Test Plan: Extended the existing tensor creation tests to assert meta backend support.
Reviewed By: ezyang
Differential Revision: D31656999
fbshipit-source-id: 06e7f3655b94c0d85a28bcd0ca61d9f9ce707f1d
Summary:
This moves it to where the user would expect it to be based on the
documentation and all the other public classes in the torch.onnx module.
Also rename it from ONNXCheckerError, since the qualified name
torch.onnx.ONNXCheckerError is otherwise redundant.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66644
Reviewed By: malfet
Differential Revision: D31662559
Pulled By: msaroufim
fbshipit-source-id: bc8a57b99c2980490ede3974279d1124228a7406
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66578
Flatten parameters for performance optimization and handle the case when the grad-ready order differs across ranks or there are unused parameters among ranks. When there is no param to be sharded in the FSDP instance (usually the root), the flatten wrapper module's flat_param is None.
ghstack-source-id: 140696745
Test Plan: unit test
Reviewed By: mrshenli
Differential Revision: D31625194
fbshipit-source-id: c40e84f9154f5703e5bacb02c37c59d6c4e055c7
Summary:
As title, introduce the file `TracerRunner` shared by internal/external tracer and the main function is
```
TracerResult trace_run(const std::string& input_module_path);
```
which basically takes the path to the model file and generates the trace result. The main differences between the external tracer and the internal tracer are
1. the dependency on `<yaml-cpp/yaml.h>`.
2. the output yaml file from internal tracer includes `model_version` and `model_asset`. These are only needed for internal.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64152
ghstack-source-id: 140692467
Test Plan:
```
./build/bin/model_tracer --model_input_path "/Users/chenlai/Documents/pytorch/tracing/deeplabv3_scripted_with_bundled_input.ptl" --build_yaml_path "/Users/chenlai/Documents/pytorch/tracing/tmp.yaml"
```
```
./fbcode/caffe2/fb/model_tracer/run_model_with_bundled_inputs.sh ~/local/notebooks/prod_models/deeplabv3_scripted_with_bundled_input.ptl
```
have the same operator output
selected_operators.yaml (P460296279)
selected_mobile_ops.h (P460296258)
Reviewed By: dhruvbird
Differential Revision: D30632224
fbshipit-source-id: eb0321dbc0f1fcf6d2e05384695eebb59ac04f8c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65772
Looking at some workloads, it would be useful to have this info.
ghstack-source-id: 140555200
Test Plan: CI
Reviewed By: zhaojuanmao, wayi1
Differential Revision: D31224417
fbshipit-source-id: 14eeb053aced87c7ca43b6879f81f54bd0a42b76
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65771
Fixes some logging around monitored_barrier to make it cleaner.
ghstack-source-id: 140555204
Test Plan: CI
Reviewed By: zhaojuanmao, wayi1
Differential Revision: D31222881
fbshipit-source-id: 77d6f072ce98a9b31192e0d48ea0f8cbd8f216fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66393
Third try!
Fixes:
- test_nccl_timeout can be flaky because of 1s timeout, bump up the timeout to resolve the flakiness. But in general we should not have been relying on time.sleep for this test, filed https://github.com/pytorch/pytorch/issues/66354 to track that.
- ciflow/all did not actually run tests due to a bug causing multigpu tests to not be run. This has since been fixed.
ghstack-source-id: 140560113
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D31534735
fbshipit-source-id: 8b7e0f4fed3972b7a77cbcda28876c9eefb0c7e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66612
For the op authoring project, we want to expose the Python bindings
to create Expr. These are the missing bindings.
Test Plan: Imported from OSS
Reviewed By: soulitzer
Differential Revision: D31667852
fbshipit-source-id: 6d3ff83a7676cfea391ab3ea60dde6874a64047a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66244
Make sure the bn statistics are the same in the unit test.
* The fused model in the existing code will have different bn statistics compared to the model without fusion. They will produce the same result when the model is in training mode, but different results in eval mode.
Test Plan: buck run mode/dev-nosan //caffe2/test:quantization -- -r quantization.eager.test_fusion.TestFusion
Reviewed By: jerryzh168
Differential Revision: D29504500
fbshipit-source-id: 41e3bfd7c652c27619baa7cbbe98d8d06a485781
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66590
Updated the fx2trt example to run all submodules.
Added an assertion to make sure outputs from the lowered and regular models match.
Test Plan: buck run mode/dev-nosan caffe2:fx2trt_example
Reviewed By: 842974287
Differential Revision: D31592985
fbshipit-source-id: 45ce0b33e957f16b3729d3ecde706331c29d7214
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66631
Writing to the current directory is causing issues in CI. We might also consider writing the ".dot" files to some temporary location.
Test Plan: CI
Reviewed By: 842974287
Differential Revision: D31657078
fbshipit-source-id: 9876327c7f172cd354f1b8e8076597c6a26e2850
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66628
Ensure BroadcastMKLDNNTensors do not break the stack invariant by pushing more than 2 tensors into the stack.
Reviewed By: eellison
Differential Revision: D31638565
fbshipit-source-id: 4526c0cf7ba8d87dc8a9c213c66c711e83adfc66
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66577
A rebase artifact was erroneously landed in the quantization docs;
this PR removes it.
Test Plan:
CI
Imported from OSS
Reviewed By: soulitzer
Differential Revision: D31651350
fbshipit-source-id: bc254cbb20724e49e1a0ec6eb6d89b28491f9f78
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66600
Sparse RPC functionality added in
https://github.com/pytorch/pytorch/pull/62794 works only for TensorPipe and is
broken for other agent types.
Moving these tests to a TensorPipe only class.
ghstack-source-id: 140553147
Test Plan: waitforbuildbot
Reviewed By: rohan-varma
Differential Revision: D31633305
fbshipit-source-id: 37d94cb9ed5565a72a6d512c2a9db75a497d5b95
Summary:
All of the pooling modules except MaxUnpool and LPPool return either a
Tensor or [Tensor, Tensor]. The current type annotations are inaccurate,
and prevent scripting the module if return_indices is set as True in the
module.
There's not a great way to make this agree with mypy because the
overload is dependent on the value of return_indices, an attribute.
I tried changing the annotations from `Tensor` to
`Union[Tensor, Tuple[Tensor, Tensor]]`, but that breaks a bunch of uses
that have return_indices=False.
For example, this breaks:
4e94e84f65/torch/nn/modules/container.py (L139)
Also clean up how test names were being constructed in test_jit, since
otherwise we were getting name collisions when there were two tests on
the same nn.Module.
Fixes https://github.com/pytorch/pytorch/issues/45904
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65847
Reviewed By: ZolotukhinM
Differential Revision: D31462517
Pulled By: eellison
fbshipit-source-id: 6f9e8df1be6c75e5e1e9bae07cf3ad3603ba59bd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66274
Removed outdated context manager in unit test.
* The linked issue (https://github.com/pytorch/pytorch/issues/23825) seems to have been fixed in 2020.
Test Plan: buck run mode/dev-nosan //caffe2/test:quantization -- -r quantization.eager.test_quantize_eager_qat
Reviewed By: vkuzo
Differential Revision: D29507087
fbshipit-source-id: e8fa04c9527023a5adaf1a012b2c393ce0c5cd97
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64282
OpInfos for:
- Tensor.bfloat16, Tensor.bool, Tensor.byte, Tensor.char
- Tensor.double, Tensor.float, Tensor.half, Tensor.int
- Tensor.short, Tensor.long
None of these are supported by TorchScript. Also, the OpInfo autograd
test runner assumes that the operation is not allowed to change the
dtype of the argument, so only Tensor.double has
`supports_autograd=True` (in theory Tensor.bfloat16, Tensor.float,
Tensor.half should be differentiable).
Test Plan: - run tests
Reviewed By: dagitses
Differential Revision: D31452627
Pulled By: zou3519
fbshipit-source-id: b7f272e558558412c47aefe947af7f060dfb45c5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66238
The codegen should error if it sees two yaml entries with the same key. The default behavior of python's yaml loader is to overwrite duplicate keys with the new value.
This would have caught a nasty bug that showed up in https://github.com/pytorch/pytorch/pull/66225/files#r723796194.
I tested it on that linked PR, to confirm that it errors correctly (and gives the line number containing the duplicate).
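The loader behavior being guarded against is easy to reproduce (a minimal sketch using PyYAML):
```
import yaml

# PyYAML's default loader silently keeps the last value for a duplicate key.
print(yaml.safe_load("op: add\nop: sub"))  # {'op': 'sub'}
```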
Test Plan: Imported from OSS
Reviewed By: dagitses, albanD, sean-ngo
Differential Revision: D31464585
Pulled By: bdhirsh
fbshipit-source-id: 5b35157ffa9a933bf4b344c4b9fe2878698370a3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64180
**BC-breaking note:**
This PR deprecates calling `Tensor.T` on tensors that are not matrices. An upgrade guide is added to the
documentation for `Tensor.T`.
This PR DOES NOT make this attribute throw an error when called on a tensor of `dim != 2`,
but this will be its behavior in a future PyTorch release.
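A minimal sketch of the usual replacements the deprecation steers users toward (see the `Tensor.T` upgrade guide for the authoritative guidance):
```
import torch

m = torch.randn(3, 4)
x = torch.randn(2, 3, 4)

m.T                                  # still fine: plain 2-D transpose
x.mT                                 # batch of matrices: swap only the last two dims
x.permute(*reversed(range(x.ndim)))  # full reversal of all dims, the old >2-D Tensor.T behavior
```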
cc mruberry rgommers pmeier asmeurer leofang AnirudhDagar asi1024 emcastillo kmaehashi heitorschueroff
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D31610611
Pulled By: anjali411
fbshipit-source-id: af8ff7e862790dda9f06921de005b3f6fd0803c3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66525
This should solve https://github.com/pytorch/pytorch/issues/60015
There were two `q_zero_point()` accesses inside a for loop which was
expensive. Moving them to before the loop sped things up 10x for a
microbenchmark.
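A hedged sketch of the pattern being fixed, hoisting the invariant accessors out of the hot loop (the tensor and the loop body are illustrative, not the kernel's actual code):
```
import torch

qtensor = torch.quantize_per_tensor(torch.randn(8), scale=0.1, zero_point=2, dtype=torch.quint8)

zero_point = qtensor.q_zero_point()  # fetched once, before the loop
scale = qtensor.q_scale()
for v in qtensor.int_repr().tolist():
    dequant = (v - zero_point) * scale  # no per-iteration accessor calls
```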
Test Plan:
```
// comment out benchmarks unrelated to original issue, for simplicity
cd benchmarks/operator_benchmark
python -m pt.qinterpolate_test
// before: 2994 us
// after: 324 us
// full results: https://gist.github.com/vkuzo/cc5ef9526dc0cda170d6d63498c16453
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D31592422
fbshipit-source-id: b6078ac1039573bbe545275f7aedfd580910b459
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66606
- Remove dead code (see comment for where)
- Add debug prints
- Small reorganization of the code to improve readability
Reviewed By: d1jang
Differential Revision: D31568219
fbshipit-source-id: 50240c325bf4fd012e1947ac931bb67c6f5dfafb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66579
Didn't commit this file in the PR that open sources fx2trt tests
Test Plan: ci
Reviewed By: 842974287
Differential Revision: D31623354
fbshipit-source-id: 6cedbe0f229da40499b83e6df28e16caca392d9c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66562
Adding shape inference for `acc_ops.quantize_per_channel`, and fixing some bugs.
The bugs were related to the fact that the `quantize_per_channel` arguments `scales` and `zero_points` take tensors, so when we fetch the values (which needs to be done using `.tolist()` instead of `.item()`) we may get either a list or a scalar value.
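To illustrate why `.tolist()` is used: it handles both per-channel and per-tensor parameters, while `.item()` only works for single-element tensors:
```
import torch

torch.tensor([0.1, 0.2]).tolist()  # [0.1, 0.2], a list for per-channel scales
torch.tensor(0.1).tolist()         # 0.1, a plain scalar for a 0-d tensor
# torch.tensor([0.1, 0.2]).item()  # would raise: only one-element tensors convert to a Python scalar
```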
Test Plan:
# Test Quantized Resnet
From sandbox with GPU that supports quantized types (tested with V100)
`buck run mode/opt -c python.package_style=inplace caffe2:fx2trt_quantized_resnet_test`
Output
```
...
[TensorRT] INFO: [MemUsageSnapshot] Builder end: CPU 0 MiB, GPU 1548 MiB
[TensorRT] INFO: [MemUsageSnapshot] ExecutionContext creation begin: CPU 0 MiB, GPU 1548 MiB
[TensorRT] VERBOSE: Using cublasLt a tactic source
[TensorRT] WARNING: TensorRT was linked against cuBLAS/cuBLAS LT 11.5.1 but loaded cuBLAS/cuBLAS LT 11.1.0
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 0, GPU 1556 (MiB)
[TensorRT] VERBOSE: Using cuDNN as a tactic source
[TensorRT] INFO: [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 0, GPU 1564 (MiB)
[TensorRT] WARNING: TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[TensorRT] VERBOSE: Total per-runner device memory is 23405056
[TensorRT] VERBOSE: Total per-runner host memory is 73760
[TensorRT] VERBOSE: Allocated activation device memory of size 154140672
[TensorRT] INFO: [MemUsageSnapshot] ExecutionContext creation end: CPU 0 MiB, GPU 1736 MiB
trt fp16 time (ms/iter) 1.252899169921875
trt int8 time (ms/iter) 1.3774776458740234
trt implicit int8 time (ms/iter) 1.3835883140563965
PyTorch time (CUDA) (ms/iter) 4.34483528137207
PyTorch time (CPU) (ms/iter) 55.687150955200195
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 0, GPU 1918 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 0, GPU 1866 (MiB)
[TensorRT] INFO: [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 0, GPU 1738 (MiB)
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1012 12:07:23.556475 711816 DynoConfigLoader.cpp:32] Failed to read config: No dyno config client
```
# Test shape inference
`buck test mode/opt glow/fb/fx/acc_tracer:test_acc_shape_inference`
Output
```
...
Summary
Pass: 95
ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/1407375092088240
```
Reviewed By: jfix71, jerryzh168
Differential Revision: D31457323
fbshipit-source-id: 8ccc4a9b0ca655fb30838e88575aff2bf3a387a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65429
The sizes of these arrays can't change, so there's no need to waste an extra pointer on them.
ghstack-source-id: 140532722
Test Plan:
CI
I profiled this diff and the previous diff together. Comparing time spent in the operator functor handler for to_copy, I see the load instruction fetching the inputs pointer from p_node on https://www.internalfb.com/code/fbsource/[4c98a83b2451fa6750f38796c91ebb0eb0afd800]/fbcode/caffe2/torch/csrc/jit/runtime/static/ops.cpp?lines=947 (`p_node->Input(0).toTensor()`) improved a tiny bit, and the overall time spent in that wrapper decreased from 0.8% to 0.7%.
Reviewed By: hlu1
Differential Revision: D31096042
fbshipit-source-id: 35c30462d6a9f9bd555d6b23361f27962e24b395
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66376
Added converter for cumsum and unit test
Test Plan: buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_cumsum
Reviewed By: wushirong, 842974287
Differential Revision: D31423701
fbshipit-source-id: ee3aa625d6875ba8e6bad27044d22638e99b5c03
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66486
The newly-introduced Python dispatcher mode (`__torch_dispatch__`) does not have support for `torch.tensor()` (see #64360) and this causes friction in the user experience if some `nn.Modules` use `torch.tensor()` either implicitly or explicitly.
This PR replaces calls to `torch.tensor()` in `Parameter`, `UninitializedParameter`, and `UninitializedBuffer` with an equivalent call to `torch.empty()` which serves the same purpose and is syntactically more readable.
ghstack-source-id: 140520931
Test Plan: Since no behavioral change, run the existing unit and integration tests.
Reviewed By: pbelevich
Differential Revision: D31575587
fbshipit-source-id: bd7bdeea54370f3e53dc13bd182b97d0f67146f5
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20972
log_sigmoid calculates something like `log(1 + x)` where x is always a
positive number less than one. This wastes floating point precision
because the exponent always becomes zero. Instead, using
`log1p(x)` gives the full mantissa precision around `x=0`.
This also fixes infinity propagation, because the old code computes
`exp(in - in)` when `in` is negative, which for an infinite input results in a
NaN instead of 0.
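A quick scalar illustration of both effects (plain Python floats, not the kernel itself):
```
import math

x = 1e-18
math.log(1 + x)  # 0.0: the tiny x is absorbed into 1 and lost
math.log1p(x)    # 1e-18: full precision near zero

inf = float("inf")
math.exp(-inf - (-inf))  # nan: the old exp(in - in) form breaks for infinite input
```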
cc albanD mruberry jbschlosser walterddr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66441
Reviewed By: bdhirsh
Differential Revision: D31619630
Pulled By: albanD
fbshipit-source-id: e7867f3459a91e944b92f8ca42b6e0697b13f89b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64499
This moves the native functions into a separate Activation.cpp file,
which calls into `launch_..._kernel` functions defined in `Activation.cu`.
The exception is `rrelu_with_noise`, which is complicated by the
random number generation code, so I've moved it into its own file.
Test Plan: Imported from OSS
Reviewed By: jbschlosser, ezyang
Differential Revision: D30867323
Pulled By: dagitses
fbshipit-source-id: a4cd6f1fb1b1fed4cc356bf8b3778991ae2278ba
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50209
This adds a new warning handler that stores all warnings in a shared
queue, which can be "replayed" at a later time and, crucially, on
another thread. Then, I use this inside the autograd engine to ensure
that warnings are processed by the handler registered on the main
thread.
For testing, I also add an operator that always warns in the backward
pass and test that the warning is a normal Python warning.
cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66235
Reviewed By: ejguan
Differential Revision: D31505413
Pulled By: albanD
fbshipit-source-id: 1a7f60b038f55c20591c0748b9e86735b3fec2f9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66295
Tidying up the top sources of reference count decrements seen during static runtime startup in alias_analysis.cpp specifically.
ghstack-source-id: 140484160
Test Plan:
CI
perf now shows under 2% time spent in ~__shared_count instead of about 5%.
Reviewed By: suo
Differential Revision: D31490761
fbshipit-source-id: bbdcb7f9065c3aafa7fff7bfea9cea6dbc41f9d9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65344
Callsites that know they are using a cache can borrow AliasTypeSets from the cache instead of copying them.
ghstack-source-id: 140484162
Test Plan: Running perf on static runtime startup seems to show less inclusive time spent in AliasDb::getElements
Reviewed By: ejguan
Differential Revision: D31027363
fbshipit-source-id: b7a1473f4f9e9f14566f56f4b3b4e6317076beeb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65178
There is no need to copy the MemoryLocations in this case.
ghstack-source-id: 140484161
Test Plan:
CI
static runtime startup for ctr_mobile_feed decreased from 7.0s to 6.3s
Reviewed By: suo
Differential Revision: D30984442
fbshipit-source-id: 61bb678c4480cd030aaab2bbc8a04cbd9b7c7f4d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66496
As the title. No changes on the code logic.
Test Plan: CI
Reviewed By: wushirong
Differential Revision: D31576303
fbshipit-source-id: f2132309023b3c9e09810e32af91eb42eefd3f32
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66557
The test was previously using `at::empty_strided` to initialize one of its inputs. The contents of the tensor returned by this function are random, uninitialized memory. If we happened to get a NaN, this test would fail since `use_equalnan` was not set.
Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: hlu1
Differential Revision: D31611961
fbshipit-source-id: 79a9476d0d6ce7a9f1412eefcef19bc2618c54b8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66465
conv_param_t is being removed as it stores redundant information. This removes the last usage of it in qnnpack so we can begin removing the dependency.
ghstack-source-id: 140475374
Test Plan: github tests
Reviewed By: kimishpatel
Differential Revision: D31564679
fbshipit-source-id: 049a28fac0235b2e739fb2e048484d7e8e7189fa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63602
This PR fixes the case when a read and write is performed on a memory shared between mutable and (or) non-mutable arguments. Example:
```
a=torch.tensor([1+1j])
b=a.conj()
b.add_(a) # should return tensor([2]) but returns tensor ([2-2j])
```
The issue here is that in the conjugate fallback, we resolve the conjugation in-place for mutable arguments which can be a problem as shown above in the case when other input arguments share memory with the mutable argument(s).
This PR fixes this issue by:
1. First scanning through the operator input arguments and creating a vector of mutable arguments that have the conj bit set to `True` (and accordingly setting the flag `check_for_alias_with_mut_arg` to `True` or `False`).
2. Iterating through all the arguments. At this time we only look at the non-mutable arguments. If `check_for_alias_with_mut_arg` is set to `True`, then we iterate through `mutable_inputs` to check whether the current arg tensor aliases any of the entries in `mutable_inputs`. If it does, we clone the non-mutable tensor arg; otherwise we resolve the conjugation as before.
3. Now we look through the mutable_inputs vector (which contains only mutable input tensors with conj bit set to `True`). We in-place conjugate each of the entries in the vector.
4. Do the computation.
5. Re-conjugate the mutable argument tensors.
NOTE: `TensorLists` are not fully handled in ConjugateFallback. Please see the in-line comment for more details.
Fixes https://github.com/pytorch/pytorch/issues/59943
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D30466905
Pulled By: anjali411
fbshipit-source-id: 58058e5e6481da04a12d03f743c1491942a6cc9b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66513
These were missed in the migration of onnx to github actions.
Adds ort tests with 2 shards for the onnx workflow
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D31599433
Pulled By: seemethere
fbshipit-source-id: 73dce0d3017c4280e64f0c8578e2be7ef6a168d6
Summary:
- this change should not impact existing use cases, but allows for
additional use cases where the container holds const types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66497
Reviewed By: alanwaketan
Differential Revision: D31582242
Pulled By: wconstab
fbshipit-source-id: 3a0e18b4afaf3c7ff93a0e3d09067ed066402b44
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66512
TLDR, we are able to use the interactive_embedded_interpreter (basically just a torch::deploy interpreter with an interactive shell) to dynamically load various third party libraries. We use the popular libraries numpy, scipy, regex, and pandas for illustration purposes.
A couple of changes need to be done for the interactive_embedded_interpreter:
1, we need to link with :embedded_interpreter_all rather than :embedded_interpreter so we can enable DEEPBIND and use our custom loader
2, we provide a pylibRoot path to construct the InterpreterManager. The path will be added to the embedded interpreter's sys.path. Typically we can pass in the python library root path in a conda environment so the torch::deploy interpreter can find all installed packages.
3, we allow interactive_embedded_interpreter to execute a script to ease recording the exploration of various python libraries.
ghstack-source-id: 140453213
Test Plan:
Install numpy, scipy, regex, pandas in the conda environment or on the machine directly. Suppose /home/shunting/.local/lib/python3.8/site-packages/ is the root path for the installed libraries.
- buck run mode/opt :interactive_embedded_interpreter -- --pylib_root=/home/shunting/.local/lib/python3.8/site-packages/ --pyscript=~/p7/iei_examples/try_regex.py
content of try_regex.py:
```
import regex
print(regex)
pat = r'(.+)\1'
print(regex.match(pat, "abcabc"))
print(regex.match(pat, "abcba"))
print("bye")
```
- buck run mode/opt :interactive_embedded_interpreter -- --pylib_root=/home/shunting/.local/lib/python3.8/site-packages/ --pyscript=~/p7/iei_examples/try_numpy.py
content of try_numpy.py:
```
import numpy as np
print(f"numpy at {np}")
a = np.random.rand(2, 3)
b = np.random.rand(3, 2)
print(np.matmul(a, b))
```
- buck run mode/opt :interactive_embedded_interpreter -- --pylib_root=/home/shunting/.local/lib/python3.8/site-packages/ --pyscript=~/p7/iei_examples/try_scipy.py
content of try_scipy.py:
```
import numpy as np
from scipy import linalg
mat_a = np.array([[1, 0, 0, 0], [1, 1, 0, 0], [1, 2, 1, 0], [1, 3, 3, 1]])
mat_b = linalg.inv(mat_a)
print(mat_b)
```
- buck run mode/opt :interactive_embedded_interpreter -- --pylib_root=/home/shunting/.local/lib/python3.8/site-packages/ --pyscript=~/p7/iei_examples/try_pandas.py
content of try_pandas.py:
```
import pandas as pd
print(f"pandas at {pd}")
df = pd.DataFrame({
"col1": [1, 2, 3, 4],
"col2": [2, 4, 8, 16],
})
print(df)
```
Reviewed By: suo
Differential Revision: D31587278
fbshipit-source-id: c0b031c1fa71a77cdfeba1d04514f83127f79012
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66478
A persistent resource pool was needed to store prepacked tensors since the main resource pool tied to the global Vulkan context would be flushed at the end of each inference run. However, prepacked tensors needed to stay alive between inference runs, so an additional persistent resource pool was introduced that would only be flushed when the Vulkan context was destroyed.
However, with [this change](https://github.com/pytorch/pytorch/pull/66477) the resource pool no longer indiscriminately flushes allocated resources at the end of an inference run. Tensors will have to call `release_resources()` before they become eligible to be destroyed. Since prepacked tensors are tied to an `OpContext` object, they will stay alive between inference runs.
Therefore, the persistent resource pool is no longer needed.
Test Plan: Build and run `vulkan_api_test`.
Reviewed By: beback4u
Differential Revision: D31490076
fbshipit-source-id: 3741a2333c834796d589774e819eaaf52bb9f0fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66477
Currently, Vulkan tensor memory is allocated and deallocated through the following mechanism:
1. During inference, ops will request buffer and/or texture memory for tensors from the [Resource Pool](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/vulkan/api/Resource.h#L324-L327)
2. The resource pool allocates the memory and [adds it to a vector](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/vulkan/api/Resource.cpp#L609-L622) containing all the memory allocations it has made this inference, then returns the most recently allocated block of memory
3. At the end of inference, results are transferred back to the CPU and the [context is flushed](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/vulkan/ops/Copy.cpp#L150)
4. As part of the context flush the [resource pool is purged](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/vulkan/api/Context.cpp#L143) which [deallocates all buffer and texture memory](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/vulkan/api/Resource.cpp#L683-L684) allocated by the resource pool
This pattern makes it impossible to have models with multiple outputs. When the first output tensor is transferred back to the CPU, the memory of the other output tensors will be deallocated when the context is flushed.
Instead, an alternative is to tie resource destruction to the destructor of the [vTensor::View](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/vulkan/ops/Tensor.h#L243) class, which holds the actual implementation and storage of Vulkan tensors. This will ensure that memory associated with a tensor will be cleaned up whenever it is no longer used.
The new deallocation mechanism proposed is:
1. During inference, `vTensor` objects will request GPU memory from the resource pool, same as before.
2. The resource pool allocates buffer or texture memory and returns it directly to the `vTensor`
3. Throughout inference, intermediate tensors' reference counts will go to 0 and the destructor of the `View` class will be called
4. The destructor will add any texture and buffer memory it's holding to the resource pool's list of GPU memory allocations to be cleaned up
5. At the end of inference `purge()` will be called which will destroy all allocations in the list of allocations to be cleaned
6. GPU memory for output tensors will not be destroyed, since their reference counts will be greater than 0, thus they have not yet been added to the list of allocations to be destroyed
Note that it is not correct to have the destructor directly deallocate GPU memory. This is due to the fact that Vulkan ops simply submit work to the GPU but do not guarantee that the work has completed when the op returns. Therefore we must keep all allocated GPU memory until the end of inference, when we wait for the GPU to complete work.
Test Plan:
build and run `vulkan_api_test` to make sure existing functionality is not impacted.
Also test in a later diff that checks that output tensors stay alive after inference completes.
Reviewed By: dreiss
Differential Revision: D31510899
fbshipit-source-id: 99250c2800a68f07b1b91dbf5d3b293184da5bd2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66472
A follow up of https://github.com/pytorch/pytorch/pull/66362. Same fix.
Test Plan:
```
buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fuse_permute_matmul_trt
buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fuse_permute_linear_trt
```
Reviewed By: wushirong, 842974287
Differential Revision: D31567662
fbshipit-source-id: 2c9e6a138fc31996d790fd4d79e0bf931507fc99
Summary:
- [x] Fixed the Pyre type checking errors in `torch/utils/hipify/hipify_python.py`:
```
torch/utils/hipify/hipify_python.py:196:8 Incompatible variable type [9]: clean_ctx is declared to have type `GeneratedFileCleaner` but is used as type `None`.
torch/utils/hipify/hipify_python.py:944:4 Incompatible variable type [9]: clean_ctx is declared to have type `GeneratedFileCleaner` but is used as type `None`.
```
Fixing the issue: https://github.com/MLH-Fellowship/pyre-check/issues/78
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66164
Reviewed By: onionymous
Differential Revision: D31411443
Pulled By: 0xedward
fbshipit-source-id: c69f8fb839ad1d5ba5e4a223e1322ae7207e1574
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66501
Add testing for the Adagrad optimizer to ensure that it behaves as if complex numbers are two real numbers in R^2 as per issue 65711 on github
ghstack-source-id: 140414042
Test Plan:
buck test mode/dev caffe2/test:optim -- 'test_adagrad_complex'
https://pxl.cl/1R27M
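A rough, self-contained sketch of what the test above checks (not the actual test code; shapes, lr, and the loss below are arbitrary choices):
```python
import torch

torch.manual_seed(0)
p_complex = torch.randn(4, dtype=torch.complex64, requires_grad=True)
p_real = torch.view_as_real(p_complex.detach()).clone().requires_grad_(True)

opt_c = torch.optim.Adagrad([p_complex], lr=0.1)
opt_r = torch.optim.Adagrad([p_real], lr=0.1)

for _ in range(5):
    opt_c.zero_grad()
    opt_r.zero_grad()
    (p_complex.abs() ** 2).sum().backward()   # same loss, complex view
    (p_real ** 2).sum().backward()            # same loss, R^2 view
    opt_c.step()
    opt_r.step()

# the complex parameter should have been updated exactly like its R^2 counterpart
assert torch.allclose(torch.view_as_real(p_complex), p_real)
```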
Reviewed By: albanD
Differential Revision: D31584240
fbshipit-source-id: 5c9938084566b8ea49cc8ff002789731f62fe87e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65736
We ran into some limitations when extracting PyTorch operator parameters through hooks or the execution graph. Some of these limitations are not due to the operator not exposing them; rather, the inputs for these operators are already fused/processed in some cases (like embedding tables). We want to be able to attach some metadata to the user-scope record functions, allowing profilers to later extract this information.
The record function C++ API already supports taking inputs and outputs information. The corresponding Python interface does not support them and only allows a string name as record function parameter.
This diff adds support for users to optionally add additional arguments to the record function in two ways.
1. to remain backward compatible with `record_function_op`, we have added an optional string arg to the interface: `with record_function(name, arg_str)`.
2. to support a data dependency graph, we also have the new `torch.autograd._record_function_with_args_enter` and `torch.autograd._record_function_with_args_exit` functions to provide an interface where we can give additional tensor arguments. For now we imagine this can be used for debugging or analysis purposes. In this form, we currently support some basic data types as inputs: scalars, string, list, and tensor.
Example usage:
```
# record_function operator with a name and optionally, a string for arguments.
with record_function("## TEST 1 ##", "[1, 2, 3]"):
<actual module or operator>
# more general form of record_function
a = _record_function_with_args_enter("## TEST 2 ##", 1, False, 2.5, [u, u], "hello", u)
<actual module or operator>
_record_function_with_args_exit(a)
```
Corresponding outputs in execution graph:
```
{
"name": "## TEST 2 ##", "id": 7, "parent": 3, "fw_parent": 0, "scope": 5, "tid": 1, "fw_tid": 0,
"inputs": [1,false,2.5,[6,6],"hello",6], "input_shapes": [[],[],[],[[3,4,5],[3,4,5]],[],[3,4,5]], "input_types": ["Int","Bool","Double","GenericList[Tensor(float),Tensor(float)]","String","Tensor(float)"],
"outputs": [], "output_shapes": [], "output_types": []
},
{
"name": "## TEST 1 ##", "id": 3, "parent": 2, "fw_parent": 0, "scope": 5, "tid": 1, "fw_tid": 0,
"inputs": ["1, 2, 3"], "input_shapes": [[]], "input_types": ["String"],
"outputs": [], "output_shapes": [], "output_types": []
},
```
Test Plan:
```
=> buck build caffe2/test:profiler --show-output
=> buck-out/gen/caffe2/test/profiler#binary.par test_profiler.TestRecordFunction
test_record_function (test_profiler.TestRecordFunction) ... Log file: /tmp/libkineto_activities_1651304.json
Net filter:
Target net for iteration count:
Net Iterations: 3
INFO:2021-09-27 01:10:15 1651304:1651304 Config.cpp:424] Trace start time: 2021-09-27 01:10:30
Trace duration: 500ms
Warmup duration: 5s
Net size threshold: 0
GPU op count threshold: 0
Max GPU buffer size: 128MB
Enabled activities: cpu_op,user_annotation,external_correlation,cuda_runtime,cpu_instant_event
Manifold bucket: gpu_traces
Manifold object: tree/traces/clientAPI/0/1632730215/devvm2060.ftw0/libkineto_activities_1651304.json
Trace compression enabled: 1
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:536] Tracing starting in 14s
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:48] Target net for iterations not specified - picking first encountered that passes net filter
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:57] Tracking net PyTorch Profiler for 3 iterations
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:126] Processing 1 CPU buffers
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:686] Recorded nets:
INFO:2021-09-27 01:10:15 1651304:1651304 ActivityProfiler.cpp:689] PyTorch Profiler: 1 iterations
ok
----------------------------------------------------------------------
Ran 1 test in 0.021s
OK
```
Reviewed By: gdankel
Differential Revision: D31165259
fbshipit-source-id: 15920aaef7138c666e5eca2a71c3bf33073eadc4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66361
ossci will be setup later, fbonly ci is ready
Test Plan:
buck run caffe2/test:fx2trt_test_linear
testinprod
Reviewed By: 842974287
Differential Revision: D31511082
fbshipit-source-id: 9e2c50c83fdba822cd2488eb17b5787d8a57f087
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61407
This PR adds `addmv_out_sparse_csr_cuda`. The operation is used to
compute matrix-vector multiplication. Since structured_delegate is used
we only need to implement the out variant, the in-place and normal
variants are autogenerated.
Working on this PR revealed that float16 (and probably bfloat16) inputs
do not work correctly in cusparse, therefore for this case `addmm` is
used with squeezes and unsqueezes.
cc nikitaved pearu cpuhrsch IvanYashchuk ngimel
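A hedged usage sketch of the new out= path (values, shapes, and scalars below are arbitrary; requires a CUDA build with cuSPARSE):
```python
import torch

if torch.cuda.is_available():
    # 3x4 CSR matrix with 4 non-zeros
    crow = torch.tensor([0, 2, 3, 4])
    col = torch.tensor([0, 3, 1, 2])
    val = torch.tensor([1.0, 2.0, 3.0, 4.0])
    mat = torch.sparse_csr_tensor(crow, col, val, size=(3, 4), device="cuda")

    vec = torch.randn(4, device="cuda")
    inp = torch.randn(3, device="cuda")
    out = torch.empty(3, device="cuda")

    # out = beta * inp + alpha * (mat @ vec); the out= variant is the one added here
    torch.addmv(inp, mat, vec, beta=0.5, alpha=2.0, out=out)
    print(out)
```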
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D31584499
Pulled By: ngimel
fbshipit-source-id: 4c507791471ada88969116b88eeaaba7a7536431
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66513
These were missed in the migration of onnx to github actions.
Adds ort tests with 2 shards for the onnx workflow
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D31591512
Pulled By: seemethere
fbshipit-source-id: 4a8bb3f0e62ff98ee77d3d8afc905f4e02db6f24
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63878
See https://github.com/pytorch/pytorch/issues/64407, https://github.com/pytorch/pytorch/issues/62032 for context:
In this PR:
- Add boxed kernel by replicating `gen_inplace_or_view`'s logic that is ONLY for use with the Autograd not-implemented kernel
- Unlike `gen_inplace_or_view` we always pass a view_func to as_view in order to ensure that a "derivative is not implemented" error is raised even if an in-place update is performed on the view. Without the `view_func`, the CopySlice + AsStridedBackward nodes would replace the NotImplemented node.
- This limitation makes it impossible to use this node for general use
- view relationship must be between first input (must be tensor) and first output (may be tensor or vec of tensor)
- do not support non-differentiable views (_values, _indices, view.dtype) - view relationship is always fw and bw differentiable
- Adds the macro `#define REGISTER_AUTOGRAD_NOT_IMPLEMENTED_FALLBACK(ns, op)` to be the interface for this feature:
- static initialization can be slowed down (not measured) if there are many registrations, because each line translates to 2 library calls, but the workaround is just to manually use the two functions `AutogradNotImplementedFallback` and `ADInplaceOrViewFallback` and call `m.impl`.
- Adds testing:
- for views: view relationship created
- performing in-place operation on the view, raises properly
- trying to create two view relationships is not allowed,
- single view relationship but not first input/first output should error
- view relation created properly for tensor vector output
- for in-place:
- version count bump
- triggers rebase_history
- multiple mutations is okay and also updates version counter
- TODO (follow up): Update tutorials for adding third-party operators (and document the above limitations)
- TODO (follow up): Look at torch-audio/torch-vision and identify places where this can simplify existing code
EDIT: Made it more clear what is introduced in this PR and moved some more contextual stuff into the issue itself
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D30901714
Pulled By: soulitzer
fbshipit-source-id: 48de14c28be023ff4bd31b7ea5e7cba88aeee04c
Summary:
`_mkdir_p` feels like a remnant of the Python-2 era; add an `exist_ok` argument and re-raise OSError to make it more human readable.
After the change, an attempt to build PyTorch in a folder that does not have write permissions will result in:
```
% python3.6 setup.py develop
Building wheel torch-1.10.0a0+git9509e8a
-- Building version 1.10.0a0+git9509e8a
Traceback (most recent call last):
File "/Users/nshulga/git/pytorch-worktree/tools/setup_helpers/cmake.py", line 21, in _mkdir_p
os.makedirs(d, exist_ok=True)
File "/opt/homebrew/Cellar/python36/3.6.2+_254.20170915/Frameworks/Python.framework/Versions/3.6/lib/python3.6/os.py", line 220, in makedirs
mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: 'build'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "setup.py", line 895, in <module>
build_deps()
File "setup.py", line 370, in build_deps
cmake=cmake)
File "/Users/nshulga/git/pytorch-worktree/tools/build_pytorch_libs.py", line 63, in build_caffe2
rerun_cmake)
File "/Users/nshulga/git/pytorch-worktree/tools/setup_helpers/cmake.py", line 225, in generate
_mkdir_p(self.build_dir)
File "/Users/nshulga/git/pytorch-worktree/tools/setup_helpers/cmake.py", line 23, in _mkdir_p
raise RuntimeError(f"Failed to create folder {os.path.abspath(d)}: {e.strerror}") from e
RuntimeError: Failed to create folder /Users/nshulga/git/pytorch-worktree/build: Permission denied
```
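A sketch of the reworked helper implied by the traceback above (the real code lives in tools/setup_helpers/cmake.py; this only shows the shape of the change):
```python
import os

def _mkdir_p(d: str) -> None:
    try:
        # tolerate an already-existing build directory
        os.makedirs(d, exist_ok=True)
    except OSError as e:
        # re-raise with a human-readable message, keeping the original cause
        raise RuntimeError(
            f"Failed to create folder {os.path.abspath(d)}: {e.strerror}"
        ) from e
```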
Fixes https://github.com/pytorch/pytorch/issues/65920
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66492
Reviewed By: seemethere
Differential Revision: D31578820
Pulled By: malfet
fbshipit-source-id: afe8240983100ac0a26cc540376b9dd71b1b53af
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64173
This one also required restructuring the code a bit to move the kernel
code into separate files. So, I've mainly focused on CUDA, which is
where the real build-time issues are.
Test Plan: Imported from OSS
Reviewed By: jbschlosser, ezyang
Differential Revision: D30728581
Pulled By: dagitses
fbshipit-source-id: a69eea5b4100d16165a02660dde200c8f648683d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65401
Per https://github.com/pytorch/pytorch/issues/57744 statically linked CUPTI
causes exception handling to break on certain compiler configurations, likely
because CUPTI comes with incompatible libstdc++ symbols. Rather than pray that
something reasonable happens, use the safer configuration (dynamic linking) by
default and give a warning if the user inverts the setting.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: gdankel
Differential Revision: D31082208
Pulled By: ezyang
fbshipit-source-id: 14f66af920847e158436b5801c43f3124b109b34
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66421
Original commit changeset: ab6bb8fe4e83
Plus this includes the BUILD.bazel changes, the reason for the revert.
Test Plan: See original diff
Reviewed By: gdankel
Differential Revision: D31542513
fbshipit-source-id: ee30aca2d6705638f97e04b77a9ae31fe5cc4ebb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65348
Previously, this took several percent of model loading time. Now it is well under 1%.
We get this savings by avoiding allocating a vector and avoiding reference count bumps on contained types within each type.
ghstack-source-id: 140148562
Reviewed By: suo
Differential Revision: D31057278
fbshipit-source-id: 55a02cbfefb8602e41baddc2661d15385fb2da55
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65347
This check is much cheaper than anything involving actually inspecting object fields (i.e., the cost is low), and if it succeeds we can skip the expensive (e.g., it involves locking a weak_ptr and then destroying the resulting shared_ptr) function body. It almost entirely eliminates time spent in this function during model loading according to perf.
ghstack-source-id: 140148561
Test Plan: Specifically I profiled static runtime startup for the ctr_mobile_feed model and saw self time in this function go from 2-3% to 0.36%.
Reviewed By: ejguan
Differential Revision: D31057279
fbshipit-source-id: efb6bdc0957b680112ac282e85dc1b06b1b6c0bd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66302
Just move files, ossci can be setup later
Test Plan:
buck run //caffe2/test:test_fx_acc_tracer
testinprod
Reviewed By: 842974287
Differential Revision: D31495087
fbshipit-source-id: f182c7438e3e80ba98924990682cb45a99b9967c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66362
In general we cannot rely on Permute021Linear being kept as-is before the lowering phase, since our transformation could have traced through this module. An acc-based fx pass is more reliable for recovering the perf.
Test Plan:
```
buck run mode/opt -c python.package_style=inplace -c fbcode.nvcc_arch=a100 //hpc/new/models/ads/benchmarks:ads_dense_benchmark -- over-arch --model-version=23x_3tb --batch-size=2048
OverArch, PyTorch, FP16, BS: 2048, TFLOP/s: 53.22, Time per iter: 14.46ms, QPS: 141629.45
OverArch, TensorRT, FP16, BS: 2048, TFLOP/s: 92.20, Time per iter: 8.35ms, QPS: 245354.15
```
Unittest:
```
buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fuse_permute_linear_trt
```
Reviewed By: jianyuh, wushirong, 842974287
Differential Revision: D31525307
fbshipit-source-id: b472a8c277aa4d156d933d6a5abec091133f22c5
Summary:
I updated `sample_inputs_linalg_lstsq`, and `test_nondifferentiable`
now correctly reveals the failure. The internal assert error was thrown
because autograd attempts to mark an integer tensor as differentiable.
Fixes https://github.com/pytorch/pytorch/issues/66420.
cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66426
Reviewed By: ejguan
Differential Revision: D31550942
Pulled By: albanD
fbshipit-source-id: 4a0ca60e62c5e9bb96af5020541da2d09ea3e405
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65554
We're relying on JIT based shape inference and not using the TE
implementation.
Question to the audience: we set `hasBroadcasts_` in that function, but
this function was almost never invoked. Do we behave correctly in the
presence of rand-calls and broadcasts?
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D31148925
Pulled By: ZolotukhinM
fbshipit-source-id: 2898a57e389ea0950163122089d0fec3d92701c4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65856
Occasionally functions don't have this `__name__` variable set and have `name` set instead. Not sure why this happens, but this should catch it.
Test Plan: ci
Reviewed By: iseeyuan
Differential Revision: D31286787
fbshipit-source-id: 8a339541215329b6e9ff43ef77363be41f19c5ca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66445
`Type.cpp` implements the `demangle()` function based on the macro `HAS_DEMANGLE`. This diff splits it into two `.cpp` files so that we can add either one into the build target. This change follows the pattern of `flags_use_no_gflags.cpp` and `flags_use_gflags.cpp`.
Test Plan: Rely on CI
Reviewed By: iseeyuan
Differential Revision: D31551432
fbshipit-source-id: f8b11783e513fa812228ec873459ad3043ff9147
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66347
It turns out that our hard-coded build flavor that we were running
deploy tests on in CI no longer exists lol. This PR fixes the OSS build
and also updates the build flavor.
Differential Revision:
D31517679
D31517679
Test Plan: Imported from OSS
Reviewed By: malfet, shunting314
Pulled By: suo
fbshipit-source-id: 763f126a3304f82e6dff7cff8c56414d82c54de3
Summary:
- `batch_isend_irecv` returns a list of requests instead of a single request (see the sketch below)
- remove some unused variables
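A minimal sketch of the documented return type (process group setup and the construction of `p2p_op_list` via `dist.P2POp` are omitted; the wrapper function here is hypothetical):
```python
import torch.distributed as dist

def exchange(p2p_op_list):
    # batch_isend_irecv returns a list of requests, one per op
    reqs = dist.batch_isend_irecv(p2p_op_list)
    for req in reqs:
        req.wait()
```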
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63112
Reviewed By: pbelevich, wayi1, fduwjj
Differential Revision: D30921265
fbshipit-source-id: e2075925172805d33974ef0de6fb631bdf33b5ea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66430
On the whole, I'm not totally satisfied with this approach. I think we should be building a prefix tree data structure during initial iteration over the submodules and querying that when deleting submodules. But I think this approach works and I want to see if we can get it in before 1.10
Test Plan: Imported from OSS
Reviewed By: Chillee
Differential Revision: D31546137
Pulled By: jamesr66a
fbshipit-source-id: f08b8409a3cf511277017ccccb916097b7c4c4fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65948
This guts `THCState` to simply be an empty struct, as well as:
- moving `THCState_getPeerToPeerAccess` and its cache into `ATen`.
- cleaning up dead code in `THCGeneral.cpp`
- moving `THCudaInit` and `THCMagma_init` into `CUDAHooks::initCUDA`
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31386275
Pulled By: ngimel
fbshipit-source-id: 5c1f1bbe8c3d2d9f5b99996e0588fb7f07fa6a77
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66380
Description:
1. creates doc pages for Eager and FX numeric suites
2. adds a link from main quantization doc to (1)
3. formats docblocks in Eager NS to render well
4. adds example code and docblocks to FX numeric suite
Test Plan:
```
cd docs
make html
python -m http.server
// renders well
```
Reviewed By: jerryzh168
Differential Revision: D31543173
Pulled By: vkuzo
fbshipit-source-id: feb291bcbe92747495f45165f738631fa5cbffbd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66379
Description:
Creates a quantization API reference and fixes all the docblock errors.
This is #66122 to #66210 squashed together
Test Plan:
```
cd docs
make html
python -m http.server
// open webpage, inspect it, looks good
```
Reviewed By: ejguan
Differential Revision: D31543172
Pulled By: vkuzo
fbshipit-source-id: 9131363d6528337e9f100759654d3f34f02142a9
Summary:
- [x] Fix the Pyre type checking errors in `torch/quantization/fx/qconfig_utils.py`
```
torch/quantization/fx/qconfig_utils.py:241:46 Incompatible variable type [9]: prepare_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/fx/qconfig_utils.py:267:46 Incompatible variable type [9]: convert_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
torch/quantization/fx/qconfig_utils.py:284:43 Incompatible variable type [9]: fuse_custom_config_dict is declared to have type `Dict[str, typing.Any]` but is used as type `None`.
```
Fixes the issue: [MLH-Fellowship/pyre-check/issues/73](https://github.com/MLH-Fellowship/pyre-check/issues/73)
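The usual way to fix this class of Pyre error, shown purely as an illustration (the function name below is hypothetical, not the actual change in `qconfig_utils.py`), is to declare the default-`None` parameter as `Optional` and materialize the dict inside the function:
```python
from typing import Any, Dict, Optional

# hypothetical helper, not the actual qconfig_utils function
def get_custom_config(custom_config_dict: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    if custom_config_dict is None:
        custom_config_dict = {}
    return custom_config_dict
```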
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66428
Reviewed By: grievejia
Differential Revision: D31545215
Pulled By: 0xedward
fbshipit-source-id: 767ae7888854c2eec2ecf14855a5b011110b9271
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66028
Added converter and unit test for torch.chunk function
Test Plan: buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_gelu
Reviewed By: 842974287
Differential Revision: D31345180
fbshipit-source-id: 9425685671b474449e825aa2a8e7e867a329eb6e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66242
While working on random test generation, I observed that many simple transformations were upsetting vectorization. Digging deeper, I found that it calls SplitWithTail, which incorrectly splits the loop when the loop start is not zero. This patch normalizes the loop before we start splitting it.
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D31506853
Pulled By: anijain2305
fbshipit-source-id: 5c5f2568ce0a239bfaa515458be52541eafd23b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66443
For some reason, this logging is adding noise to a lot of flow jobs. I am not sure if this is actually needed.
This is called from `__init__`, so it's logged all the time and logs all key:values of the current local symbols.
Test Plan: N/A
Reviewed By: chowarfb
Differential Revision: D31534372
fbshipit-source-id: bed032b66fed548c97a6f66b1b9e905fd2738851
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66440
* Set correct name for test worker executable
* Remove `test_get_override_executable` from oss, there already test that tests the functionality
Test Plan: buck test mode/dev-nosan //caffe2/test/distributed/launcher/fb:launch_test
Reviewed By: d4l3k
Differential Revision: D31544853
fbshipit-source-id: e1e009b4b38830d3a78981f8f93c2314ed851695
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66342
`decltype(auto)` in D31486117 (fb5a80ffd8) wasn't the right choice in these specializations, because it will *still* deduce a copy.
See https://godbolt.org/z/GjbcPE1c4 for example.
ghstack-source-id: 140144199
Test Plan: CI, added new static_assert to make sure we got it right for std::tuple in particular
Reviewed By: hlu1, JasonHanwen
Differential Revision: D31514960
fbshipit-source-id: cae722aa34345b590c46eae478229cb5f4b0d7dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66340
For functions that take `std::vector`s with `std::tuple`s in them, `getTypePtr` can get hit on every call, in which case creating a new `TupleType` object every time is expensive.
ghstack-source-id: 140143104
Test Plan: CI
Reviewed By: hlu1, JasonHanwen
Differential Revision: D31514792
fbshipit-source-id: 23652ca90ba1259afc05e953b99ce1fe1bebcc2b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66290
Add full specialization for std::string type index
It slightly speeds up compilation as well as solves the ambiguity in how template instantiations implemented in inline namespaces are rendered during `__PRETTY_FUNCTION__` computation.
Not sure what `#pragma` controls this behaviour, but when code is compiled by clang-12+ using libstdc++, `__PRETTY_FUNCTION__` sometimes resolves `std::string` to `std::basic_string<char>` and sometimes to `std::__cxx11::basic_string<char>`, even though in the object file the symbol is always inside the `std::__cxx11::` namespace, which might break caffe2 serialization code that depends on dynamic hash generation.
Template name resolution was debugged using https://gist.github.com/malfet/c83b9ebd35730ebf8bac7af42682ea37
(Note: this ignores all push blocking failures!)
Test Plan: CI
Reviewed By: r-barnes
Differential Revision: D31490050
fbshipit-source-id: 127091574cf6b92c7ec3f972821e4e76f5f626a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66394
Skips this test as it currently does not seem to pass after several
internal local runs.
ghstack-source-id: 140210583
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D31534806
fbshipit-source-id: 799849a6a715506a85c9697b46f7098d9b71b32e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65515
This change enables `StaticRuntime` to manage output tensors (returned from a graph) as follows:
- At the creation of `StaticModule`, it gathers a set of candidates for output tensors (& their aliases) for managing. This is done by `ValueGroup` introduced by the previous diff.
- At the end of the 1st iteration, `MemoryPlanner` creates a set of output `at::Tensor*` to manage. This set consists of tensors objects from the aforementioned candidates, excluding the direct output value of the graph to simplify ivalue ownership passing (`std::move(ivalue)` to return from SR). Note that this exclusion has no perf implication for inline_cvr & ctr_mobilefeed since they only return a container object (e.g., tuple).
- The 2nd+ iterations preallocate a slab of memory for all output tensors identified during the 1st iteration. Note that these preallocated tensors are *NOT* deallocated when returned from SR. The client receives the output tensors, finishes using them, and is responsible for calling `StaticRuntime::deallocateOutputTensors()` to deallocate them. This mandates that SR cannot be reentered until `deallocateOutputTensors` is called by the client.
- In case of a buggy client missing a call to `StaticRuntime::deallocateOutputTensors()`, SR throws an exception when reentered instead of leaking memory.
- Nit: I plan to use camelCase for function names, and so all newly introduced functions use camelCase despite inconsistencies with snake_case. We can gradually fix the inconsistencies.
This change will be followed by another one to enable `manage_output_tensors` from `PyTorchScriptPredictor`, starting with `ptvsc2_prediction_bench` as a testbed.
Test Plan:
- Added `StaticRuntime.ManageOutputTensors*` to cover the newly added code paths.
- Enhanced `testStaticRuntime` to exercise each unittest test case with `manage_output_tensors` on. Confirmed that SR actually managed output tensors successfully for a few existing testcases (e.g., StaticRuntime.EmbeddingBag`).
Reviewed By: hlu1
Differential Revision: D31049221
fbshipit-source-id: 4ad1599179cc7f00d29e0ce41b33f776226d4383
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66260
Every workflow has ciflow enabled so this is not needed anymore
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: dagitses, janeyx99
Differential Revision: D31493340
Pulled By: seemethere
fbshipit-source-id: 8718fe5d22f4be6e0900962576782a9f23162a39
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62550
I noticed that running the build twice in a row resulted in ~80 CUDA files being
rebuilt. Running `ninja -d explain` shows
```
ninja explain: TH/generic/THStorage.h is dirty
ninja explain: TH/generic/THStorageCopy.h is dirty
ninja explain: THC/generic/THCStorage.h is dirty
ninja explain: THC/generic/THCStorageCopy.h is dirty
ninja explain: TH/generic/THTensor.h is dirty
ninja explain: THC/generic/THCTensor.h is dirty
ninja explain: THC/generic/THCTensorCopy.h is dirty
ninja explain: THC/generic/THCTensorMath.h is dirty
ninja explain: THC/generic/THCTensorMathMagma.h is dirty
ninja explain: THC/generic/THCTensorMathPairwise.h is dirty
ninja explain: THC/generic/THCTensorScatterGather.h is dirty
```
considering `ninja` is working relative to the `build` folder, these files don't
actually exist. I traced this back to the output of `nvcc -MD` containing
paths relative to the include directory, instead of being absolute.
This adds a little script to launch the compiler then resolve any relative paths
in the `.d` file before `ninja` looks at it. To use it, I run the build with
```
export CMAKE_CUDA_COMPILER_LAUNCHER="python;`pwd`/tools/nvcc_fix_deps.py;ccache"
```
There are some possible pit-falls here. The same relative path might work for
two include directories, and the compiler could pick a different one. Or,
the compiler might have additional implicit include directories that are needed
to resolve the path. However, this has worked perfectly in my testing and it's
completely opt-in so should be fine.
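A very rough sketch of the core idea (not the actual tools/nvcc_fix_deps.py; argument handling and edge cases are simplified guesses): rewrite relative paths in the nvcc-generated `.d` file to absolute ones by probing each known include directory.
```python
from pathlib import Path
from typing import List

def fix_dep_file(dep_file: Path, include_dirs: List[Path]) -> None:
    # drop line-continuation backslashes; ninja only needs the dependency list
    tokens = [t for t in dep_file.read_text().split() if t != "\\"]
    fixed = []
    for tok in tokens:
        p = Path(tok)
        if not p.is_absolute():
            # resolve a relative path against the first include dir that contains it
            for inc in include_dirs:
                if (inc / p).exists():
                    p = (inc / p).resolve()
                    break
        fixed.append(str(p))
    dep_file.write_text(" ".join(fixed) + "\n")
```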
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D31503351
Pulled By: malfet
fbshipit-source-id: b184c4526679d976b93829b5715cafcb1c7db2ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62445
PyTorch currently uses the old style of compiling CUDA in CMake which is just a
bunch of scripts in `FindCUDA.cmake`. Newer versions support CUDA natively as
a language just like C++ or C.
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D31503350
fbshipit-source-id: 2ee817edc9698531ae1b87eda3ad271ee459fd55
Summary:
Also use a range loop instead of a regular one
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66315
Reviewed By: albanD
Differential Revision: D31503730
Pulled By: malfet
fbshipit-source-id: f5568f7f28e15a9becd27986dd061a6fcae34651
Summary:
There is an issue when calling **torch.get_autocast_cpu_dtype** and **torch.get_autocast_gpu_dtype**:
```
>>> torch.get_autocast_gpu_dtype()==torch.half
False
>>> torch.get_autocast_cpu_dtype()==torch.bfloat16
False
```
but the expected results should be :
```
>>> torch.get_autocast_gpu_dtype()==torch.half
True
>>> torch.get_autocast_cpu_dtype()==torch.bfloat16
True
```
This PR is about fixing this issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66396
Reviewed By: ejguan
Differential Revision: D31541727
Pulled By: albanD
fbshipit-source-id: 1a0fe070a82590ef2926a517bf48046c2633d168
Summary:
Addresses this network risk mitigation mentioned in https://github.com/pytorch/pytorch/issues/65439#issuecomment-924627239.
I didn't include any mobile app/benchmarking changes because I think the pretrained weights matter there.
I ended up removing the changes in test_utils because those were sensitive to the pretrained variable.
I am saving the quantization test changes for another PR because they are currently disabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66312
Reviewed By: ejguan
Differential Revision: D31542992
Pulled By: janeyx99
fbshipit-source-id: 57b4f70247af25cc96c57abd9e689c34641672ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65538
Adds a test which verifies that `prepare_fx` and `convert_fx` work
on models created by `torch.package` in the past. In detail:
1. (one time) create a model and save it with torch.package. Also save input,
expected output, and names of quantization related get_attrs added by
our passes.
2. (every time) load the model from (1), and verify that expected output
matches current output, and that get_attr targets did not change.
Test Plan:
```
python test/test_quantization.py TestSerialization.test_linear_relu_package_quantization_transforms
```
Imported from OSS
Reviewed By: supriyar
Differential Revision: D31512939
fbshipit-source-id: 718ad5fb66e09b6b31796ebe0dc698186e9a659f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66338
This commit exposes c10d extension API to Python land. Users can
now override c10d communication behaviors in pure Python, and no
longer need to go through the cpp extension steps.
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D31514351
Pulled By: mrshenli
fbshipit-source-id: a8b94af0af7960c078e1006c29b25f7f3bd86c81
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66350
Implements conv3d for QNNPACK by writing another kernel for the indirection buffer in 3 dimensions. Modifies all structs to take depth, with depth = 1 indicating a 2d operation. gemm and conv (non-transpose) work; next up is depthwise and transpose.
ghstack-source-id: 140152440
Test Plan: test/quantization
Reviewed By: kimishpatel
Differential Revision: D30858693
fbshipit-source-id: 883cca8ec53b9e15ab4b9473c6cc042e3d049d9c
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/kineto](https://github.com/pytorch/kineto).
New submodule commit: 6f9c0eeff5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59674
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: larryliu0820
Differential Revision: D28977762
fbshipit-source-id: d441d4d46a7044cc05eb8b21e59471deee312e02
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66222
Description:
1. creates doc pages for Eager and FX numeric suites
2. adds a link from main quantization doc to (1)
3. formats docblocks in Eager NS to render well
4. adds example code and docblocks to FX numeric suite
Test Plan:
```
cd docs
make html
python -m http.server
// renders well
```
Reviewed By: jerryzh168
Differential Revision: D31447610
Pulled By: vkuzo
fbshipit-source-id: 441170c4a6c3ddea1e7c7c5cc2f1e1cd5aa65f2f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66210
Description:
Moves the backend section of the quantization page further down,
to ensure that the API description and reference sections are closer
to the top.
Test Plan:
```
cd docs
make html
python -m http.server
// renders well
```
Reviewed By: jerryzh168
Differential Revision: D31447611
Pulled By: vkuzo
fbshipit-source-id: 537b146559bce484588b3c78e6b0cdb4c274e8dd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66201
Description:
This PR switches the quantization API reference to use `autosummary`
for each section. We define the sections and manually write a list
of modules/functions/methods to include, and sphinx does the rest.
The result is a single page where we have every quantization function
and module with a quick autogenerated blurb, and users can click
through to each of them for a full documentation page.
This mimics how the `torch.nn` and `torch.nn.functional` doc
pages are set up.
In detail, for each section before this PR:
* creates a new section using `autosummary`
* adds all modules/functions/methods which were previously in the manual section
* adds any additional modules/functions/methods which are public facing but not previously documented
* deletes the old manual summary and all links to it
Test Plan:
```
cd docs
make html
python -m http.server
// renders well, links work
```
Reviewed By: jerryzh168
Differential Revision: D31447615
Pulled By: vkuzo
fbshipit-source-id: 09874ad9629f9c00eeab79c406579c6abd974901
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66198
Consolidates all API reference material for quantization on a single
page, to reduce duplication of information.
Future PRs will improve the API reference page itself.
Test Plan:
```
cd docs
make html
python -m http.server
// renders well
```
Reviewed By: jerryzh168
Differential Revision: D31447616
Pulled By: vkuzo
fbshipit-source-id: 2f9c4dac2b2fb377568332aef79531d1f784444a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66129
Adds a documentation page for `torch.ao.quantization.QConfig`. It is useful
for this to have a separate page since it is shared between Eager and FX graph
mode quantization.
Also, ensures that all important functions and module attributes in this
module have docstrings, so users can discover these without reading the
source code.
Test Plan:
```
cd docs
make html
python -m http.server
// open webpage, inspect it, renders correctly
```
Reviewed By: jerryzh168
Differential Revision: D31447614
Pulled By: vkuzo
fbshipit-source-id: 5d9dd2a4e8647fa17b96cefbaae5299adede619c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66125
Before this PR, the documentation for observers and fake_quants was inlined in the
Eager mode quantization page. This was hard to discover, especially
since that page is really long, and we now have FX graph mode quantization reusing
all of this code.
This PR moves observers and fake_quants into their own documentation pages. It also
adds docstrings to all user facing module attributes such as the default observers
and fake_quants, so people can discover them from documentation without having
to inspect the source code.
For now, enables autoformatting (which means all public classes, functions, members
with docstrings will get docs). If we need to exclude something in these files from
docs in the future, we can go back to manual docs.
Test Plan:
```
cd docs
make html
python -m http.server
// inspect docs on localhost, renders correctly
```
Reviewed By: dagitses
Differential Revision: D31447613
Pulled By: vkuzo
fbshipit-source-id: 63b4cf518badfb29ede583a5c2ca823f572c8599
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66122
Description:
Adds a documentation page for FX graph mode quantization APIs which
reads from the docstrings in `quantize_fx`, and links it from the main
quantization documentation page.
Also, updates the docstrings in `quantize_fx` to render well with reStructuredText.
Test Plan:
```
cd docs
make html
python -m http.server
// open webpage, inspect it, looks good
```
Reviewed By: dagitses
Differential Revision: D31447612
Pulled By: vkuzo
fbshipit-source-id: 07d0a6137f1537af82dce0a729f9617efaa714a0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66279
This error appears when compiling with "-Wextra" and cannot be resolved by fixing the code, since the return type of the intrinsic being passed to `map` is fixed.
Fixes:
```
caffe2/aten/src/ATen/cpu/vec/vec256/vec256_bfloat16.h:204:28: error: 'const' type qualifier on return type has no effect [-Werror,-Wignored-qualifiers]
Vectorized<BFloat16> map(const __m256 (*const vop)(__m256)) const {
^~~~~~
caffe2/aten/src/ATen/cpu/vec/vec256/vec256_bfloat16.h:204:28: error: 'const' type qualifier on return type has no effect [-Werror,-Wignored-qualifiers]
Vectorized<BFloat16> map(const __m256 (*const vop)(__m256)) const {
^~~~~~
```
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D31480888
fbshipit-source-id: 919c0d48c8ce13ce1106a9df124a077945e36707
Summary:
Previously https://github.com/pytorch/pytorch/pull/64087 broke the test `binary_macos_wheel_3_7_cpu_build`, because the wheel build is not happy with `model_tracer`. Considering it's a prototype and there is no need to ship model_tracer via wheel at the moment, use the option `TRACING_BASED` for building the tracer. When tracing-based builds are mature enough, we can ship the tracer binary via wheel eventually.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66267
Original commit changeset: 8ac3d75a52d0
ghstack-source-id: 140122106
Test Plan:
binary_macos_wheel_3_7_cpu_build passes
{F668643831}
Reviewed By: dhruvbird
Differential Revision: D31478593
fbshipit-source-id: 726cab1b31c4596f6268b7824eecb20e2e59d161
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66286
No need to take refcount bumps on each comparator call.
Test Plan: CI, review
Reviewed By: hlu1, JasonHanwen
Differential Revision: D31487058
fbshipit-source-id: 98d2447ac27a12695cb0ebe1e279a6b50744ff4f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66282
Now that a bunch of the `FooType::get()` functions return a const reference, we can forward that behavior through `getTypePtr()` using return type deduction.
Test Plan: Inspect assembly for List_test.cpp before/after the rest of the change; reference counting is no longer in the happy path.
Reviewed By: hlu1, JasonHanwen
Differential Revision: D31486117
fbshipit-source-id: 863b677bb6685452a5b325d327bdc2a0a09627bf
Summary:
- [x] Fix the Pyre type checking errors in `torch/quantization/fx/utils.py`
```
torch/quantization/fx/utils.py:490:4 Incompatible variable type [9]: target_module_type is declared to have type `Type[nn.modules.module.Module]` but is used as type `None`.
```
Fixes the issue: [MLH-Fellowship/pyre-check/issues/75](https://github.com/MLH-Fellowship/pyre-check/issues/75)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66311
Reviewed By: pradeep90
Differential Revision: D31506399
Pulled By: 0xedward
fbshipit-source-id: 3d866fba6005452378d4a2613b8689fa2d7a8b67
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66253
This was initially broken in #65829 and unbroken in #66003; this PR cleans
it up by removing the mypy ignore line.
Test Plan:
```
mypy torch/jit/_recursive.py --no-incremental
```
Imported from OSS
Reviewed By: supriyar
Differential Revision: D31475100
fbshipit-source-id: 46ab2ede72c08b926f4f9a6b03b1a1375b884c8a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65903
This changes the imports in the `caffe2/torch/nn/intrinsic` to include the new import locations.
```
codemod -d torch/nn/intrinsic --extensions py 'torch.quantization' 'torch.ao.quantization'
```
Test Plan: `python test/run_test.py`
Reviewed By: albanD
Differential Revision: D31301195
fbshipit-source-id: a5a9d84cb1ac33df6c90ee03cda3e2f1c5d5ff51
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65902
This changes the imports in the `caffe2/torch/nn/qat` to include the new import locations.
```
codemod -d torch/nn/qat --extensions py 'torch.quantization' 'torch.ao.quantization'
```
Test Plan: `python test/run_test.py`
Reviewed By: jerryzh168
Differential Revision: D31301196
fbshipit-source-id: ff237790d74cd3b3b5be642a997810f4f439a1d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65901
This changes the imports in the `caffe2/torch/nn/quantizable` to include the new import locations.
```
codemod -d torch/nn/quantizable --extensions py 'torch.quantization' 'torch.ao.quantization'
```
Test Plan: `python test/run_test.py`
Reviewed By: jerryzh168
Differential Revision: D31301194
fbshipit-source-id: 8ce8a3015ea61da62d7658846d1ca64fbdabaf7a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65900
This changes the imports in the `caffe2/torch/nn/quantized` to include the new import locations.
```
codemod -d torch/nn/quantized --extensions py 'torch.quantization' 'torch.ao.quantization'
```
Test Plan: `python test/run_test.py`
Reviewed By: jerryzh168
Differential Revision: D31301193
fbshipit-source-id: 58efb1ad51a8b441e2a3bd5b91af11eab6b9331f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64397
This diff exposes a way to add events to the kineto profiler from an
external source.
This can be a backend that executes a subgraph and wants to record this
execution in the kineto profiler.
This diff also adds "backend" metadata to identify the backend an event
would have executed on.
Test Plan:
test_lite_interpreter
Imported from OSS
Reviewed By: raziel
Differential Revision: D30710710
fbshipit-source-id: 51399f9b0b647bc2d0076074ad4ea9286d0ef3e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63198
Linear layers using the same input tensor can be concatted together
as long as the weights and biases are compatible.
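A quick numerical sanity check of the idea (illustrative only, not the pass itself): two Linear layers fed the same input are equivalent to one Linear whose weight and bias are the concatenation of the originals.
```python
import torch

x = torch.randn(2, 4)
l1, l2 = torch.nn.Linear(4, 3), torch.nn.Linear(4, 5)

# build the "concatted" layer by stacking weights and biases
fused = torch.nn.Linear(4, 8)
with torch.no_grad():
    fused.weight.copy_(torch.cat([l1.weight, l2.weight], dim=0))
    fused.bias.copy_(torch.cat([l1.bias, l2.bias], dim=0))

assert torch.allclose(torch.cat([l1(x), l2(x)], dim=1), fused(x), atol=1e-6)
```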
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31240642
fbshipit-source-id: 1e78daa6b89822412ba2513d326ee0e072ceff1e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65177
There is no need to heap-allocate any vectors in this case.
ghstack-source-id: 140052520
Test Plan:
CI
Startup for static runtime on ctr_mobile_feed local net decreased from 7.8s to about 7.0s
Reviewed By: malfet
Differential Revision: D30984194
fbshipit-source-id: 85091e55445f653ec728b27da4b459a2f1873013
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65122
Failure to cache this seems to contribute to quadratic startup time for the static runtime.
Disclaimer: I am entirely un-versed in the performance considerations for the JIT and have no idea what the other impacts of this change may be. Let the reviewer beware.
ghstack-source-id: 140052522
Reviewed By: suo
Differential Revision: D30983268
fbshipit-source-id: 4329aee6b5781f5c2e2d2334c396fab8528d4b7b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65343
No reason not to save a bit on re-hashing.
ghstack-source-id: 140052518
Test Plan:
CI
Static runtime startup seems to go from 5.9-6.0s to 5.8s-6.0s, perf shows less time spent rehashing
Reviewed By: mikeiovine
Differential Revision: D31027362
fbshipit-source-id: 39dd53ecd462693b518535856ddd92df78a4977b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64964
log API usage for fsdp API in PyTorch
Test Plan: unit test
Reviewed By: rohan-varma
Differential Revision: D30915734
fbshipit-source-id: 5e3b335327f4a3ff59b025e8e17a0fa0b7f6597d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66167
Sometimes due to desync we see PG wrapper monitored barrier fail. In
this case it would be useful to print the info about the collective that was
trying to run along with the actual error.
ghstack-source-id: 140037653
Test Plan: CI
Reviewed By: cbalioglu
Differential Revision: D31353021
fbshipit-source-id: e2a515326c9314c98119978d5566eb5431cca96c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66166
These methods should be private.
ghstack-source-id: 139782587
Test Plan: CI
Reviewed By: cbalioglu
Differential Revision: D31353020
fbshipit-source-id: 583fb315cc2cacc37df3d29cd5793b42558930b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65345
FooType::get() can return a const reference. Inconveniently, converting shared_ptr<FooType> to shared_ptr<Type> requires a copy & refcount bump, so to properly take advantage of this in unshapedType() we need to take a const Type& in isSubtypeOf(), which is good practice anyway -- don't require a shared_ptr if you don't need to take ownership.
ghstack-source-id: 140044165
Test Plan:
CI
perf says c10::unshapedType time decreased from 2.8% to 2.2% during static runtime startup, though I expect this to be generally beneficial.
Reviewed By: hlu1
Differential Revision: D31027361
fbshipit-source-id: 676feb81db9f74ad7b8651d8774f4ecb4cfa6ab8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65346
Tidying up the top sources of reference count decrements seen during static runtime startup.
ghstack-source-id: 140027349
Test Plan:
CI
perf now shows under 2% time spend in ~__shared_count instead of about 5%.
Reviewed By: suo
Differential Revision: D31057277
fbshipit-source-id: 9a16daf2e655fda80d4ec21290b30f02ba63d8da
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66277
Previously, it was grouped together with tests related to `MapDataPipe`, but it should be with `IterDataPipe`.
cc VitalyFedyunin ejguan NivekT
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D31485823
Pulled By: NivekT
fbshipit-source-id: d13d8c28cbfc305da0e3033d4109a0f971281a02
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66275
Once this is added to Core, TorchData's PR will not need a custom class and can use this wrapper instead.
cc VitalyFedyunin ejguan NivekT
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D31485822
Pulled By: NivekT
fbshipit-source-id: 790de27629c89c0ca7163a8ee5a09ee8b8233340
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66051
Make the error message clearer when quantized embedding is converted
with an unsupported dtype. This is helpful when debugging quantization
errors on new models.
Test Plan:
```
class M(nn.Module):
def __init__(self):
super().__init__()
self.embedding = nn.Embedding(1, 1)
m = M().eval()
m.qconfig = torch.quantization.QConfig(
activation=torch.quantization.MinMaxObserver.with_args(dtype=torch.qint8),
weight=torch.quantization.MinMaxObserver.with_args(dtype=torch.qint8))
m.embedding.qconfig = m.qconfig
mp = torch.quantization.prepare(m)
mq = torch.quantization.convert(m)
// error message now includes the incorrect dtype
```
Imported from OSS
Reviewed By: dagitses
Differential Revision: D31472848
fbshipit-source-id: 86f6d90bc0ad611aa9d1bdae24497bc6f3d2acaa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66050
Adds the dtype to an error message when trying to quantize something
other than a float. This is useful for debugging quantization tools on
new models.
Test Plan:
```
x = torch.randn(1, 1, 1, 1, dtype=torch.double)
xq = torch.quantize_per_tensor(x, 0.01, 0, torch.quint8)
// error message now includes Double
```
Imported from OSS
Reviewed By: dagitses
Differential Revision: D31472849
fbshipit-source-id: 2331ffacefcbc6f8eca79694757d740de74a0f1d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66049
Enables quantized add with broadcasting. As pointed out by jamesr66a,
this was disabled but TensorIterator already supports it. Added a test
case to verify.
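A rough sketch of the broadcast case the new test exercises (scales and zero_points below are arbitrary):
```python
import torch

a = torch.quantize_per_tensor(torch.randn(2, 3, 4, 4), 0.1, 0, torch.quint8)
b = torch.quantize_per_tensor(torch.randn(4, 4), 0.1, 0, torch.quint8)
# (2, 3, 4, 4) + (4, 4) broadcasts, now also for quantized add
out = torch.ops.quantized.add(a, b, 0.2, 0)
print(out.shape)
```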
Test Plan:
```
python test/test_quantization.py TestQuantizedOps.test_qadd_broadcast
```
Imported from OSS
Reviewed By: dagitses
Differential Revision: D31472850
fbshipit-source-id: a3b16d9000487918db743525d22db6864330762b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66108
BC-breaking change: intT is now longT (which aligns it more accurately with how
the types are referred to in C++). The benefit for this is we can idiomatically
express all C++ dtypes (with intT now mapping to int32_t). These types are needed
for ufunc codegen in a latter patch.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31385761
Pulled By: ezyang
fbshipit-source-id: ec6f3a0953794313470dbe14911f23ac116be425
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66149
The updated logic can infer the rank of a slice output when only the rank (not the shape) of the slice input is known. This enables cases where `ConstantValueMap::HasRank(input)` is `True` while `ConstantValueMap::HasShape(input)` is `False`.
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D31423232
Pulled By: ezyang
fbshipit-source-id: 516e3916aa71afda2b10e44620636e42ed837236
Co-authored-by: BowenBao <bowbao@microsoft.com>
Summary:
Hi, I'm looking forward to contributing to PyTorch, so starting with a minor fix in the documentation for `index_add`.
Currently, in the documentation for `index_add_` (please see https://pytorch.org/docs/master/generated/torch.Tensor.index_add_.html#torch.Tensor.index_add_):
1. the `tensor` attribute was pointing to the `torch.tensor` class, which IMO is (though it may not be a big deal) unintentional.
2. the `dim` attribute is pointing to `torch.Tensor.dim`, which again IMO is unintentional.
This PR suggests a correction for the first point above: rename the `tensor` attribute to `input` so that it doesn't point to the `torch.tensor` class. (I've verified that other ops like `scatter` use `input`, so this should not break the consistency in the documentation.) I couldn't find an appropriate fix for the second point above, since renaming `dim` to something else would break consistency (almost all other ops in PyTorch use `dim` as the attribute name).
I may be wrong here, so please let me know if there is any feedback or an alternate fix for this.
_Note:_ I plan to fix this behavior for `index_copy_` (https://pytorch.org/docs/master/generated/torch.Tensor.index_copy_.html#torch.Tensor.index_copy_) once and if this PR is approved.
To the reviewers, please help me tag the correct person who could help review this PR.
cc: krshrimali mruberry zou3519
cc brianjo mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65806
Reviewed By: dagitses, mruberry
Differential Revision: D31431182
Pulled By: zou3519
fbshipit-source-id: 66ced9677ac3bc71d672d13366f9f567ecea0a2d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65958
zhxchen17 added a `pickle` pybind for the trt engine which allows us to save and load an nn.Module with a trt engine in fbcode. This diff, though, explicitly serializes/deserializes the engine in `__setstate__` and `__getstate__` so that in OSS people can also save and load TRTModule directly.
Test Plan: buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_fx2trt
Reviewed By: wushirong
Differential Revision: D31309429
fbshipit-source-id: 9068e2ae6375ed0e1bb55b0e9d582b8d9c049dbf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65959
Gives more control over the output dtype of a trt engine. Previously it would be fp16 if we turned on fp16_mode. This diff allows the engine to generate fp32 output with fp16_mode=True.
Test Plan: CI
Reviewed By: kflu, wushirong
Differential Revision: D31243929
fbshipit-source-id: 09c752e6f382d6ad169da66878d9a9277c134869
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66131
Turns out that a model with 72k instructions causes about 0.5MiB of additional memory overhead (if there's an 8 byte memory overhead per instruction). This is not necessary if we're building w/o eager symbolication support. This change eliminates the 8 byte `debug_handle` if the build is w/o eager symbolication support.
ghstack-source-id: 140045478
(Note: this ignores all push blocking failures!)
Test Plan:
```
buck build -c "pt.enable_eager_symbolication"=1 //xplat/caffe2/fb/lite_predictor:lite_predictor
buck build //xplat/caffe2/fb/lite_predictor:lite_predictor
```
Reviewed By: kimishpatel
Differential Revision: D31387784
fbshipit-source-id: af56787ad833b990a46b79ab021e512edaa22143
Summary:
Noticed that the `periodic-pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7-slow-gradcheck` job has a `ciflow/default` label, but does not have a `ciflow/scheduled` label
Added asserts to enforce that jobs with a non-trivial is_scheduled property do not have the default label and do have the scheduled label
Rename `periodic-pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7-slow-gradcheck` to `periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck`
Fixes #{issue number}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66300
Reviewed By: seemethere
Differential Revision: D31493323
Pulled By: malfet
fbshipit-source-id: 194c1d7a4e659847d94a547b87a0d7d08e66406d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65326
parallel_for and parallel_reduce currently share some common code in
all backends, specifically for detecting if it should run in parallel
or not. This moves all the backend-specific code into a single
`internal::invoke_parallel` function and makes the `parallel_`
functions common to all backends.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D31124495
fbshipit-source-id: 65c3d2af42a8860cc4d6349566085c9fa8d8c6f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66258
Installing libgnutls30 has been shown to help when confronted with the
cert issue related to deb.nodesource.com
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: dagitses
Differential Revision: D31477789
Pulled By: seemethere
fbshipit-source-id: f87ae4c098771acc505db14e3982d8858cf7326f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66015
Fixes https://github.com/pytorch/pytorch/issues/61982 by cloning
tensors in DDPSink. This only applies once for static_graph and generally for unused
params, which already have overhead, so the perf hit should not be an issue. Will
verify with a benchmark.
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D31346633
fbshipit-source-id: 5b9245ade628565cffe01731f6a0dcbb6126029b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65517
This change retrofits `GetAlwaysAliveValues` into `ValueGroup` to group the values used by a graph into three groups as follows:
- input_aliases: values that are either inputs or contain aliases of inputs or constants.
- output_aliases: values that are either outputs or contain aliases of outputs and are not in input_aliases.
- Values that don't show up in input_aliases or output_aliases are internally created and consumed within the graph.
`output_aliases` is the only new group introduced by this change, and a following diff will use this to preallocate output Tensors to accelerate Static Runtime's performance.
Test Plan: Added `ValueGroup.Init` to cover the updated code path. Note that there was no test for `GetAlwaysAliveValues` before.
Reviewed By: hlu1
Differential Revision: D30940955
fbshipit-source-id: 2cb065ecda0f447a61e64a7cf70cc7c6947f7dfc
Summary: Adding test to ensure non-Vanilla SGD behaves as if complex numbers are two real numbers in R^2 as per issue 65711 on github
Test Plan:
```buck test mode/dev caffe2/test:optim -- 'test_sgd_complex'```
https://pxl.cl/1QLxw
Reviewed By: albanD
Differential Revision: D31477212
fbshipit-source-id: 500678e561a05ac96759223b4c87a37cab26c6a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66021
A builtin library consists of a list of frozen modules and a list of builtin modules. For tensorrt, it's quite simple since we only have a single builtin module tensorrt.tensorrt. But it can be complex for libraries like numpy which contains multiple builtin modules (np.core._multiarray_umath, np.random.mtrand etc.) if we want to add it as a torch::deploy builtin. We enhance the macro that registers builtin libraries to accept a variable length of builtin modules. We can use this macro to register frozentorch, frozenpython, tensorrt for now and can also use it to register libraries like numpy later on.
The enhanced macro now looks as follows. Although we don't need to worry about backward compatibility for now, this enhanced version is fully compatible with the previous version. The previous version is just a special case where the library contains no builtin modules.
```
REGISTER_TORCH_DEPLOY_BUILTIN(library_name_without_quote, frozen_modules_list,
builtin_module_name_1, builtin_module_init_function_1, ...,
builtin_module_name_N, builtin_module_init_function_N)
```
ghstack-source-id: 140007970
Test Plan:
1. Play around with interactive_embedded_interpreter.cpp to import torch._C, tensorrt.tensorrt etc inside the embedded interpreter.
2. Enhance test_builtin_registry.cpp
3. Run test_deploy.cpp and test_deploy_gpu.cpp
Reviewed By: suo
Differential Revision: D31349390
fbshipit-source-id: 70a1fcf660341180fc4d5195aed15ceb07c2bef7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66218
This stack of diffs reduces the memory used by LLVMCodeGen object.
Here are the numbers on model `294738512`: (this is the number reported as `Memory turnover after freeze_module:` in the output)
```
Before: 123343496
After : 121566008
```
So, there is a reduction of about `~1.77MB` with this change of making `PytorchLLVMJIT` a singleton.
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM, hlu1
Differential Revision: D31445798
Pulled By: navahgar
fbshipit-source-id: c860d36456b2c5d3e21010c1217e2948326f666d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65671
Tentative implementation to use dist.gather_object to collect shards from all ranks and then "merge" them. The merge is done on dst_rank through padding the sharded tensors into the size of the full tensor based on their metadata (offsets, lengths) first, and then summing these padded tensors together.
Also considered concatenating sharded tensors without padding to minimize memory footprint (assuming padding will increase memory). But it may not be flexible enough for arbitrary sharding (e.g. sharding along multiple directions)
Another way can be constructing the padded tensor on each rank and reducing to rank0. I feel this is the easiest implementation, but it will incur higher memory usage and comm payload. Please let me know if this alternative is preferred.
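A rough standalone sketch of the padding-and-sum merge described above (shapes and offsets are made up; this is not the actual ShardedTensor code):
```
import torch

full_size = (4, 4)
# pairs of (local shard, offset of the shard inside the full tensor), as gathered on dst_rank
shards = [
    (torch.ones(2, 4), (0, 0)),
    (2 * torch.ones(2, 4), (2, 0)),
]

merged = torch.zeros(full_size)
for shard, (row_off, col_off) in shards:
    padded = torch.zeros(full_size)
    padded[row_off:row_off + shard.shape[0], col_off:col_off + shard.shape[1]] = shard
    # shards do not overlap, so summing the padded tensors reconstructs the full tensor
    merged += padded
```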
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang gcramer23
Test Plan:
Imported from OSS
python test/distributed/_sharded_tensor/test_sharded_tensor.py -v -k test_gather
did not manage to test on oss, but tested in fbcode by reserving on demand gpu
arc patch D31197611
modify the test with 2 gpus as on-demand gpu only has 2 cores (D31227986)
buck test -c fbcode.enable_gpu_sections=true mode/dev-nosan caffe2/test/distributed/_sharded_tensor:sharded_tensor -- test_gather
buck-out/gen/caffe2/test/distributed/_sharded_tensor/sharded_tensor#binary.par test_sharded_tensor.TestShardedTensorChunked.test_gather
{F667213605}
Reviewed By: dagitses, pritamdamania87
Differential Revision: D31197611
Pulled By: dracifer
fbshipit-source-id: cf98b4a2d7838b11b9582eb23f826bb0fa38a7f4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65758
The same change has been made in conv2d, the proper algorithm is both
faster and gives more precision.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31257872
Pulled By: ngimel
fbshipit-source-id: 6ff3a7a00a05b66f83d45cc820bd0c230cb8de6d
Summary:
Enable testing of `torch.Tensor.resize_`.
The negative view test is skipped as the test doesn't work with resize_ see
https://github.com/pytorch/pytorch/issues/65945.
cc mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66135
Reviewed By: dagitses
Differential Revision: D31444263
Pulled By: mruberry
fbshipit-source-id: 00c7fe05df28fba01508b31adb3ed4fdcf4d0326
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65542
Add docstring for torch.fx.passes.split_module that conforms to Google Python Style conventions.
Changed original example to the example from this diff:
https://www.internalfb.com/diff/D24925283 (9734c042b8)
Test Plan:
Ran buck test //caffe2/test:fx. No errors detected
https://pxl.cl/1QCch
Reviewed By: jamesr66a
Differential Revision: D31145694
fbshipit-source-id: 8e54f3b1be3dca1c4d414fdeeab71b9f2b5d9f3e
Summary:
These utils are prerequisites for Lazy Node base class.
- set up new torch/csrc/lazy, test/cpp/lazy dirs
- add source files to build_variables.bzl in new lazy_core_sources var
- create new test_lazy binary
Fixes https://github.com/pytorch/pytorch/issues/65636
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66181
Original commit changeset: 3d0d5377d71e
Test Plan:
Run PyTorch XLA corresponding PR in XLA CI:
https://github.com/pytorch/xla/pull/3148/files
Reviewed By: suo
Differential Revision: D31416438
fbshipit-source-id: 58a6a49c5bc30134bc6bae2e42778f359b9a8f40
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63881
This PR includes the minimal set of features to make FSDP work, like sharding, core data flow and hooks. More tests will be added in follow-up PRs. Tests are refactored to utilize common PyTorch utils. The code is also refactored a little bit. Alternative ways to replace ".data" usage in this PR are still being discussed offline.
Test Plan: unit tests
Reviewed By: mrshenli
Differential Revision: D30521673
fbshipit-source-id: 9a23390dd7c925749604c6860e08fbe39ddc5500
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66113
For a benchmark compiled in opt-mode, in which the lookup items were shuffled and then looked up in round-robin fashion 10M times (for a total of 140M lookups), we see:
```
Function Container Time (ms) Multiplier
TypeMetaToDataType if-chain 233 1x
TypeMetaToDataType std::vector 795 3.41x
TypeMetaToDataType std::map 1566 6.72x
TypeMetaToDataType std::unordered_map 2136 9.17x
DataTypeToTypeMeta switch 102 1x
DataTypeToTypeMeta std::vector 666 6.53x
DataTypeToTypeMeta std::map 1212 11.9x
DataTypeToTypeMeta std::unordered_map 1539 15.1x
DataTypeToTypeMeta folly::F14FastMap 1789 17.5x
```
From this, we draw two conclusions:
1. Using a complex container like `std::map` is worse than using a simple vector lookup here (there aren't enough items for the Big-O to assert itself).
2. Using any container at all is a mistake. (Unless we pull in more exotic reasoning like invalidating the code cache or preventing inlining.)
Test Plan: Sandcastle
Reviewed By: dzhulgakov
Differential Revision: D31375117
fbshipit-source-id: 0b310c6c2e94080d125c82fb7c2b43ab869adbcb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66230
Adding test to ensure Vanilla SGD behaves as if complex numbers are two real numbers in R^2 as per issue 65711 on github
https://github.com/pytorch/pytorch/issues/65711
ghstack-source-id: 139918862
Test Plan:
```buck test mode/dev caffe2/test:optim -- 'test_sgd_complex'```
https://pxl.cl/1QHvX
Reviewed By: albanD
Differential Revision: D31449289
fbshipit-source-id: da8b00421085796a23b643e73f96b19b5b560a32
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66221
JIT doesn't have an implementation for this op, so we can only use it when out variants are enabled.
Reviewed By: hlu1
Differential Revision: D31445887
fbshipit-source-id: 4565ac4df751d8ee4052647574c43efa05ea1452
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66182
closes https://github.com/pytorch/pytorch/issues/63174
Does a few things:
1. adds hostname to the error report
2. moves the "root cause" section to the end (presumably since the logs are being "tailed" we want the root cause to appear at the end)
3. moves redundant error info logging to debug
4. makes the border max 60 char in length and justifies left for the header
NOTE: YOU HAVE TO annotate your main function with torch.distributed.elastic.multiprocessing.errors.record, otherwise no traceback is printed (this is because python exception propagation does NOT work out of the box for IPC - hence the extra record annotation).
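A minimal example of the required annotation (assuming the standard decorator usage):
```
from torch.distributed.elastic.multiprocessing.errors import record

@record
def main():
    raise RuntimeError("foobar")

if __name__ == "__main__":
    main()
```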
Test Plan:
Sample
```
============================================================
run_script_path FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2021-10-05_17:37:22
host : devvm4955.prn0.facebook.com
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 3296201)
error_file: /home/kiuk/tmp/elastic/none_3_lsytqe/attempt_0/0/error.json
traceback :
Traceback (most recent call last):
File "/tmp/jetter.xr3_x6qq/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 372, in wrapper
return f(*args, **kwargs)
File "main.py", line 28, in main
raise RuntimeError(args.throws)
RuntimeError: foobar
============================================================
```
Reviewed By: cbalioglu, aivanou
Differential Revision: D31416492
fbshipit-source-id: 0aeaf6e634e23ce0ea7f6a03b12c8a9ac57246e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65674
Before this PR users had to use the eager mode static quantization APIs to quantize Embedding/EmbeddingBag modules.
With this PR they can use either the static or dynamic quantization APIs for Embedding quantization.
The only qconfig supported for embedding quantization is float_qparams_weight_only_qconfig, which is currently enforced in the from_float
method of the quantized Embedding/EmbeddingBag modules.
To combine embedding quantization with Linear dynamic quantization, users can use the qconfig_dict to specify a different qconfig for each module type.
The prepare/convert APIs can still be used to quantize Embeddings, with the caveat that users need to ensure inputs to Embedding ops are FP32.
Addresses Issue #65185
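As a rough sketch of the dynamic-API path described above (the module, sizes and names are made up; it assumes the dict form of `qconfig_spec` accepted by `torch.quantization.quantize_dynamic`):
```
import torch
import torch.nn as nn
from torch.quantization import (
    float_qparams_weight_only_qconfig,
    default_dynamic_qconfig,
    quantize_dynamic,
)

class EmbeddingLinear(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.EmbeddingBag(10, 12, mode='sum')
        self.fc = nn.Linear(12, 4)

    def forward(self, indices, offsets):
        return self.fc(self.emb(indices, offsets))

model = EmbeddingLinear().eval()
# different qconfig per module type: float_qparams weight-only for the embedding,
# regular dynamic quantization for the linear
qconfig_dict = {
    nn.EmbeddingBag: float_qparams_weight_only_qconfig,
    nn.Linear: default_dynamic_qconfig,
}
quantized = quantize_dynamic(model, qconfig_spec=qconfig_dict)
```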
ghstack-source-id: 139935419
Test Plan:
python test/test_quantization.py
Imported from OSS
Reviewed By: gchanan
Differential Revision: D31211199
fbshipit-source-id: 8c747881caee5ccbf8b93c6704b08d132049dea4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66008
Added GELU converter and updated TARGET file of deeplearning/trt/fx2trt to load the plugins onto the converters
Test Plan: buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_gelu
Reviewed By: 842974287
Differential Revision: D31284144
fbshipit-source-id: 0e938a47a99d289aefc3308aec3937c7334e9b8a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66052
`aten::__getitem__.Dict_str` and `prim::unchecked_cast` are used in delegate API.
ghstack-source-id: 139860350
Test Plan: CI
Reviewed By: pavithranrao
Differential Revision: D31364720
fbshipit-source-id: dfca5e3ded4cdd3329c9b9d80a13f0fb1f5f2a51
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65942
This one is a bit weird. The class is called `THCIpcDeleter` but it
actually has nothing IPC-specific. It just converts
`std::shared_ptr` + `void*` into a `c10::DataPtr`. Instead, moving
the `DataPtr` conversion into the actual IPC code allows 2 memory
allocations to be elided by merging 3 separate deletion contexts
into one.
Test Plan: Imported from OSS
Reviewed By: dagitses
Differential Revision: D31386278
Pulled By: ngimel
fbshipit-source-id: 5722beed9dcf680f0eb6bbff30405cff47b21962
Summary:
1. Introduce
```
MobileModelRunner.h
MobileModelRunner.cpp
TensorUtils.h
TensorUtils.cpp
```
in external. They are pretty much the same as internal, except for the namespace and the dependency on folly. In the next PRs, TensorUtils and MobileModelRunner are unified between external and internal.
2. Introduce
```
tracer.cpp
```
for external. Majority is the same as internal one, with some cleanup on unnecessary dependency. It's unified between internal and external in next change.
3. Add an executable to build the tracer. It will be built for desktop only.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64087
ghstack-source-id: 139900300
Test Plan:
Given the model
```
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.lin = nn.Linear(10, 1)

    def forward(self, x):
        return self.lin(x)

model = Net()
scripted_module = torch.jit.script(model)
example_dict = {'a': 1, 'b': 2}
sample_input = {
    scripted_module.forward: [(torch.zeros(1, 10),)],
}
bundled_model = torch.utils.bundled_inputs.bundle_inputs(scripted_module, sample_input)
bundled_model._save_for_lite_interpreter("dummy_model_with_bundled_input.ptl")
```
External tracer
```
./build/bin/model_tracer --model_input_path "/Users/chenlai/Documents/pytorch/tracing/dummy_model_with_bundled_input.ptl" --build_yaml_path "/Users/chenlai/Documents/pytorch/tracing/tmp.yaml"
```
and compare `tmp.yaml` with the operator list generated from
Internal tracer
```
./fbcode/caffe2/fb/model_tracer/run_model_with_bundled_inputs.sh ~/local/notebooks/prod_models/dummy_model_with_bundled_input.ptl
```
QNNPACK only:
Example yaml from internal tracer: P460742166 [devserver]
Example yaml from external tracer: P460759099 [mac], P460742166 [devserver]
Comparison ops between internal and external on devserver:
{F666923807}
{F666924048}
Note: The operators generated on Mac and devservers are different; the one on the devserver includes two extra ops: `aten::addmm_` and `aten::slow_conv_dilated2d`. Based on the traced list, when calling `aten::_convolution`, one calls `aten::mkldnn_convolution`, and the other calls `aten::_convolution_nogroup`, causing the divergence.
Thanks to Martin for pointing out:
> mkldnn is another backend from Intel
Reviewed By: dhruvbird
Differential Revision: D30599136
fbshipit-source-id: 102f23fb652c728a9ee4379f9acc43ae300d8e8a
Summary:
1. move 4 files to :
```
KernelDTypeTracer.h
KernelDTypeTracer.cpp
OperatorCallTracer.h
OperatorCallTracer.cpp
```
so it's visible in OSS.
2. Update the namespace to `torch::jit::mobile`
3. Add a `fb_xplat_cxx_library` `torch_model_tracer` with the source file list above.
4. update the `fb_xplat_cxx_library` `model_tracer_lib` dependency on the new `torch_model_tracer` library
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63421
ghstack-source-id: 139900299
Reviewed By: dhruvbird
Differential Revision: D30378069
fbshipit-source-id: d56c6140e951bc13113a76d6b63767a93843c842
Summary:
Currently, if the same tensor constant is reused multiple times, we'll store a tensor constant for each time we use it.
For example
```
val = torch.randn(5)
for _ in range(10):
    x = x + val
```
ends up storing 10 tensor constants.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66211
Reviewed By: jamesr66a
Differential Revision: D31437089
Pulled By: Chillee
fbshipit-source-id: 401169c8d58ce0afb7025ae11060680ef544419f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65953
Previously if people wanted to add a torch::deploy builtin, they needed to change torch::deploy internal code (interpreter_impl.cpp) to register the python part as frozen modules and the C++ part as builtin modules. This is not convenient and is error prone. We want to add open registration support for torch::deploy builtins so that people only need to add one effective line of code in their *library code* to complete the registration.
Here is an example to registry numpy as torch::deploy builtins:
REGISTER_TORCH_DEPLOY_BUILTIN(numpy, numpy_frozen_modules, <list of name, PyInit function pairs>)
This diff supports open registration of frozen modules. It's the first step to achieve the plan above.
ghstack-source-id: 139888306
Test Plan: Run tests in test_deploy.cpp and test_builtin_registry.cpp
Reviewed By: suo
Differential Revision: D31321562
fbshipit-source-id: 6445bd8869f1bb7126b4c96cf06c31145f0e9445
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66179
The diff adds a check for the `PYTHON_EXEC` environment variable. If the variable is set, it will override `sys.executable` for `torch.distributed.run`.
This means that if `PYTHON_EXEC` is set, user scripts executed via `torch.distributed.run` will start via the value of `os.environ["PYTHON_EXEC"]`
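Conceptually (a sketch of the described behavior, not the actual launcher code):
```
import os
import sys

# if PYTHON_EXEC is set, it overrides sys.executable as the interpreter
# used to start user scripts launched through torch.distributed.run
python_exec = os.environ.get("PYTHON_EXEC", sys.executable)
```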
Test Plan: unittest
Reviewed By: kiukchung
Differential Revision: D31329003
fbshipit-source-id: b9d0167d99bbf463a6390f508324883ca4a1e439
Summary:
This PR adds forward AD for `*_solve` methods.
Additionally, `cholesky_solve` gets OpInfo + a bug fix when wrong leading dimensions could be passed to LAPACK,
and `lu_solve` gets forward AD with 2x`lu_solve` instead of 1x`lu_solve` + 2x`triangular_solve`.
cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry walterddr IvanYashchuk xwang233
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65546
Reviewed By: dagitses
Differential Revision: D31431847
Pulled By: albanD
fbshipit-source-id: 0e343e0d9da3c3d2051fca215fad289d77275251
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65176
getElements returns a reference.
ghstack-source-id: 139745230
Test Plan:
CI
Static runtime startup for ctr_mobile_feed local net reduced from 8.35s to 7.8s
Reviewed By: malfet
Differential Revision: D30983898
fbshipit-source-id: 884bff40f12322633c0fffd45aed5b8bc7498352
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66081
Two fixes:
1. Since the operators are always registered with both name and overload name, the overload name needs to be included when looking up an operator.
2. Don't promote operators with alias, because the new registry does not support schema with alias.
ghstack-source-id: 139732099
Test Plan: CI
Reviewed By: pavithranrao
Differential Revision: D31382262
fbshipit-source-id: 43c6e6e0c13950a9ce8cf3a70debe0421372d053
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65757
See gh-56794
Avoid dispatch inside of parallel_for by:
- Replacing Tensor slicing with TensorAccessor
- Replaces `bmm` and `mm` with direct calls to gemm.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31257878
Pulled By: ngimel
fbshipit-source-id: e6aad2d5ae7fa432bd27af2b1a8b0dcef1fc6653
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65770
This logging info is printed out in debug mode, make it log the
iteration as well for clarity.
ghstack-source-id: 139838595
Test Plan: CI
Reviewed By: zhaojuanmao, wayi1
Differential Revision: D31222132
fbshipit-source-id: 14519aae1ba0b2a35b4b962e7d1a957c9142c8f8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65769
Seeing some bottlenecks when copying bucket to grad; this helps make it
clearer here.
ghstack-source-id: 139838597
Test Plan: Ci
Reviewed By: zhaojuanmao, wayi1
Differential Revision: D31217340
fbshipit-source-id: 762a254a3538eb5292b3a53bb5d1211057ecbdbb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65730
This should wrap up the migration of all scheduled workflows we have on CircleCI
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
cc ezyang seemethere malfet pytorch/pytorch-dev-infra
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31225188
Pulled By: seemethere
fbshipit-source-id: 4c49e88ec017edc30e07325dbc613ff54dd164d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65545
Introduce 2bit qtensor. The new dtype added for this is c10::quint2x4
The underlying storage for this is still uint8_t, so we pack 4 2-bit values in a byte while quantizing it.
Kernels that use this dtype should be aware of the packing format. (4 2-bit values in one byte)
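A quick sketch of the packing layout described above (the bit order within the byte is an assumption made for illustration):
```
# pack four 2-bit values (each in [0, 3]) into a single byte
vals = [1, 3, 0, 2]
packed = 0
for i, v in enumerate(vals):
    packed |= (v & 0b11) << (2 * i)

# unpack them again
unpacked = [(packed >> (2 * i)) & 0b11 for i in range(4)]
assert unpacked == vals
```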
Test Plan: `buck test mode/dev-asan caffe2/test/:quantization -- test_qtensor`
Reviewed By: supriyar
Differential Revision: D31148141
fbshipit-source-id: 1dc1de719e097adaf93fee47c6d1b8010a3eae6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65712
No reason for this to be here.
ghstack-source-id: 139743362
Test Plan: fitsships
Reviewed By: dhruvbird
Differential Revision: D31215696
fbshipit-source-id: 238ea6633629831e54847ce82de23571cf476740
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66161
`aten::add` is not guaranteed to be bit exact with the JIT interpreter. This was causing non-deterministic test failures on master.
Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: hlu1
Differential Revision: D31406764
fbshipit-source-id: d968cb1bdb8f33934682ef3712a1341a3aacf18e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66189
Added acc_ops for cumsum and unit test
Test Plan: buck test glow/fb/fx/oss_acc_tracer:test_acc_tracer
Reviewed By: 842974287
Differential Revision: D31355244
fbshipit-source-id: 41490d300553b0a5d52cbc4e681bdd0cf990eb42
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65737
See gh-56794
Avoid dispatch inside of parallel_for by:
- Replacing Tensor slicing with TensorAccessor
- Copy bias into output only once, outside of the parallel region
- Replaces `addmm_` and `baddbmm_` with direct calls to gemm.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31257874
Pulled By: ngimel
fbshipit-source-id: 20b94daa13082fb1e39eaa8144bfa4c611b61bab
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66158
qtopk used hypothesis, which created flaky tests. In addition, the generated tests were not representative and would not catch the cases that we are interested in.
This diff removes hypothesis from qtopk and merges the qtopk and qtopk_nhwc tests. We now use specific test cases.
ghstack-source-id: 139768865
Test Plan: `buck test mode/dev //caffe2/test:quantization -- test_qtopk`
Reviewed By: jerryzh168
Differential Revision: D31401341
fbshipit-source-id: a8fb37a7221fc43c159f34e28aa4a91ed3506944
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65222
When compiling against the Android SDK with `-D_FORTIFY_SOURCE=2`, the compiler will complain that the `dst` size is larger than the `src` size due to the function templating using two differently sized objects. There is a `TORCH_CHECK` to ensure we don't go through with these `memcpy`'s, but in the interest of making the compiler happy, let's switch the `memcpy` to take `sizeof(src)`.
Test Plan: CI
Reviewed By: bertmaher, lanza
Differential Revision: D30992678
fbshipit-source-id: b3e7aa992a3650e1051abad05be800b684e6332b
Summary:
Network communications are flaky by nature; the test should be marked as
skipped if network ops cannot be completed for some reason
Fixes https://github.com/pytorch/pytorch/issues/66184
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66185
Reviewed By: seemethere
Differential Revision: D31423193
Pulled By: malfet
fbshipit-source-id: 96c3a123c65913f44ea78b30a03e8e7eda164afe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65850
This step was never added
ghstack-source-id: 139753673
Test Plan: Run optimize_for_mobile on model with conv1d and see that it transforms to conv2d
Reviewed By: kimishpatel
Differential Revision: D31093503
fbshipit-source-id: 11a19f073789c01a9de80f33abbe628005996b66
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61647
`prepare_fx` currently assumes that bias is always a positional argument to
convolutions, and only a keyword argument to other functions. This happens to work
today due to a quirk in how `__torch_function__` is handled for python
functions but shouldn't be considered stable.
Instead, we should support `bias` for both positional and keyword forms.
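For example, both of the following calls should now be handled (a small illustration, not taken from the test suite):
```
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
w = torch.randn(4, 3, 3, 3)
b = torch.randn(4)

out_positional = F.conv2d(x, w, b)    # bias passed positionally
out_keyword = F.conv2d(x, w, bias=b)  # bias passed as a keyword
```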
cc jerryzh168 jianyuh raghuramank100 jamesr66a vkuzo
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D31401360
Pulled By: albanD
fbshipit-source-id: 1e2f53d80e2176b870f326dc498e251e2386136e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65865
`operator_str` is not used in `import.cpp` and it is also defined in `parse_operators.cpp` so removing it from `import.cpp`.
Test Plan: CI passing
Reviewed By: iseeyuan
Differential Revision: D31293008
fbshipit-source-id: 1c857cbd63c57b8f79c1a068789fc8605605b642
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63129
1. Add an api to get `supported_types` from runtime, expose in c++ only.
2. Add an api to get `contained_types` from model, expose in both c++ and Python.
3. Add a field `contained_types_` in `type_parser.cpp` to track the contained types when parsing python string.
4. Expand the `is_compatible` api to check types. When checking types, it will check the contained type list from the model against the supported type list from the runtime.
5. Expand the unittest for compatibility to cover type
6. Add unit test in python to check type list
ghstack-source-id: 139826944
Test Plan:
```
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.GetContainTypes'
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.isCompatibleSuccess'
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - LiteInterpreterTest.isCompatibleFail'
buck test //caffe2/test:mobile
```
Reviewed By: iseeyuan
Differential Revision: D30231419
fbshipit-source-id: 8427f423ec28cc5de56411f15fd960d8595d6947
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65976
Move the TypeParser class to a header file so it can be called from somewhere else. For example, the getContainedTypes() api in this stack can be moved to other files.
ghstack-source-id: 139826943
Test Plan: CI
Reviewed By: iseeyuan
Differential Revision: D31294254
fbshipit-source-id: 1c532fd69c7f6b44ad2332055d24c95a0fac1846
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66054
I need this function in functorch to support the ability of custom
jitted kernels to invoke torch_function when applicable.
Test Plan: functorch unit tests
Reviewed By: qihqi, ngimel
Differential Revision: D31416599
Pulled By: bertmaher
fbshipit-source-id: 90b57badd6a6b9d505ebfc436869b962b55c66d7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66169
Original change: D30368834 (57e5ae5306)
Switching to Push Constants from Uniform Buffers caused some unforeseen memory errors when running Mac unit tests.
We'll switch back for now until we can pinpoint and resolve the issue.
Test Plan:
Build and run `vulkan_api_test`
```
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```
Reviewed By: beback4u
Differential Revision: D31409130
fbshipit-source-id: cab1a3330945b50522235db6738406b6037f9c68
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65955
This diff makes sure to give clear error message when user tries to create obj from obj that lives in different session
Test Plan: buck test //caffe2/torch/csrc/deploy:test_deploy
Reviewed By: suo
Differential Revision: D31323045
fbshipit-source-id: e7bd6f76afeb0285847bc11881185a164f80e3f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66038
Will help track workflows for DP deprecation. Tested via standalone DP
script.
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D31356975
fbshipit-source-id: c0a3ac3a1faed794e3362f3f3a19a6fb800587a7
Summary:
```python
class Foo(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, a=None, b=None):
        res = a
        if b is not None:
            res = res + b
        return res

concrete_args = {'b': torch.tensor(5)}
traced = fx.symbolic_trace(Foo(), concrete_args=concrete_args)
Gives the following error:
```
File "<eval_with_key_9>", line 2
def forward(self, a = None, b_1):
^
SyntaxError: non-default argument follows default argument
```
Since https://github.com/pytorch/pytorch/issues/55888, placeholders are also created for concrete arguments. But these placeholders do not have default values even when one was provided for the argument in question, causing the error above.
To solve this, I add a default value when it is available during placeholder creation for concrete arguments.
I also tried to set the default value to the value specified in concrete_args (since in many cases it will actually use this value anyway), but ran into an error because the default value is never defined:
```
def forward(self, a = None, b_1 = _tensor_constant0):
    _tensor_constant0 = self._tensor_constant0
    _tensor_constant1 = self._tensor_constant1
    add = a + _tensor_constant1; a = _tensor_constant1 = None
NameError: name '_tensor_constant0' is not defined
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59569
Reviewed By: albanD
Differential Revision: D31385607
Pulled By: Chillee
fbshipit-source-id: 44a8ce28b5eabdb9b4c773e73a68ff0bb9c464cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66174
These configs have already been migrated so going to go ahead and remove
them
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D31413579
Pulled By: seemethere
fbshipit-source-id: 8923736d347eb8c8470884be413122c198d1bf20
Summary:
These utils are prerequisites for Lazy Node base class.
- set up new torch/csrc/lazy, test/cpp/lazy dirs
- add source files to build_variables.bzl in new lazy_core_sources var
- create new test_lazy binary
Fixes https://github.com/pytorch/pytorch/issues/65636
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65635
Reviewed By: alanwaketan
Differential Revision: D31260343
Pulled By: wconstab
fbshipit-source-id: 8bb1194188e3e77fc42e08a14ba37faed37a9c2e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65064
The problem appears when nvfuser is triggered from LazyTensor.
Because LT maintains its own thread pool, the thread used for the first-time
compilation does CUDA context initialization properly, but later
cached execution may use a different thread which does not have
a proper CUDA context.
Test Plan: Imported from OSS
Reviewed By: saketh-are
Differential Revision: D31269691
Pulled By: desertfire
fbshipit-source-id: 384362025c087d61e8b625ff938379df283ef8b2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62030
Remove dtype tracking from Python Storage interface, remove all the different `<type>Storage` classes except for `ByteStorage`, and update serialization accordingly, while maintaining as much FC/BC as possible
Fixes https://github.com/pytorch/pytorch/issues/47442
* **THE SERIALIZATION FORMAT IS FULLY FC/BC.** We worked very hard to make sure this is the case. We will probably want to break FC at some point to make the serialization structure of tensors make more sense, but not today.
* There is now only a single torch.ByteStorage class. Methods like `Tensor.set_` no longer check that the dtype of storage is appropriate.
* As we no longer know what dtype of a storage is, we've **removed** the size method from Storage, replacing it with nbytes. This is to help catch otherwise silent errors where you confuse number of elements with number of bytes.
* `Storage._new_shared` takes a `nbytes` kwarg and will reject previous positional only calls. `Storage._new_with_file` and `_set_from_file` require explicit element size arguments.
* It's no longer possible to convert storages to different types using the float/double/etc methods. Instead, do the conversion using a tensor.
* It's no longer possible to allocate a typed storage directly using FloatStorage/DoubleStorage/etc constructors. Instead, construct a tensor and extract its storage. The classes still exist but they are used purely for unpickling.
* The preexisting serialization format stores dtype with storage, and in fact this dtype is used to determine the dtype of the tensor overall.
To accommodate this case, we introduce a new TypedStorage concept that exists only during unpickling time which is used to temporarily store the dtype so we can construct a tensor. **If you overrode the handling of pickling/unpickling, you MUST add handling for TypedStorage** or your serialization code will degrade to standard file-based serialization.
Original pull request: https://github.com/pytorch/pytorch/pull/59671
Reviewed By: soulitzer, ngimel
Differential Revision: D29466819
Pulled By: ezyang
fbshipit-source-id: 4a14e5d3c2b08e06e558683d97f7378a3180b00e
Summary:
Updating `computeStrideProps` logic to break ties on stride_indices.
For two dimensions with identical strides, the dimension with size 1 should be considered the faster dimension. Otherwise, its stride should be the product of the existing stride and the size of the other dimension.
Note that there's still an inconsistency between eager memory_format and stride_properties in JIT; this is a design issue due to the ambiguity of size-1 strides. One example showing this has been disabled as a failing case in the added cpp test.
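A concrete case where the tie shows up (illustrative only):
```
import torch

x = torch.empty(3, 1, 4)
print(x.stride())  # (4, 4, 1): dims 0 and 1 share the same stride
# under the updated rule, the size-1 dim (dim 1) is treated as the faster of the two
```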
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63940
Reviewed By: albanD
Differential Revision: D31227448
Pulled By: dzhulgakov
fbshipit-source-id: 51e3cd903757bef55d3158c057f9444d0cff7d2a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66058
After the initial migration from `torch.quantization` to `torch.ao.quantization`, some of the files did not change.
This happened because the migration was done in parallel, and some of the files were landed while the others were still in the original location.
This is the last fix in the AO migration phase 1, which completely enables the ao.quantization namespace.
Test Plan: `python test/test_quantization.py`
Reviewed By: vkuzo
Differential Revision: D31366066
Pulled By: z-a-f
fbshipit-source-id: bf4a74885be89d098df2d87e685795a2a64026c5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66057
The current test creates sets that are too slow.
This will cause either "Filtering too much" or "Timeout" errors in future versions of hypothesis.
This PR preemptively fixes the issue.
Test Plan: `python test/test_quantization.py`
Reviewed By: vkuzo
Differential Revision: D31366065
Pulled By: z-a-f
fbshipit-source-id: deaab4da8ee02a5dee8943cabdd30fc53d894a34
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65783
convolution_op makes conv_param struct redundant, since it contains all the params of conv_param and more. We don't need to pass both structs to qnnpack or hold both in the packed weights, let's just hold convolution_op.
This makes it easier to implement 3dconv since we won't have to template two structs. The conv_param struct is left in existence since tests rely on it to set up the convolution.
ghstack-source-id: 139479651
(Note: this ignores all push blocking failures!)
Test Plan: ci
Reviewed By: kimishpatel
Differential Revision: D30738727
fbshipit-source-id: e6d39644357b99d3b7491ae8a7066bf107eb8b9e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65726
This PR isn't strictly necessary since grad_weight doesn't use
parallel_for. However, this does reduce the function overhead and will
make it easier to parallelize in the future.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31257877
Pulled By: ngimel
fbshipit-source-id: d8ea97cc1f43d8d9dfff355ae27c9d982838b57e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66123
Some models may take in a list of tensors as inputs, thus the bundled inputs will contain `IValues` that are of the type `c10::List`. For Vulkan models, every tensor in the `IValue` list has to be converted to a vulkan tensor first, and this case is not currently handled by the Vulkan model wrapper in the benchmark binary.
This diff introduces `IValue` type checking to the input processor of the Vulkan model wrapper, and adds support for Tensor and List types.
Test Plan:
```
# Build the binary
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:ptmobile_compareAndroid\#android-arm64 --show-output
# Push it to the device
adb push buck-out/gen/xplat/caffe2/ptmobile_compareAndroid\#android-arm64 /data/local/tmp/compare_models
# Run the benchmark binary
BENCH_CMD="/data/local/tmp/compare_models"
BENCH_CMD+=" --model=$PATH_TO_MODEL"
BENCH_CMD+=" --refmodel=$PATH_TO_REFERENCE_MODEL"
BENCH_CMD+=" --input_type=float --input_dims=$MODEL_INPUT_SIZE"
BENCH_CMD+=" --iter=100"
BENCH_CMD+=" --tolerance 1e-5"
```
Reviewed By: beback4u
Differential Revision: D31276862
fbshipit-source-id: 1d9abf958963da6ecad641202f0458402bee5ced
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65849
Add tests for some of `StaticModule`'s exposed methods. Both of these are used by the memory planner, so it would be helpful to have some unit tests that ensure our basic invariants don't break.
Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: hlu1
Differential Revision: D31282901
fbshipit-source-id: e390329f4794e034170507e3a0de0abcfe0ab7b9
Summary:
Delete `-Wno-unused-variable` from top level `CMakeLists.txt`
Still suppress those warnings for tests and `torch_python`
Delete number of unused variables from caffe2 code
Use `(void)var;` to suppress unused variable in range loops
Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants
Do not delete `caffe2::OperatorBase::Output` calls as they have side effects
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66041
Reviewed By: ngimel
Differential Revision: D31360142
Pulled By: malfet
fbshipit-source-id: 6fdfb9f91efdc49ca984a2f2a17ee377d28210c8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66048
Previously, create_arg would fail if it encountered a non-`None` layout argument. Adding it to the `BaseArgumentTypes` list should be enough to fix that.
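A small illustration of the kind of argument that previously failed (an assumed repro, not the added unittest):
```
import torch
import torch.fx as fx

class M(torch.nn.Module):
    def forward(self, x):
        # traced as a call with a non-None torch.layout keyword argument,
        # which create_arg now accepts
        return torch.zeros_like(x, layout=torch.strided)

traced = fx.symbolic_trace(M())
```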
Test Plan: Added unittest
Reviewed By: jamesr66a
Differential Revision: D31362662
fbshipit-source-id: 20049971e18c17e9c75e50540500c567266daa55
Summary:
Reland of https://github.com/pytorch/pytorch/pull/65242
The last attempt of the reland automatically rebased onto stable, which did not yet have the revert commit
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66018
Reviewed By: albanD
Differential Revision: D31348822
Pulled By: soulitzer
fbshipit-source-id: 881d701b404530c1352ac9245bd67264e1652b8a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65566
This doesn't simplify vectorized jacobian computation, but it is good to consolidate the logic and helps us test it.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31236257
Pulled By: soulitzer
fbshipit-source-id: 00ca0aa6519bed5f9ee2c7be4daa8872af5e92cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65564
- wrap the call into engine with vmap if `batched_grad` is `True`
- improves the comment on the call to engine (somewhat addressing https://github.com/pytorch/pytorch/issues/41659)
- borrows the message from functional.jacobian's vectorized argument concerning usage of the vmap feature
- adds basic test (further testing is done when we replace the usage in vectorized jacobian computation)
TODO:
- create an issue tracking this
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31236259
Pulled By: soulitzer
fbshipit-source-id: b33e6b26ea98fa9f70c44da08458fc54ba4df0f7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65516
This change fixes a bug that Static Runtime's `aten::embedding_bag` out variant implementation creates aliases in its managed output tensors.
Managed output tensors should never alias each other since writing to them can illegally overwrite others' contents unintentionally, and this exact problem was causing the bug at T97393697, causing SR to return wrong return values.
This bug is detected in inline_cvr/remote_ro by a DCHECK, `verify_no_memory_overlap` (introduced by D30211705 (3fb33b38b9)), but wasn't found so far since our testing didn't include running the model in debug mode. Fortunately this bug is not hitting production since the aliased outputs are not used in production.
This change fixes the root cause from `_embedding_bag_cpu_impl_out` by replacing alias creation with copying.
Note that this change also includes a fundamental change in Static Runtime's unit testing: `testStaticRuntime` exercises the given graph 3 times:
1. profile run
2. run using the profile to allocate managed tensors
3. reuse the managed tensors -- newly added
Adding step 3 reveals this bug with a new unittest `EmbeddingBagWithManagedOutput`.
Test Plan:
- Confirmed that the crash experienced by `StaticRuntime.EmbeddingBagWithManagedOutput` disappears with this change (crash paste: P459807248).
- Added `StaticRuntime.EmbeddingBagWithManagedOutput` to detect the same problem in the future.
Reviewed By: hlu1
Differential Revision: D31104345
fbshipit-source-id: 7bddf9cd82b400d18d8ce1bf15e29b815ef9ba8f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66056
Keep running into this unrelated failure when landing diffs regarding the gpu inference project;
disabling this operator unit test on gpu because the operator doesn't exist there:
RuntimeError: [enforce fail at operator.cc:277] op. Cannot create operator of type 'SmartDecaySparseAdam' on the device 'CUDA'. Verify that implementation for the corresponding device exist. It might also happen if the binary is not linked with the operator implementation code. If Python frontend is used it might happen if dyndep.InitOpsLibrary call is missing. Operator def: input: "param" input: "mom1" input: "mom2" input: "last_seen" input: "indices" input: "grad" input: "lr" input: "iter" output: "param" output: "mom1" output: "mom2" output: "last_seen" name: "" type: "SmartDecaySparseAdam" arg { name: "beta1" f: 0 } arg { name: "beta2" f: 0.9 } arg { name: "epsilon" f: 1e-05 } device_option { device_type: 1 }
https://www.internalfb.com/intern/testinfra/diagnostics/5910974579962988.562949996565057.1633122845/
Test Plan: sandcastle
Reviewed By: jianyuh
Differential Revision: D31364731
fbshipit-source-id: 7fbd994cbe7f6ca116f5f34506a1ed7f14759bdf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65842
During backport, only parts of the model (like bytecode.pkl) need to be re-written, while the rest of the model is the same. However, `version` will always be re-written when `PyTorchStreamWriter` is destructed.
Change version to optional and add an api to allow skipping writing version when closing the writer.
ghstack-source-id: 139580386
Test Plan: buck run papaya/scripts/repro:save_load
Reviewed By: iseeyuan, tugsbayasgalan
Differential Revision: D31262904
fbshipit-source-id: 3b8a5e1aaa610ffb0fe8a616d9ad9d0987c03f23
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66025
This change adds an option to selectively enable precise alias analysis for `prim::TupleConstruct` (introduced by D30437737 (cd458fe092)) to minimize its exposure only to `StaticRuntime` as of now.
Test Plan: Modified existing unit tests whose behavior depends on D30437737 (cd458fe092).
Reviewed By: eellison
Differential Revision: D31350285
fbshipit-source-id: 3ce777f07f99650d74634481ad0805192dce55c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64572
Fixes https://github.com/pytorch/pytorch/issues/64256
It also fixes an inconsistent treatment of the case `reduction = "mean"`
when the whole target is equal to `ignore_index`. It now returns `NaN`
in this case, consistently with what it returns when computing the mean
over an empty tensor.
We add tests for all these cases.
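For instance (a small sketch of the now-consistent behavior):
```
import torch
import torch.nn.functional as F

logits = torch.randn(2, 3)
target = torch.tensor([1, 1])  # every target equals ignore_index
loss = F.nll_loss(F.log_softmax(logits, dim=1), target,
                  ignore_index=1, reduction="mean")
print(loss)  # nan: the mean is taken over zero contributing elements
```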
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D31116297
Pulled By: albanD
fbshipit-source-id: cc44e79205f5eeabf1efd7d32fe61e26ba701b52
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65245
Building and running c10 and qnnpack tests on XROS.
Notable changes:
- Adding #if defined(_XROS_) in a few places not supported by XROS
- Changing Threadpool to abstract class
ghstack-source-id: 139513579
Test Plan: Run c10 and qnnpack tests on XROS.
Reviewed By: veselinp, iseeyuan
Differential Revision: D30137333
fbshipit-source-id: bb6239b935187fac712834341fe5a8d3377762b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65957
added accelerator ops and unit test for GELU.
Test Plan: buck test glow/fb/fx/oss_acc_tracer:test_acc_tracer
Reviewed By: 842974287
Differential Revision: D31277083
fbshipit-source-id: f66dd05ef574db58cfa599e3575f95f1ebe82e93
Summary:
Delete `-Wno-unused-variable` from top level `CMakeLists.txt`
Still suppress those warnings for tests and `torch_python`
Delete number of unused variables from caffe2 code
Use `(void)var;` to suppress unused variable in range loops
Use `C10_UNUSED` for global constructors and use `constexpr` instead of `static` for global constants
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65954
Reviewed By: ngimel
Differential Revision: D31326599
Pulled By: malfet
fbshipit-source-id: 924155f1257a2ba1896c50512f615e45ca1f61f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66006
Previously, this was resulting in a key collision and a crash.
ghstack-source-id: 139342089
Test Plan: Ran webdriver test locally.
Reviewed By: dhruvbird
Differential Revision: D31281092
fbshipit-source-id: f31311726c681d6d7e0504ff8e84c888af9054f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66005
ghstack-source-id: 139342091
Test Plan: Unit test, and used in a notebook.
Reviewed By: dhruvbird
Differential Revision: D31281091
fbshipit-source-id: 1e4d0713b9796a3d182de9e676c3b3c3b1610d6e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66016
Add acc_ops.tile and converter for it.
Test Plan: buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_tile
Reviewed By: wushirong
Differential Revision: D30587939
fbshipit-source-id: 1e2613cfca486fe54fcc0d38e5c7cdeb7d0ed4a0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65928
This diff adds a decorator for adding flags to acc_ops. These flags inform graph optimizations that the op is eligible for optimization by some general criteria (e.g. op acts elementwise, op does quantization).
This makes it simpler to expand acc_ops. The user can add an op and add flags to enable optimization without going through all graph opts and trying to determine if the new acc_op is eligible for each graph optimization.
Even though our list of graph opts is small now, we already see that for `sink_reshape_ops` we had hardcoded 11 pointwise acc_ops; now there are 24 pointwise acc_ops.
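A rough sketch of the idea (the names below are hypothetical, not the actual acc_ops API):
```
# attach optimization-eligibility flags to an op as function attributes
def register_acc_op_properties(*flags):
    def wrap(fn):
        fn.properties = set(flags)
        return fn
    return wrap

@register_acc_op_properties("pointwise")
def acc_relu(*, input):
    return input.relu()

# a graph opt can then ask whether an op is eligible instead of hardcoding a list
def is_pointwise(op):
    return "pointwise" in getattr(op, "properties", set())
```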
Test Plan:
```
buck test mode/opt glow/fb/fx/graph_opts:test_fx_sink
```
```
Parsing buck files: finished in 0.5 sec
Downloaded 0/3 artifacts, 0.00 bytes, 100.0% cache miss (for updated rules)
Building: finished in 37.1 sec (100%) 10279/10279 jobs, 3/10279 updated
Total time: 37.7 sec
More details at https://www.internalfb.com/intern/buck/build/e13521bb-6142-4960-8cdd-6b5e4780da96
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 16260a2a-d364-4605-9111-6f2a19317036
Trace available for this run at /tmp/tpx-20210922-124332.623880/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/4222124720425564
✓ ListingSuccess: glow/fb/fx/graph_opts:test_fx_sink - main (6.038)
✓ Pass: glow/fb/fx/graph_opts:test_fx_sink - test_no_sink_concat_below_quantize (glow.fb.fx.graph_opts.tests.test_fx_sink.TestSink) (0.036)
✓ Pass: glow/fb/fx/graph_opts:test_fx_sink - test_sink_concat_below_quantize (glow.fb.fx.graph_opts.tests.test_fx_sink.TestSink) (0.048)
✓ Pass: glow/fb/fx/graph_opts:test_fx_sink - test_sink_reshape_nodes (glow.fb.fx.graph_opts.tests.test_fx_sink.TestSink) (0.058)
✓ Pass: glow/fb/fx/graph_opts:test_fx_sink - test_no_sink (glow.fb.fx.graph_opts.tests.test_fx_sink.TestSink) (0.057)
Summary
Pass: 4
ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/4222124720425564
```
Reviewed By: jfix71
Differential Revision: D31121321
fbshipit-source-id: 6f6e3b8e2d57ea30766fa6bee34ca207cec86f0f
Summary:
The docs artifacts are unnecessary since the docs are hosted in S3 anyway, and the reports are mirrored in S3, which has better upload/download speed and is available as soon as the upload is done rather than once the workflow is complete.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65875
Reviewed By: seemethere
Differential Revision: D31296500
Pulled By: driazati
fbshipit-source-id: 8c371230d0c8c0eb785702df9ae495de85f60afa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/66000
Saw this in nvprof and I'm just a little too nitpicky to let it slide!
ghstack-source-id: 139547271
Test Plan: CI
Reviewed By: xiaomengy
Differential Revision: D31340262
fbshipit-source-id: ab48dc99c34a74585e66800b4bbcccc6aabbaff2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65499
When the tensors in question are contiguous, there is no need to go through dispatch, use TensorIterator, etc.
ghstack-source-id: 139549027
Test Plan:
Ran ptvsc2_predictor_bench for ctr_mobile_feed local net following https://fb.quip.com/q8hBAFGMeaOU (but without the profile and compare_results options).
Before:
I0922 14:00:32.261942 3132627 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 7.18124. Iters per second: 139.252
I0922 14:01:44.865965 3132627 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 7.25314. Iters per second: 137.871
I0922 14:02:56.929602 3132627 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 7.1986. Iters per second: 138.916
I0922 14:04:05.923025 3132627 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 6.89211. Iters per second: 145.093
I0922 14:05:17.953056 3132627 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 7.19577. Iters per second: 138.971
mean: 7.144172, stddev: 0.1283
After:
I0922 13:51:55.233937 3086245 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 6.79709. Iters per second: 147.122
I0922 13:53:03.062682 3086245 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 6.77605. Iters per second: 147.579
I0922 13:54:10.230386 3086245 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 6.70993. Iters per second: 149.033
I0922 13:55:18.403434 3086245 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 6.81044. Iters per second: 146.833
I0922 13:56:26.568646 3086245 PyTorchPredictorBenchLib.cpp:312] PyTorch run finished. Milliseconds per iter: 6.80965. Iters per second: 146.85
mean: 6.800632, stddev: 0.013227
Looks like about a 5.3% improvement.
Reviewed By: hlu1
Differential Revision: D31125492
fbshipit-source-id: 92ab5af242d0a84dcf865323a57b48e8374eb823
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65713
This may not be needed anymore.
ghstack-source-id: 139114284
Test Plan: see if it builds
Reviewed By: dhruvbird
Differential Revision: D31216245
fbshipit-source-id: 29c9c013f94070c7713e46027881cb693b144d36
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64065
It is only safe to mutate Tuple elements if you are the sole owner
of the tuple. The most efficient way to do this, then, is
`std::move(*std::move(tupleIValue).toTuple()).elements()` (the
innermost move allows `IValue::toTuple()` to avoid a refcount bump and
the outermost move allows the element vector to be moved out of the
tuple), but many callsites write simply
`tupleIValue.toTuple().elements()`, which incurs many extra refcount
bumps.
ghstack-source-id: 139468088
Test Plan: CI
Reviewed By: ezyang
Differential Revision: D30592621
fbshipit-source-id: e8312de866de09b9ea2a62e5128cbf403ee16f09
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65960
Fix a bug in the converter and add support for negative dim.
Test Plan: buck test mode/dev-nosan caffe2/torch/fb/fx2trt:test_narrow
Reviewed By: wushirong
Differential Revision: D31310232
fbshipit-source-id: 62887369d830202cae6d63b41747225b12dcf754
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65826
Should be marginally more efficient.
ghstack-source-id: 139315050
Test Plan: CI
Reviewed By: ezyang
Differential Revision: D31272489
fbshipit-source-id: 7c309d67a0ec0ada35a5b62497bac374538394a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65933
We use `split_module` to split the input model that we want to const fold into const and non-const subgraphs. Previously we were taking the non-const graph and trying to hack it back into the same signature as the input model. However this was complex/buggy.
Instead, refactor to just keep using the base split module that contains both const and non-const graphs. This means we:
- Inline the non-const graph into the split module
- Remove the const graph from the module and replace it with a getattr that will be run to insert that attr when we `run_folding`
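For orientation, here is a minimal sketch of the user-facing flow this refactor preserves (a hedged example; the module path `torch.fx.experimental.const_fold` and the exact signatures may differ between versions):
```
import torch
from torch.fx.experimental import const_fold

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w = torch.nn.Parameter(torch.randn(4, 4))

    def forward(self, x):
        # `self.w * 2` depends on no inputs, so it can be folded into a constant
        return x + (self.w * 2)

split = const_fold.split_const_subgraphs(M())
split.run_folding()                # run the const subgraph once and stash the result as an attr
out = split(torch.randn(4, 4))
```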
Test Plan: Added test coverage to cover newly supported folding, and updated other tests for new strategy.
Reviewed By: yinghai
Differential Revision: D31293307
fbshipit-source-id: 6e283a8c7222cf07b14e30e74dffc8ae5ee8b55f
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64000
- updates double backward formula to compute grad wrt output instead of self
- ~~In some of the error messages, we still refer to the dtype of the input, even though we are now checking the dtype of the output~~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65242
Reviewed By: malfet
Differential Revision: D31317680
Pulled By: soulitzer
fbshipit-source-id: b3b921e06775cfc12e5a97a9ee8d73aec3aac7c3
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/58547.
I added an OpInfo-based test that fails on master and passes with the
proposed changes.
cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65714
Reviewed By: saketh-are, mruberry
Differential Revision: D31248307
Pulled By: albanD
fbshipit-source-id: 041eaa9b744c3043f78dd8ae5f457f67c311df4f
Summary:
This PR adds raising an error when `len(output_differentiability) != len(outputs)`
Notes in derivatives.yaml state that
> 'output_differentiability' and value a list of the same length as the number of outputs from the forward function.
but this was not enforced in codegen, leading to confusion and unexpected bugs https://github.com/pytorch/pytorch/issues/65061#issuecomment-930271126.
cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65823
Reviewed By: mrshenli
Differential Revision: D31307312
Pulled By: albanD
fbshipit-source-id: caeb949e9249310dffd237e77871e6d0d784e298
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65878
If we attempt to compute an offset into an empty tensor we trigger UB, since
we'd be adding an offset to a nullptr, which is UB
(https://reviews.llvm.org/D67122) even if we never use the pointer.
Since indexing into an empty tensor yields an empty tensor anyways, let's just
return the underlying (null) data ptr in this case.
ghstack-source-id: 139448496
Test Plan:
r-barnes originally pointed this out to me in a failing TE fuser test:
https://www.internalfb.com/intern/testinfra/diagnostics/5910974579561425.281475022329152.1632898053/
```
buck test mode/dev //caffe2/test:jit -- --exact 'caffe2/test:jit - test_unsupported_nn_functional_pad_circular_cpu_float32 (test_jit_fuser_te.TestNNCOpInfoCPU)'
```
But it turns out it's easily triggered by anything that tries to operate on a
slice of a size-0 tensor:
```
def test_pad(self):
    F.pad(torch.ones(0, 3, 3), (1, 2), 'circular')

def test_index(self):
    input = torch.zeros(0, 3, 3)
    out = torch.zeros(0, 3, 6)
    out[..., 1:4] = input[..., 0:3]

def test_add(self):
    torch.ones(0, 2)[:, 1] + torch.ones(0, 1)
```
What's the right place for this sort of operator corner-case test? Should
they be (or are they already) part of OpInfo?
Reviewed By: jamesr66a
Differential Revision: D31296914
fbshipit-source-id: 0ef52ad311dceeed985498f8d9390bc6fbaefbfc
Summary:
This is to fix Pyre errors in our applications:
* calling `tensor.cos()` etc.
* creating a data loader with batch sampler that is `List[List[int]]`.
Test Plan: TODO: rebase the diffs and run Pyre.
Reviewed By: ejguan
Differential Revision: D31309564
fbshipit-source-id: 1c6f3070d7570260de170e2fe2153d277b246745
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65295
The m-out-of-n is implemented as follows:
1. Compute the blocks that need to be sparsified using the weight-norm criterion
2. Within each block below the threshold, find the smallest absolute-value elements
3. Zero out only the smallest values within each block
m-out-of-n describes a sparsification scheme where, in a block with "n" elements, only "m" of them are zeroed out.
Block sparsity, with the whole block being all zeros, is a special case of m-out-of-n: if m==n, the whole block is reset.
This echoes the implementation described in https://github.com/pytorch/pytorch/issues/59835,
and also meets the requirements of NVIDIA's cuSPARSELt.
To support the CUDA sparsity (2/4), one would need to set the sparsity_level to 1.0.
That translates to all blocks of shape 1x4 within the tensor being sparsified with the 2-out-of-4 scheme.
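To make the scheme concrete, here is a small illustrative sketch (not the sparsifier's actual code) of how a 2-out-of-4 mask could be computed for a single 1x4 block:
```
import torch

def m_out_of_n_mask(block, m):
    # Zero out the m smallest-magnitude entries of a block with n elements.
    flat = block.flatten()
    smallest = flat.abs().argsort()[:m]   # indices of the m smallest |values|
    mask = torch.ones_like(flat)
    mask[smallest] = 0
    return mask.reshape(block.shape)

block = torch.tensor([0.3, -0.05, 0.7, 0.01])  # one 1x4 block
print(m_out_of_n_mask(block, m=2))             # tensor([1., 0., 1., 0.])
```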
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D31186828
Pulled By: z-a-f
fbshipit-source-id: 7bd3e2707915b90f4831859781fc6e25f716c618
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65296
The original API described in the https://github.com/pytorch/pytorch/issues/59835
assumed that the per-layer configuration would take a module/layer
reference. However, a more useful approach is to refer to the layers
by their fully qualified names (FQN). That allows us to store the
configuration in a file without serializing the models.
We define a layer's FQN as its "path" within a model. For example,
if one can refer to a submodule using `model.layer0.sublayerX`, the FQN
of sublayerX is `'layer0.sublayerX'`.
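As a small illustration (reusing the `layer0.sublayerX` naming from the example above), the FQNs line up with `named_modules` like so:
```
import torch.nn as nn

model = nn.Sequential()
model.add_module("layer0", nn.Sequential())
model.layer0.add_module("sublayerX", nn.Linear(4, 4))

# Each submodule's FQN is its dotted "path" inside the model:
for fqn, mod in model.named_modules():
    print(repr(fqn), type(mod).__name__)
# ''                  Sequential  (the model itself)
# 'layer0'            Sequential
# 'layer0.sublayerX'  Linear
```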
Test Plan:
```
python test/test_ao_sparsity.py -- TestBaseSparsifier
buck test mode/opt //caffe2:test -- TestBaseSparsifier
```
Reviewed By: gchanan
Differential Revision: D31186830
Pulled By: z-a-f
fbshipit-source-id: d8d87f1c054e5c10d470e67837476a11e0a9b1d4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65939
This change includes 2 separate optimizations.
1. Provide an overload of `debugString(const char*, ...)` in addition to `debugString(std::string, ...)` to avoid `std::string` construction when `STRIP_ERROR_MESSAGES` is defined and the caller passes in a `const char*`
2. Return `std::string("", 0)` instead of `""`, since the former triggers no call to `std::basic_string`'s constructor whereas the latter does. [Godbolt Link](https://godbolt.org/z/oTExed5h8). However, I'm surprised by this, since the man page for [std::basic_string](https://en.cppreference.com/w/cpp/string/basic_string/basic_string) clearly states that the constexpr overload is only available since C++20, and I am building using `-Os -std=c++17`
Godbolt Screenshot:
{F667311023}
ghstack-source-id: 139507542
Test Plan:
CI and local build via:
```
buck build //xplat/caffe2/fb/lite_predictor:lite_predictor
```
Reviewed By: swolchok
Differential Revision: D31312942
fbshipit-source-id: aa24abbfe1c16419f235d037595321982614c5ea
Summary:
Description:
- Have only added `stdout` and `stderr` as possible options from the Python API for now. We can add file-path passing later, maybe.
- Put the class `JitLoggingConfig` in the cpp file as none of its methods were being used outside of this file.
Python API:
`torch._C._jit_set_logging_stream('stdout|stderr')`
C++ API:
`::torch::jit::set_jit_logging_output_stream(ostream);`
Testing:
- Tested python API locally.
- Unit test for the C++ API is written
Fixes https://github.com/pytorch/pytorch/issues/54182
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65768
Reviewed By: mrshenli
Differential Revision: D31291739
Pulled By: ZolotukhinM
fbshipit-source-id: eee72edc20488efad78a01c5b0ed8a132886a08d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65861
First in a series. This PR changes the code in deploy.h/cpp and
interpreter_impl.h/cpp to be camel case instead of snake case. Starting
with this as it has the most impact on downstream users.
Test Plan: Imported from OSS
Reviewed By: shannonzhu
Differential Revision: D31291183
Pulled By: suo
fbshipit-source-id: ba6f74042947c9a08fb9cb3ad7276d8dbb5b2934
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65552
This PR is mostly a verbatim move of several functions to different
files. The goal is to have more consistency in what resides where.
With this PR:
* All `compute*` functions defining how a given operator needs to be
lowered to TE IR will reside in `operators/*.{cpp,h}`.
* Auxiliary functions for these functions will reside in
`operators/misc.cpp`. `compute*` functions for ops not belonging
anywhere else can also go to that file.
* `operators/unary.*` is renamed to `operators/pointwise.*` and now
includes functions like `computeTwoOperands`.
* `kernel.*` now contains *only JIT-related* logic and implementations of
`TensorExprKernel` methods.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D31148923
Pulled By: ZolotukhinM
fbshipit-source-id: e36ad8e779b8d30a33b49ea4ebf6d6a7438989f4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65551
Previously we had a big switch on Op kind to decide how to lower a given
JIT operator to NNC. This PR changes this switch to a hash table lookup.
Why? This helps us with at least two things:
1) With this approach we can easily check if we know how to handle a
given node in advance - i.e. we can inspect the entire graph and tell
whether it's possible to compile it or not without actually trying to do
that and dying in the middle. This would allow us to, say, provide
user-friendly error messages in AOT workflow.
2) We can switch to use schema instead of op kind to determine correct
lowering. Unlike op schema, op kind might be ambiguous (see e.g. #64963)
and using it instead of schema can lead to bugs.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D31148926
Pulled By: ZolotukhinM
fbshipit-source-id: ac12684e2126c899426ef5e4cc1e3f70fa01f704
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65550
This PR adds the source files and the class for the registry, subsequent
PRs actually port existing lowerings to this mechanism.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31148922
Pulled By: ZolotukhinM
fbshipit-source-id: 4c087b22ee898d5a5a18a5d2a4bb795aa2ffd655
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65549
Previously it had a special handling, with this change it follows the
same mechanism as other ops.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D31148924
Pulled By: ZolotukhinM
fbshipit-source-id: 572d8ae5e123e7a0e2a656154d7bd0f73c785a06
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65882
`torch::jit::Module` is refcounted. There is no need to wrap it in a `shared_ptr`.
Test Plan:
```
buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest
```
Reviewed By: mikeiovine
Differential Revision: D31012222
fbshipit-source-id: 74d234bd85423e5ba0e396f24899631354a2c74b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65725
See gh-56794
Avoid dispatch inside of parallel_for by:
1. Replacing Tensor slicing with TensorAccessor
2. Call `grad_input.zero_()` only once, outside of the parallel region
3. Replace `at::mm` with a `gemm` call
Test Plan: Imported from OSS
Reviewed By: saketh-are
Differential Revision: D31257876
Pulled By: ngimel
fbshipit-source-id: f2902edeccd161431c1dfb1ab3e165d039ec259d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65493
Added a last resort: use whatever ATen operator in the graph has Tensor outputs as the operator node for checking the alias annotation.
Test Plan: python test/test_ops.py -k test_variant_consistency_jit
Reviewed By: mrshenli
Differential Revision: D31321221
Pulled By: alanwaketan
fbshipit-source-id: f4a5cbfd36bd0867d8c1bf9de9a65365ee7c35d6
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/65506
Test Plan: run a adfinder canary and verify this error is fixed.
Reviewed By: swolchok
Differential Revision: D31130083
fbshipit-source-id: c31f179f8a7de75ed6f6e7ee68b197f2970ddd3d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65746
This also removes the cudaHostAllocator field on THCState, since there
doesn't seem to be an API anywhere for customizing it.
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D31236630
Pulled By: ngimel
fbshipit-source-id: 2a8e756222ae70565e77f8e7139d60ec5be32276
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64578
* Fix remainder export for edge case when input is negative. New export relies on true_divide export.
* Simplified true_divide export. Cleaned up redundant code which is handled by the scalar type analysis pass. Removed the dependency on `onnx::Where`, thus supporting opsets 7 & 8.
Fixes #60179
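A quick repro-style sketch of the negative-input case the new export handles (exporting to an in-memory buffer just for illustration):
```
import io
import torch

class Mod(torch.nn.Module):
    def forward(self, x, y):
        return torch.remainder(x, y)

x = torch.tensor([-7.0, -1.5, 5.0])   # negative inputs are the tricky case
y = torch.tensor([3.0, 3.0, 3.0])
torch.onnx.export(Mod(), (x, y), io.BytesIO(), opset_version=9)
```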
Test Plan: Imported from OSS
Reviewed By: jansel
Differential Revision: D30919601
Pulled By: malfet
fbshipit-source-id: 0f78621c0ac3bdb6bf4225e049ba5f470dc8ab12
Co-authored-by: BowenBao <bowbao@microsoft.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64381
* Added new ONNX test for batched_nms
* Update test according to PR in torchvision
* Update test/onnx/test_pytorch_onnx_onnxruntime.py
Test Plan: Imported from OSS
Reviewed By: jansel
Differential Revision: D30919602
Pulled By: malfet
fbshipit-source-id: edfb5b9f75077429f7f242fd6ac06d962968dfba
Co-authored-by: Bowen Bao <imbowenbao@outlook.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64787
This PR added support for lowering per-channel quantization and dequantization operators
in fx2trt. This also extends TensorMeta with extra arguments corresponding to per-channel quantized Tensors.
Initially I was thinking of adding a qparam that can capture everything, but currently we still have some lowering support
for fbgemm ops (which have scale and zero_point in the operator interface). I think we can move everything to qparams
after we deprecate lowering support for fbgemm ops in the future.
Test Plan:
Test for per channel weight:
```
python torch/fx/experimental/fx2trt/example/quantized_resnet_test.py
```
change BC compatibility test expect for TensorMeta
```
python test/test_fx.py TestFXAPIBackwardCompatibility.test_class_member_back_compat --accept
```
Imported from OSS
Reviewed By: jfix71, mrshenli, 842974287
Differential Revision: D30879848
fbshipit-source-id: 76c3804bb1d9343183ae53d9f02c1a3bf6c79e1c
Summary:
torch.dtype.__reduce__ returns a string, which causes Pickle to look
up the object by module and name. In order to find the right module,
Pickle looks for __module__ on the object; if it doesn't find that, it
falls back to searching sys.modules.
Previously, torch.dtype instances did not have a `__module__`
attribute, so pickling dtypes would fall back to a search of
sys.modules.
Instances of normal Python objects have a `__module__` attribute
because normal Python classes have a `__module__` key in their
`__dict__`. Imitate that by populating one in `torch.dtype`.
We set the field in `tp_dict` before calling `PyType_Ready` (instead
of afterwards) because of the doc warning against mutating a type's
dictionary once initialized:
https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_dict
fixes https://github.com/pytorch/pytorch/issues/65077
---
I didn't add any tests because I didn't see any obvious places with similar tests for pickling or dtype objects. Let me know if I missed the right place, or should start one.
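For reference, a minimal check of the behavior this change is after (assuming the populated `__module__` value is `'torch'`):
```
import pickle
import torch

# dtype singletons now carry a __module__, so pickle can resolve them directly:
print(torch.float32.__module__)                        # expected: 'torch'
assert pickle.loads(pickle.dumps(torch.float32)) is torch.float32
```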
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65182
Reviewed By: mrshenli
Differential Revision: D31310530
Pulled By: ezyang
fbshipit-source-id: 20cd713ce175a709d6ce47459c3891162ce29d77
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65710
No need to incur extra refcount bumps, and no need to use a stringstream for what are presumably string keys anyway.
ghstack-source-id: 139325445
Test Plan: CI, reviewers to confirm the keys are supposed to be strings
Reviewed By: dhruvbird
Differential Revision: D31215347
fbshipit-source-id: 82be93cb2e57aefe94edf74d149115cb734112be
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65848
This diff includes:
* [fix]: The initialization of `OperatorSupport._support_dict` made it a class variable, so we move its initialization into the constructor.
* Add an abstract class (more of an interface), `OperatorSupportBase`, since `OperatorSupport`'s purpose is too specific.
* [refactor]: What `TRTOperatorSupport` really does is populate an `OperatorSupport._support_dict`, so there is no reason for subclassing. Remove it, and instead instantiate an `OperatorSupport` with a properly populated `_support_dict`.
* Add a framework for defining simple, basic op-support logic and composing it into more complex ones:
  1. `create_op_support` wraps a function into an `OperatorSupportBase` instance
  2. `chain` can combine several simple `OperatorSupportBase` instances into more complex ones
  3. `OpSupports` provides a set of pre-defined, simple `OperatorSupportBase` instances that can be composed together using `chain`. Currently the only pre-defined one is `decline_if_input_dtype(..)`, which declares a node unsupported if its args are of a user-specified dtype
* Fix `TRTOperatorSupport` so that it not only looks for registered converters, but also declines a node if its arg is int64
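As a rough sketch of how the composition is meant to read (hedged: the import path below is where these helpers live in more recent releases and may differ here):
```
import torch
from torch.fx.passes.operator_support import (
    OperatorSupport, OpSupports, chain, create_op_support,
)

# An illustrative predicate: decline any call_function node targeting torch.add.
no_add = create_op_support(
    lambda submodules, node: not (node.op == "call_function" and node.target is torch.add)
)

# Compose: a node is supported only if every rule in the chain accepts it.
supported = chain(
    OperatorSupport(support_dict={}),               # table-driven support (empty here)
    OpSupports.decline_if_input_dtype(torch.int64),
    no_add,
)
```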
Test Plan: linter and CI
Reviewed By: 842974287
Differential Revision: D31275525
fbshipit-source-id: bbc02f7ccf4902a7912bb98ba5be2c2fbd53b606
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60838
Rewrote `addmm_out_sparse_csr_dense_cuda` implementation using new cusparse descriptors.
`addmm` now works without conversions with both 32-bit and 64-bit indices.
The dense tensors can have a row- or column-major layout. If the dense tensors are a contiguous slice of a larger tensor, the storage is used directly without temporary copies.
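A rough usage sketch of the op this kernel backs (requires a CUDA build; values are arbitrary):
```
import torch

crow_indices = torch.tensor([0, 2, 3])
col_indices = torch.tensor([0, 1, 1])
values = torch.tensor([1., 2., 3.])
mat1 = torch.sparse_csr_tensor(crow_indices, col_indices, values,
                               size=(2, 2), device="cuda")

mat2 = torch.randn(2, 3, device="cuda")
inp = torch.zeros(2, 3, device="cuda")

# addmm with a sparse CSR mat1 and dense mat2/input goes through the new cuSPARSE path
out = torch.addmm(inp, mat1, mat2)
```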
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D30643191
Pulled By: cpuhrsch
fbshipit-source-id: 5555f5b59b288daa3a3987d322a93dada63b46c8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65923
Still noticing that queues are long particularly for windows GPU
machines, bumping this to compensate
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D31308728
Pulled By: seemethere
fbshipit-source-id: b68c3a76335960def23e1f425ba5b0a219f07e73
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65672
`ATen/ATen.h` has a list of all headers but vararg_functions.cpp only uses two of them. Change to include less for min_runtime.
ghstack-source-id: 139389772
Test Plan: CI
Reviewed By: larryliu0820
Differential Revision: D31198293
fbshipit-source-id: 9794a2696a1b124be7fced2836c633ae899aa5c8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65621
Add a new attribute to FusedMovingAvgObsFakeQuantize that controls whether the fake-quant operation should be applied at the output of a particular layer. The motivation is to give users additional control over the numerics of the fake_quant operators during training. It defaults to always fake-quantizing the output (True).
Note: We will still observe the tensors as before (only the fake_quant operation is controlled by this flag)
For example
```
input model
x -> fc1 -> fc2 -> non_quantizable_op -> fc3
After fake_quant
x -> fake_quant(x) -> fc1 -> fake_quant(fc1) -> fc2 -> fake_quant(fc2) -> non_quantizable_op -> fake_quant() -> fc3 -> fake_quantize(fc3)
With output_fake_quant disabled at the output of fc2 and fc3 (since their outputs are non-quantizable)
x -> fake_quant(x) -> fc1 -> fake_quant(fc1) -> fc2 -> non_quantizable_op -> fake_quant() -> fc3
```
Test Plan: ./buck-out/gen/caffe2/test/quantization_fx\#binary.par -r test_disable_output_fake_quant
Reviewed By: jerryzh168
Differential Revision: D31174526
fbshipit-source-id: bffe776216d041fb09133a6fb09bfc2c0bb46b89
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65699
related to: https://github.com/pytorch/pytorch/pull/65443#discussion_r715132425
The QAT and PAT (pruning-aware training) support for embedding bags needs a memoryless observer to work properly. This is necessitated by the changing pruned/non-pruned weights during training, which can significantly change the quantization parameters.
This PR adds a memoryless flag to the simpler observer classes (not the moving-average ones, since those explicitly have memory)
In addition to the above, I altered the reset_min_max_vals
function for MinMaxObserver so that it preserves the device of the
existing self.min_val and self.max_val; previously the device was not
preserved, unlike at initialization (which uses factory_kwargs)
Test Plan:
python test/test_quantization.py TestObserver
(added test_memoryless_minmaxobserver, test_memoryless_per_channel_minmaxobserver, test_memoryless_histogramobserver)
Imported from OSS
Reviewed By: supriyar
Differential Revision: D31209773
fbshipit-source-id: 44a63298e44880fbd3576f49ac568e781f3fd79a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64879
This change makes the output of `prim::TupleConstruct` alias only with its inputs *when* the created tuple is directly returned from the graph.
The same treatment could be applied to any tuple newly constructed by `prim::TupleConstruct` whose elements do not escape. However, this change focuses only on the simplest, and most frequently used, case: tuples constructed solely to be returned from a graph.
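For context, the targeted pattern is simply a graph whose return value is a freshly constructed tuple, e.g.:
```
import torch

@torch.jit.script
def make_pair(x: torch.Tensor, y: torch.Tensor):
    # The tuple below is built by prim::TupleConstruct and is used only
    # as the graph's return value, which is the case this change targets.
    return x + 1, y * 2

print(make_pair.graph)
```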
Test Plan:
Added
- `AliasMoveForTupleConstructWithSingleUseAsGraphOutput`
- `WildcardAliasForTupleConstructWithUses`
to cover the newly added code.
Reviewed By: eellison
Differential Revision: D30437737
fbshipit-source-id: 417fbc6bc348062e60e7acdddd340d4754d090eb
Summary:
Skip failing tests in `test_linalg.py` and `test_ops.py` when LAPACK and MAGMA are not available.
Note that there's no CI without LAPACK or MAGMA. I verified locally that this now works as expected, but in the future we have no guards against tests failing again in this situation.
<details>
<summary> test_ops.py failures that are fixed</summary>
```
FAILED test/test_ops.py::TestCommonCPU::test_out_linalg_tensorinv_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_reference_testing_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_reference_testing_linalg_tensorinv_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_linalg_tensorinv_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_triangular_solve_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestCommonCPU::test_variant_consistency_eager_triangular_solve_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_linalg_tensorinv_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_triangular_solve_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_grad_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_linalg_tensorinv_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_triangular_solve_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_linalg_tensorinv_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_triangular_solve_cpu_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestGradientsCPU::test_forward_mode_AD_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestJitCPU::test_variant_consistency_jit_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestJitCPU::test_variant_consistency_jit_triangular_solve_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestJitCPU::test_variant_consistency_jit_triangular_solve_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestMathBitsCPU::test_conj_view_linalg_tensorinv_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestMathBitsCPU::test_conj_view_triangular_solve_cpu_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestMathBitsCPU::test_neg_view_linalg_tensorinv_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_ops.py::TestMathBitsCPU::test_neg_view_triangular_solve_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
```
</details>
<details>
<summary> test_linalg.py failures that are fixed</summary>
```
FAILED test/test_linalg.py::TestLinalgCPU::test_norm_dtype_cpu - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCPU::test_norm_matrix_cpu_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCPU::test_norm_matrix_cpu_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCPU::test_nuclear_norm_axes_small_brute_force_old_cpu - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_eigh_hermitian_grad_meta_complex128 - RuntimeError: Calling torch.linalg.eigh or eigvalsh on a CPU tensor requires compiling PyTorch with LAPACK. Please use PyTorch built with LAPACK support.
FAILED test/test_linalg.py::TestLinalgMETA::test_eigh_hermitian_grad_meta_float64 - RuntimeError: Calling torch.linalg.eigh or eigvalsh on a CPU tensor requires compiling PyTorch with LAPACK. Please use PyTorch built with LAPACK support.
FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_inverse_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_broadcasting_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_batched_non_contiguous_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_lu_solve_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_broadcasting_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_batched_non_contiguous_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_old_solve_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_batched_non_contiguous_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_solve_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_col_maj_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_col_maj_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_square_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_col_maj_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_col_maj_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_all_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_col_maj_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_col_maj_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_meta_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgMETA::test_svd_tall_some_meta_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_inverse_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_lowrank_cuda_float64 - RuntimeError: Calling torch.lu on a CUDA tensor requires compiling PyTorch with MAGMA. lease rebuild with MAGMA.
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_col_maj_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_col_maj_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_complex128 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_complex64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_square_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_col_maj_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_col_maj_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_all_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_col_maj_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_col_maj_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_cuda_float32 - RuntimeError: svd: LAPACK library not found in compilation
FAILED test/test_linalg.py::TestLinalgCUDA::test_svd_tall_some_cuda_float64 - RuntimeError: svd: LAPACK library not found in compilation
```
</details>
Fixes https://github.com/pytorch/pytorch/issues/59662
cc mruberry jianyuh nikitaved pearu walterddr IvanYashchuk xwang233 Lezcano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64930
Reviewed By: H-Huang
Differential Revision: D31137652
Pulled By: mruberry
fbshipit-source-id: c969f75d7cf185765211004a0878e7c8a5d3cbf7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65838
closes https://github.com/pytorch/pytorch/pull/65675
The default `--max_restarts` for `torch.distributed.run` was changed to `0` from `3` to make things backwards compatible with `torch.distributed.launch`. Since the default `--max_restarts` used to be greater than `0`, we never documented passing `--max_restarts` explicitly in any of our example code.
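For example, to restore the old behavior of up to three restarts, the flag now has to be passed explicitly (the script name is a placeholder):
```
python -m torch.distributed.run --nproc_per_node=8 --max_restarts=3 train.py
```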
Test Plan: N/A doc change only
Reviewed By: d4l3k
Differential Revision: D31279544
fbshipit-source-id: 98b31e6a158371bc56907552c5c13958446716f9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64911
The import statements that involve `quantize.py` were not added to the module-level __init__ file. Those imports are necessary to mimic the behavior of the old import locations. Otherwise, the user would need to change their import statements to `from torch.ao.quantization.quantize import quantize` (instead of `from torch.ao.quantization import quantize`).
Another change in this diff is that we don't use `__all__` anymore. The all dunder was never used in quantization anyway, and just creates a potential bug when using `from ... import *`.
ghstack-source-id: 139342483
Test Plan: `buck test mode/dev //caffe2/test:quantization`
Reviewed By: vkuzo
Differential Revision: D30897663
fbshipit-source-id: a7b4919a191755e3ba690a79ce3362889f416689
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64000
- updates double backward formula to compute grad wrt output instead of self
- ~~In some of the error messages, we still refer to the dtype of the input, even though we are now checking the dtype of the output~~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65242
Reviewed By: albanD
Differential Revision: D31238123
Pulled By: soulitzer
fbshipit-source-id: afd319d3676d9ef8d81607e0e8c2a3e6d09f68e4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65493
Added a last resort: use whatever ATen operator in the graph has Tensor outputs as the operator node for checking the alias annotation.
Test Plan:
python test/test_ops.py -k test_variant_consistency_jit_linalg_tensorinv
python test/test_ops.py -k test_variant_consistency_jit_nn_functional_normalize
Reviewed By: eellison
Differential Revision: D31132861
Pulled By: alanwaketan
fbshipit-source-id: 26fc2e6bc77be3a296967cf29a3f6ded231302fa
Summary:
Fixes https://github.com/pytorch/pytorch/issues/64999
- Adds a flag to gradcheck `check_backward_ad` that can be used to disable gradcheck for backward ad (see the usage sketch after this list)
- This is a bit bc-breaking in terms of positional args, but I prefer this ordering
- In OpInfo tests for forward ad:
- set `check_backward_ad` False
- In test_ops treat `supports_autograd` as if it is `supports_backward_ad` (it basically already is)
- the only modification needed is to no longer skip forward ad tests if `supports_autograd` is false
- test_dtype, test_variant_consistency, etc behave correctly as-is
- In a follow-up PR, we can rename it to actually be `supports_backward_ad`
- Testing
- https://github.com/pytorch/pytorch/pull/65060
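A minimal usage sketch of the new flag (the undefined-grad and batched-grad checks only apply to backward AD, so they are disabled explicitly here; treat this as illustrative rather than the exact test-suite usage):
```
import torch
from torch.autograd import gradcheck

def f(x):
    return (x * x).sum()

x = torch.randn(3, dtype=torch.double, requires_grad=True)

# Exercise only forward-mode AD, skipping the backward-mode checks:
gradcheck(f, (x,), check_forward_ad=True, check_backward_ad=False,
          check_undefined_grad=False, check_batched_grad=False)
```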
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65040
Reviewed By: albanD
Differential Revision: D31238177
Pulled By: soulitzer
fbshipit-source-id: f068d4cbe7ffb094930b16cddb210583b9b7b2c4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65173
Initializes dummy NCCL communicators in the constructor as a basic health
check that communicators can be initialized prior to launching the first
collective.
After successful init, we immediately use `ncclCommAbort` to destroy these
communicators to ensure they don't interfere with regular communicator creation
during collectives.
Test Plan: CI
Reviewed By: pritamdamania87
Differential Revision: D31005792
fbshipit-source-id: c2c582dee25a098361ead6ef03f541e7833c606b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65744
This is just dead code.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D31257940
fbshipit-source-id: 6c02264106c2dcbadd332f24b95bc9351a04fd9e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65724
See gh-56794
Avoid dispatch inside of parallel_for by:
1. Replace Tensor slicing with TensorAccessor
2. Copy bias into the output only once, outside of the parallel region
3. Replace `addmm_` with a direct call to `gemm`.
Technically this also adds a new requirement that the output always be
contiguous, but the out argument version isn't exposed or used
anywhere in the `torch.nn` API. So that should be fine.
Test Plan: Imported from OSS
Reviewed By: saketh-are
Differential Revision: D31257875
Pulled By: ngimel
fbshipit-source-id: 84d2b39e7f65334bdfcc2c4719f93ee3c514ca32
Summary:
In Python 3, we can call `super()` without any arguments.
If I understand correctly, Python 2 is no longer supported by PyTorch, so we can change the documentation to be Python-3 only :)
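For example:
```
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        # Python 3 style: no need to spell out super(MyModule, self)
        super().__init__()
        self.linear = nn.Linear(4, 4)
```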
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65748
Reviewed By: saketh-are
Differential Revision: D31246055
Pulled By: albanD
fbshipit-source-id: 3980def1a556d4bdfa391ea61cb2a65efa20df79
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65593
Adds test cases that the three Numeric Suite Core APIs work
when the models are on cuda. In particular:
1. create models and move them to cuda
2. add loggers (if applicable)
3. run data through (if applicable)
4. extract results
It works without code changes because a `Logger` object is
created without any device-specific objects (they only get
added if data is passed through). It's good to have this tested.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_extract_weights_cuda
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_add_loggers_cuda
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_add_shadow_loggers_cuda
```
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D31160897
fbshipit-source-id: 8eacf164d0496baf2830491200ea721c0f32ac92
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65732
For certain on-device uses, runtime memory comes at a premium. On-device deployments won't use all the available dispatch keys, so it makes sense to keep only the on-device-specific ones around for such uses to reduce the runtime heap memory allocated.
This change keeps just 10 dispatch keys (the ones that are used on-device), guarded under the `C10_MOBILE_TRIM_DISPATCH_KEYS` macro. It tries to keep the other code paths unaffected and uses `constexpr` for the `array` declaration, plus simple inline functions, to ensure that the compiler is able to optimize these for server builds.
Test Plan:
Build and check mobile models end to end.
```
buck build -c "pt.enable_milan_dispatch_keys_trimming"=1 //xplat/caffe2/fb/lite_predictor:lite_predictor
```
Reviewed By: ezyang
Differential Revision: D31185407
fbshipit-source-id: e954765606373dea6ee9466a851dca7684167b0b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65831
Was noticing scaling issues last night due to the lack of
linux.8xlarge.nvidia.gpu machines; it seems that even at max
capacity we were still about ~50 queued workflows behind, so this should
close that gap.
Also, since these run the longest types of tests, they are the most
likely to overlap with scale messages being processed while available
runners are still maxed out.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D31275892
Pulled By: seemethere
fbshipit-source-id: b22ceda115b70d7bdd9c4bc207b55ffab50381ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65384
The following pattern appears frequently in `ops.cpp`:
```
if (!n->matches(schema_1) && !n->matches(schema_2) && ... && !n->matches(schema_n)) {
LogAndDumpSchema(n);
return nullptr;
}
return [](ProcessedNode* p_node) {
if (p_node->Output(0).isNone()) {
if (p_node->Input(i).isSomeType()) {
// special logic for schema 1
} else if (p_node->Input(i).isSomeOtherType()) {
// special logic for schema 2
} else if (...) {
// special logic for schema3
}
// and so on
} else {
// another complicated type checking chain
}
};
```
A much cleaner way to implement operator overloads is like this:
```
if (n->matches(schema_1)) {
return schema_1_impl;
} else if (n->matches(schema_2)) {
return schema_2_impl;
}
// and so on
```
This has a few advantages:
* Significantly reduces complexity of the out variant implementations, especially for ops with more than 2 overloads. One implementation corresponds to one schema. This makes the implementation more readable/maintainable.
* Adhering to this convention makes it easier to add a new overload. Just add a new `n->matches(...)` case instead of working the schema into existing complicated logic.
* Ops are marginally faster since we don't have to check types at runtime.
Note: there are a few cases where this actually made the code less concise (`aten::div`), so I left those ops untouched.
Thanks for pointing this out in another diff d1jang
Test Plan: `buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest`
Reviewed By: hlu1
Differential Revision: D31072328
fbshipit-source-id: c40a4f7e6a79881e94c9ec49e9008ed75cfc8688
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65731
It originally had a purpose, but after ciflow was introduced every PR had
on_pull_request set, so it's not really as useful as it once was.
Also removes the equally confusing only_build_on_pull_request
variable.
This change should produce no functional changes in our generated workflows
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
cc ezyang seemethere malfet pytorch/pytorch-dev-infra
Test Plan: Imported from OSS
Reviewed By: janeyx99
Differential Revision: D31225398
Pulled By: seemethere
fbshipit-source-id: 7bd8e8175794ab7d09b0632321bf52538435e858
Summary:
Could be useful for the future.
Next steps: document it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65791
Reviewed By: suo
Differential Revision: D31254115
Pulled By: janeyx99
fbshipit-source-id: 715c18b4505f2be6328aa0be25976116d6956b25
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65136
Opportunistically add type annotation for operator_support.py
Test Plan: run linter, CI
Reviewed By: yinghai
Differential Revision: D30928464
fbshipit-source-id: 615c75152b9938792f03cdceb2a113bda6ab28c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65610
- Replace HIP_PLATFORM_HCC with USE_ROCM
- Don't rely on CUDA_VERSION or HIP_VERSION; use USE_ROCM and ROCM_VERSION instead.
- In the next PR
- Will be removing the mapping from CUDA_VERSION to HIP_VERSION and CUDA to HIP in hipify.
- HIP_PLATFORM_HCC is deprecated, so will add HIP_PLATFORM_AMD to support HIP host code compilation on gcc.
cc jeffdaily sunway513 jithunnair-amd ROCmSupport amathews-amd
Reviewed By: jbschlosser
Differential Revision: D30909053
Pulled By: ezyang
fbshipit-source-id: 224a966ebf1aaec79beccbbd686fdf3d49267e06
Summary:
`include_directories` is old-style CMake which adds the include path to every file being compiled. This instead makes python, numpy and pybind11 into targets that only torch_python and caffe2_pybind_state are linked to. So, python libraries can't be accidentally included elsewhere.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65654
Reviewed By: gchanan
Differential Revision: D31193205
Pulled By: malfet
fbshipit-source-id: 5c1b554a59d0e441a701a04ebb62f0032d38b208
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65741
This op previously assumed `axis == 1`, causing graphs that would otherwise be valid to return incorrect results after fusing.
Reviewed By: hlu1
Differential Revision: D31234944
fbshipit-source-id: 89885a3b119357698ebd9fd429b009813260a2f4
Summary:
The fact that these functions are only used in a single test might be a good enough reason to move them to that module.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60862
Reviewed By: H-Huang
Differential Revision: D31141354
Pulled By: mruberry
fbshipit-source-id: 6ce1f721b88620c5f46222ad1b942bc689f0a3e0
Summary:
In case the inputs have a different layout, `assert_close(..., check_layout=False)` converts them to strided before comparison. This is helpful if you just want to compare the values of sparse COO / CSR tensor against a strided reference.
This keeps BC, since the default `check_layout=True` was the old, hard-coded behavior.
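A small usage sketch:
```
import torch

sparse = torch.eye(3).to_sparse()
dense = torch.eye(3)

# Compare a sparse COO tensor against a strided reference by values only:
torch.testing.assert_close(sparse, dense, check_layout=False)
```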
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65419
Reviewed By: H-Huang
Differential Revision: D31133629
Pulled By: mruberry
fbshipit-source-id: ca8918af81fb0e0ba263104836a4c2eeacdfc7e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65781
Fixes
```
stderr: In file included from caffe2/caffe2/contrib/shm_mutex/shm_mutex.cc:1:
caffe2/caffe2/contrib/shm_mutex/shm_mutex.h:334:28: error: anonymous non-C-compatible type given name for linkage purposes by alias declaration; add a tag name here [-Werror,-Wnon-c-typedef-for-linkage]
using TicketStruct = struct : ShmBaseHeader {
^
TicketStruct
caffe2/caffe2/contrib/shm_mutex/shm_mutex.h:334:31: note: type is not C-compatible due to this base class
using TicketStruct = struct : ShmBaseHeader {
^~~~~~~~~~~~~
caffe2/caffe2/contrib/shm_mutex/shm_mutex.h:334:7: note: type is given name 'TicketStruct' for linkage purposes by this alias declaration
using TicketStruct = struct : ShmBaseHeader {
^
1 error generated.
Cannot execute a rule out of process. On RE worker. Thread: Thread[main,5,main]
Command failed with exit code 1.
```
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D31248938
fbshipit-source-id: 47342fecc72ada9397a1b7bd6fcabfccf988dd3e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64962
Moves windows builds / tests back to the default directory. Previously
we had moved them because checkout would sometimes fail due to file
handles still being open on the working directory.
Moving back to the default directory also has the added bonus of sccache
working again, so here's to hoping that this doesn't have any adverse
effects
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
cc peterjc123 mszhanyi skyline75489 nbcsm ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D31250072
Pulled By: seemethere
fbshipit-source-id: a803bf0e00e1b2b0d63f78600588281622ee0652
Summary:
The variable `%errorlevel%` is evaluated before the whole command line starts executing, so it is useless when used in an if-block. Also, let's avoid using `%errorlevel%` because it may be set by the user accidentally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57331
Reviewed By: anjali411
Differential Revision: D28140182
Pulled By: malfet
fbshipit-source-id: a3f21d65623bb25f039805c175e9f3b468bcb548
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65715
Here is how we freeze a python module:
- we call the python builtin compile method with the source code of the module and its path. This method returns a python code object
- we call marshal.dumps to serialize the code object to bytes.
The code_object.co_filename matches the path passed in to the compile method. We can simply replace that with a marker
to avoid leaking the build-time path into the runtime.
This works on nested code objects as well:
```
#!/bin/env python3.8
import marshal
code_str = """
print("hello")
class MyCls:
    def __init__(self):
        pass
"""
co = compile(code_str, "<Generated by torch::deploy>", "exec")
cobytes = marshal.dumps(co)
import pdb; pdb.set_trace()
```
Checking `co`:
```
(Pdb) co.co_filename
'<Generated by torch::deploy>'
(Pdb) co.co_consts
('hello', <code object MyCls at 0x7f0e8670bbe0, file "<Generated by torch::deploy>", line 4>, 'MyCls', None)
(Pdb) co.co_consts[1].co_filename
'<Generated by torch::deploy>'
```
Test Plan:
Find the serialized frozen module for the torch.nn.modules.linear module in the generated bytecode_x.c file. Put its content into /tmp/linear.bytecode
Run the testing script:
```
import marshal
co_bytes = bytes(eval("[{}]".format("".join(open('/tmp/linear.bytecode').readlines()).replace('\n', '').replace('\t', ''))))
co = marshal.loads(co_bytes)
print(co)
```
The output for the paste without the change:
```
<code object <module> at 0x7f39ca7f07c0, file "/data/users/shunting/fbsource/fbcode/buck-out/opt/gen/caffe2/gen_frozen_torchpython_src__srcs/torch/nn/modules/linear.py", line 1>
```
The output for the paste with the change:
```
<code object <module> at 0x7f05a765d710, file "<Generated by torch::deploy>", line 1>
```
Note that the file part is changed as expected.
Reviewed By: suo
Differential Revision: D31214555
fbshipit-source-id: 56958e0a7352f8c30a3377f83209efe7db61f0fb
Summary:
CIFlow workflows should always run on push events.
On pull requests, a workflow should run if its label conditions are met, or if
no `ciflow/` labels are associated with the PR, in which case the workflow is enabled by
default.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65733
Reviewed By: zhouzhuojie
Differential Revision: D31251278
Pulled By: malfet
fbshipit-source-id: 31ce745cb224df7c6fec1682ec4180513e3dadf3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65789
These common types of jobs can be moved into the build job since they are typically
no-ops. It could be annoying in the future to debug docker builds, but
dedicating an entire ephemeral node to a no-op seems like a waste to me.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: malfet, janeyx99
Differential Revision: D31253017
Pulled By: seemethere
fbshipit-source-id: c7b5ea35a57fb1576122df219d387c86e420fd1f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65294
This adds the docstring documentation to the WeightNormSparsifier and adds the typehints for the constructor args.
Note, this does not require testing as only the doc is changed.
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D31186827
Pulled By: z-a-f
fbshipit-source-id: c5010c9bba25b074c4cc6c88f251474b758f950d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65293
This fixes a bug in the WeightNormSparsifier, where the mask is being multiplied by the newly computed mask.
Because the mask elements are binary 0/1, this accumulates the mask over every iteration, eventually collapsing the mask to zero.
This bug accidentally bled through from old versions.
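A minimal sketch of the failure mode (illustrative only, not the sparsifier's actual code):
```python
# Illustrative sketch of the accumulation bug described above.
import torch

mask = torch.ones(8)
for step in range(4):
    new_mask = (torch.rand(8) > 0.5).float()  # freshly computed 0/1 mask for this step
    mask = mask * new_mask                    # buggy: zeros accumulate across steps
# After a few steps `mask` trends toward all zeros; the fix is to assign the freshly
# computed mask instead of multiplying it into the previous one.
```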
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D31186829
Pulled By: z-a-f
fbshipit-source-id: 3f5b2c833148ab0bd8084e7410ce398f1252e65e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65292
That was the original design, which we decided to simplify by removing the packing in the sparsifier.
The state of the sparsifier is saved directly, and the old behavior accidentally bled through to the current version.
This change removes the `_pack_params` method, and changes the state_dict to include the state directly.
We don't have to change the load_state_dict, as it will work with either the old or the new format.
The main reason for this PR is the simplification. The original design didn't achieve anything useful by packing the sparsification parameters.
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D31186826
Pulled By: z-a-f
fbshipit-source-id: 4ad72a7e669f048d2f2d269269ee11b63fa169db
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65686
Fixes: #57827
This PR introduces the `check_inplace` function. It contains some common checks for all
structured in-place operators (e.g. dtype, device, and sizes). The `set_output` method calls
`check_inplace` on in-place specializations of structured kernels.
Besides that, it also:
- adds overlap assertions for both in-place and out-of-place overloads
- removes in-place operator specific `TORCH_CHECK`s around the code base
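A rough Python-level analogue of the checks being centralized (illustrative only; the real implementation is the C++ `check_inplace` in the structured-kernels machinery):
```python
# Illustrative Python-level analogue; not the actual C++ code.
def check_inplace(self_tensor, expected_sizes, expected_dtype, expected_device):
    # An in-place overload writes into `self`, so it must not change its metadata.
    assert self_tensor.dtype == expected_dtype, "in-place op cannot change the dtype"
    assert self_tensor.device == expected_device, "in-place op cannot change the device"
    assert list(self_tensor.size()) == list(expected_sizes), "in-place op cannot resize self"
```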
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31234063
Pulled By: ezyang
fbshipit-source-id: fa3b45775af7812e07a282e7cae00b68caf0fdb0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65570
Although this is not an issue that could pop up in practice, LLVM-12 throws an error about it if left unchecked.
Test Plan: `buck test mode/dev //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_empty_batch (quantization.core.test_quantized_op.TestQuantizedOps)'`
Reviewed By: r-barnes
Differential Revision: D31151681
fbshipit-source-id: e039c6aa1687a61ef6774f045744dc9d768d5c80
Summary:
This PR attempts to port `baddbmm` and `bmm` to structured kernels. The reason they're in the same PR: a lot of the code, including the checks and implementation, is common to both ops.
Issue tracker: https://github.com/pytorch/pytorch/issues/55070
cc: ysiraichi ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64805
Reviewed By: gchanan
Differential Revision: D31134454
Pulled By: ezyang
fbshipit-source-id: 3294619834a8cc6a0407aea660c556d3a42b6261
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65728
Changes the docker image generation script to only include image build
jobs for images that we actually use within CircleCI
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
cc ezyang seemethere malfet pytorch/pytorch-dev-infra
Test Plan: Imported from OSS
Reviewed By: janeyx99
Differential Revision: D31224674
Pulled By: seemethere
fbshipit-source-id: 64b14e1a4ef82d345ec7b898c4c89d9a9419e4de
Summary:
This test occasionally deadlocks while waiting for the child process to report its result.
The test is small, so the entire run should never take more than 1-2 seconds, but to be on the safe side the timeout is set to 5 seconds.
Somewhat mitigates https://github.com/pytorch/pytorch/issues/65727
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65742
Reviewed By: janeyx99, ejguan
Differential Revision: D31235116
Pulled By: malfet
fbshipit-source-id: 0cdd2f7295f6f9fcefee954a14352e18fae20696
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65476
As suggested by `-Winconsistent-missing-destructor-override`.
Test Plan: CI
Reviewed By: pritamdamania87
Differential Revision: D31115128
fbshipit-source-id: a4e2441c13704c0c46e3e86f7886fca76c40ca39
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65716
Currently, we send arguments to shaders by creating and filling a SSBO (Shader Storage Buffer Object). However, we can instead use [push constants](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/vkCmdPushConstants.html) to send a small amount of uniform data to shaders.
Push constants are slightly more performant than using an SSBO and also have the added benefit of not needing to allocate and manage memory for a buffer object, since they update the pipeline data directly.
The downside of using push constants is that there is a maximum size for a push constant block, described by `maxPushConstantsSize` in [VkPhysicalDeviceLimits](https://www.khronos.org/registry/vulkan/specs/1.1/html/vkspec.html#VkPhysicalDeviceLimits). The minimum size guaranteed by the spec is 128 bytes, which is enough for 32 `float`/`int` variables, or 8 `vec4` variables. This should be enough for our purposes.
Currently, the Convolution shaders use the largest uniform block which only uses 22 bytes.
Test Plan:
Run `vulkan_api_test`:
```
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
```
Reviewed By: beback4u
Differential Revision: D30368834
fbshipit-source-id: 65a42b9da1a9084ba2337b41eaab9b612583c408
Summary:
Use `c10::optional` + thread_fence instead of `#pragma omp critical` inside the max_unpooling kernels.
Using any OpenMP pragma in an `at::parallel_for` body is wrong, as `parallel_for` can
be implemented using native threading primitives such as pthreads.
`c10::optional` is a much better approach than the pair of
`has_error` and `error_index` variables. Use `std::atomic_thread_fence` to ensure the error_index value is synchronized.
It also fixes ICE reported in https://github.com/pytorch/pytorch/issues/65578
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65655
Reviewed By: ngimel
Differential Revision: D31206501
Pulled By: malfet
fbshipit-source-id: 93df34530e721777b69509cd6c68f5d713fb2b2a
Summary:
This PR adds forward AD for `*_solve` methods.
Additionally, `cholesky_solve` gets an OpInfo plus a fix for a bug where wrong leading dimensions could be passed to LAPACK,
and `lu_solve` gets forward AD implemented with 2x `lu_solve` instead of 1x `lu_solve` + 2x `triangular_solve`.
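A hedged usage sketch of forward-mode AD through one of these solvers (illustrative example values; relies on the forward-AD support this PR adds):
```python
# Illustrative sketch, not the PR's test code.
import torch
import torch.autograd.forward_ad as fwAD

A = torch.randn(3, 3)
A = A @ A.t() + 3 * torch.eye(3)         # make A symmetric positive definite
L = torch.linalg.cholesky(A)
b, tangent = torch.randn(3, 1), torch.randn(3, 1)

with fwAD.dual_level():
    dual_b = fwAD.make_dual(b, tangent)
    x = torch.cholesky_solve(dual_b, L)  # solves A x = b using the factor L
    primal, jvp = fwAD.unpack_dual(x)    # jvp: directional derivative of x w.r.t. b
```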
cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7 jianyuh mruberry walterddr IvanYashchuk xwang233
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65546
Reviewed By: gchanan
Differential Revision: D31206837
Pulled By: albanD
fbshipit-source-id: 040beda97442e7a88a9df9abc7bb18313ce55bc3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65491
The only user of any of this code is THCStorage_copy, so I've
migrated that to call `Tensor.copy_` directly.
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D31148183
Pulled By: ngimel
fbshipit-source-id: 92bab71306c84bc481c47a0615ebb811af2c2875
Summary:
- Only ported copy for sparse tensor to dispatcher. Everything else is the same
- Duplicated code for named tensor handling in sparse tensor copy
- Might change it later to handle named tensors using dispatcher
Issue https://github.com/pytorch/pytorch/issues/61122
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65304
Reviewed By: gchanan
Differential Revision: D31176720
Pulled By: ezyang
fbshipit-source-id: 56757a3b0fb56c3d05c16dd935428a0cd91ea766
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64750
conv2d bias is optional, so it shows up as ArgNone when processing the graph.
The bias is a prim::Constant of NoneType, so we do not know its shape at the moment of constant binding.
This PR adds it as a constant zeros tensor during graph processing => for that, `std::vector<TensorExprKernel::ConstantDescr>& constants` and `std::vector<at::Tensor>& constant_tensors` are added to `computeOperandValue`, as it is not in `TensorExprKernel`.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D30842101
Pulled By: IvanKobzarev
fbshipit-source-id: 88020f6934e43fe606f8eae928b7e21b7c3f15f6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65708
att
Test Plan: added unit test
Reviewed By: khabinov
Differential Revision: D31209992
fbshipit-source-id: c1b4e70bd9705dcfdf3039cb8791149c8646f1d7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65707
Refactoring aotCompile to return a pair of the compiled function and the LLVM assembly, instead of updating an incoming string with the assembly code.
Testing: Gives expected results when compiled and run
```
(pytorch) ~/local/pytorch refactor_aot
└─ $ build/bin/aot_model_compiler --model mobilenetv3.pt --model_name=pytorch_dev_mobilenetv3 --model_version=v1 --input_dims="2,2,2"
The compiled model was saved to mobilenetv3.compiled.pt
```
Test Plan: Imported from OSS
Reviewed By: qihqi
Differential Revision: D31220452
Pulled By: priyaramani
fbshipit-source-id: f957c53ba83f876a2e7dbdd4b4571a760b3b6a9a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65639
This op is used by mobilenet v2.
Test Plan:
buck test glow/fb/fx/oss_acc_tracer:test_acc_tracer -- test_hardtanh
buck test glow/fb/fx/acc_tracer:test_acc_shape_inference -- hardtanh
buck test glow/fb/fx/oss_acc_tracer:test_acc_tracer -- test_hardtanh
Reviewed By: yinghai
Differential Revision: D31184297
fbshipit-source-id: 5a04319f6d16fb930372442616e27211107ecc67
Summary:
Happy to get any feedback on how to make this code cleaner!
This:
- Fixes Tensor attribute deepcopy (BC-breaking?)
- Adds a test for Tensor attribute deepcopy
- Fixes subclass deepcopy
- Moves the subclass serialization tests into their own class so they don't interfere with other serialization test logic
- Adds a test for subclass deepcopy
cc ezyang gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65584
Reviewed By: gchanan
Differential Revision: D31206590
Pulled By: albanD
fbshipit-source-id: 74a8f0767f4933b9c941fbea880a8fd1b893ea2f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65539
This function doesn't directly use thrust so these are simply unused variables.
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D31193191
Pulled By: malfet
fbshipit-source-id: 231b6a197c9f1bd5a61e46cb858e8eedc85b2818
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65459
Just run linter on the change and apply all suggestions
Test Plan: N/A
Reviewed By: seemethere
Differential Revision: D31102960
fbshipit-source-id: 04e1d07935690f2ddbc64533661b3e55379d13b5
Summary:
The SHARD_NUMBER reset was a way to differentiate whether we had just one shard vs. multiple.
We shouldn't reset SHARD_NUMBER; instead we should just pass and use NUM_TEST_SHARDS, for clarity and ease of scaling up to more shards.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65701
Reviewed By: driazati
Differential Revision: D31209306
Pulled By: janeyx99
fbshipit-source-id: 3a3504bd47e655d62aa0d9ed2f4657ca34c71c0e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64943
Most ProcessGroup collective APIs are pure virtual. As a result, c10d extensions need to override all of them and throw an error for the APIs they don't need. This is too verbose for users. This commit changes those collective APIs to virtual functions that throw an error by default. Note that ProcessGroup is still an abstract class, as `getBackendName` is a pure virtual function that all subclasses have to override.
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang cbalioglu gcramer23
Test Plan: Imported from OSS
Reviewed By: cbalioglu
Differential Revision: D30906866
Pulled By: mrshenli
fbshipit-source-id: c4df8962d60350a44d2df72fd04f9dd6eadb9fa6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65519
Adds buck target so we can run this internally.
ghstack-source-id: 139009957
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D31072784
fbshipit-source-id: 7185cc1e6f9df3d79251eb017270471942a9d7dd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65385
Enables the ZeRO tests to run on windows. Closes
https://github.com/pytorch/pytorch/issues/63086.
Backend == NCCL was used as a proxy for whether we were running under CUDA, but Windows GPU tests use Gloo. In this case, use Gloo on GPU.
For some reason these tests don't seem to test Gloo on GPU with ZeRO in general (NCCL backend is picked when a GPU is available), so that behavior is kept for now.
ghstack-source-id: 139003920
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D31071181
fbshipit-source-id: 45a76309ac5e882f5aa6c4b130118a68800754bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65590
hardswish is used by mobile net v3 oss model.
This diff added hardswish support in acc_tracer
Test Plan:
buck test glow/fb/fx/acc_tracer:test_acc_shape_inference
buck test glow/fb/fx/oss_acc_tracer:test_acc_tracer -- test_hardswish
Reviewed By: 842974287
Differential Revision: D30950061
fbshipit-source-id: cab57b8de5bea3a4d9d2b7d2a410d9afe787d66f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64824
See comment in function_schema.h for explanation. I claim that this is a good tradeoff because the aliasing information seems to be used only in compiler-ish code paths, where performance isn't as critical as actual execution. If performance is important there too, perhaps we should hoist isWrite into the Argument itself since there are several paths that only care about isWrite.
ghstack-source-id: 138958896
Test Plan: CI, profile schema parsing on startup and see much fewer page faults in createArgumentVector.
Reviewed By: suo
Differential Revision: D30860719
fbshipit-source-id: 1d4d2328f2b8e34f5ddf9d82083fd4dd7b7f738f
Summary:
Follow up to https://github.com/pytorch/pytorch/issues/61935
This PR:
1. Adds a test for non-contiguous tensors
2. Fixes a bug in `NLLLoss` that was caught by the test.
The reason this was not caught in `common_nn` is that `CriterionTest` overrides `test_cuda` but does not call `test_nonconfig`.
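A minimal sketch of the kind of non-contiguity check the new test performs (illustrative only, not the actual test code):
```python
# Illustrative sketch, not the test added by this PR.
import torch
import torch.nn.functional as F

logits = torch.log_softmax(torch.randn(4, 20), dim=1)[:, ::2]   # non-contiguous (4, 10)
target = torch.randint(10, (4,))
assert not logits.is_contiguous()

out_noncontig = F.nll_loss(logits, target)
out_contig = F.nll_loss(logits.contiguous(), target)
torch.testing.assert_close(out_noncontig, out_contig)           # results must agree
```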
cc albanD mruberry jbschlosser walterddr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64954
Reviewed By: zou3519
Differential Revision: D31174149
Pulled By: jbschlosser
fbshipit-source-id: a16073e59b40ccc01c82ede016b63a8db2e810f5
Summary:
This should help alleviate workflows failing due to docker pull timing out, which doesn't happen often, but did happen once in the past day.
Was also reported in https://github.com/pytorch/pytorch/issues/65439
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65103
Reviewed By: driazati
Differential Revision: D31157772
Pulled By: janeyx99
fbshipit-source-id: 7bf556f849b41eeb6dea69d73e5a8e1a40dec514
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65620
This was bothering me for a while.
ghstack-source-id: 138914860
Test Plan: Sandcastle
Reviewed By: beback4u
Differential Revision: D31162648
fbshipit-source-id: 72c47ea34d40c772bb53da721fcb36365b5dbaf3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65589
Without this prefix, the include guards interfere with attempts to indirectly include both c10::variant and the original mpark variant in the same translation unit.
ghstack-source-id: 138901838
Test Plan: Temporarily `#include <c10/util/variant.h>` in ivalue.h and buck build //data_preproc/preproc:preproc_adapter_utils mode/no-gpu -- this delayed D31101962 (01720d6a23) from fixing S244170
Reviewed By: bhosmer
Differential Revision: D31159414
fbshipit-source-id: 234c5ed37ca853702bcdf3263e4f185b95ac1d08
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65592
IExecutionContext might not be safe to be serialized, therefore the simplest way to support save/load of TRTModule is to re-populate the execution context upon every load.
ghstack-source-id: 138904770
Test Plan: buck run mode/dev-nosan -c python.package_style=inplace -j 40 deeplearning/trt/fx2trt:acc2trt_test
Reviewed By: zrphercule
Differential Revision: D31070427
fbshipit-source-id: 88c58c6ce50e6dc9383d7f9419b5447cb89a4a3a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65600
Previously AccessInfo owned two maps, dependencies_ and dependents_,
which represented edges in the dependency graph. These two maps held
shared pointers, and thus each edge immediately became a cycle,
which resulted in memory leaks. This PR makes one of the ends of these
edges a weak pointer, thus breaking the loop.
Test Plan: buck test mode/dbgo-asan-ubsan //search/lib/query_expansion/candidate_generator/test:transliteration_expander_test -- --exact 'search/lib/query_expansion/candidate_generator/test:transliteration_expander_test - TransliterationExpander.romanizationByLocaleTest'
Reviewed By: bertmaher
Differential Revision: D31163441
Pulled By: ZolotukhinM
fbshipit-source-id: 9cef921f5c9293f1237144d1ee92e31f3e44c00a
Summary:
// A non owning pointer to a type. When a class get inserted as a constant
// into a graph, if we used a strong pointer we would have a circular reference
// from Object -> CompilationUnit and CompilationUnit -> Graph (which owns the
// Constant Object)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65442
Reviewed By: ezyang
Differential Revision: D31101962
Pulled By: eellison
fbshipit-source-id: f1c1cfbe5a8d16a832cad7ba46e2a57a98670083
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64382
* The `use_external_data_format` parameter is used for large models that cannot be exported because of the 2GB protobuf limit.
* When `use_external_data_format` is set to True, the model is exported in the ONNX external data format, in which case some of the model parameters are stored in external binary files and not in the ONNX model file itself.
* This PR marks this parameter as DEPRECATED and checks the model proto size in code instead of relying on the user; if the size is larger than 2GB, then `use_external_data_format = True` is applied automatically.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D30905265
Pulled By: malfet
fbshipit-source-id: 82b4e17bfa6a8de2bfd700a5282c12f6835603cb
Co-authored-by: hwangdeyu <dejack953@outlook.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64380
* `example_outputs` is used to determine the type and shape of the outputs without tracing the execution of the model. It had to be provided when exporting a ScriptModule or ScriptFunction with the export() function.
* Since we can work out `example_outputs` internally instead of requiring it from the user, this argument is deprecated in the export() function to improve the experience of calling it.
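A hedged sketch of the resulting call pattern (illustrative module and file names):
```python
# Illustrative sketch, not the PR's test code.
import torch

class Doubler(torch.nn.Module):
    def forward(self, x):
        return x * 2

scripted = torch.jit.script(Doubler())
# Before this change, exporting a ScriptModule required example_outputs=...;
# with it, the outputs are worked out internally during export.
torch.onnx.export(scripted, (torch.randn(2, 3),), "doubler.onnx")
```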
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D30905266
Pulled By: malfet
fbshipit-source-id: d00b00d7d02b365d165028288ad915678caa51f2
Co-authored-by: hwangdeyu <dejack953@outlook.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64373
* Fix some bad formatting and clarify things in onnx.rst.
* In `export_to_pretty_string`:
* Add documentation for previously undocumented args.
* Document that `f` arg is ignored and mark it deprecated.
* Update tests to stop setting `f`.
* Warn if `_retain_param_name` is set.
* Use double quotes for string literals in test_operators.py.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D30905271
Pulled By: malfet
fbshipit-source-id: 3627eeabf40b9516c4a83cfab424ce537b36e4b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64372
The custom_opsets arg of torch.onnx.export() does not need to be removed.
Add some supplementary description and tests for easier understanding.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D30905269
Pulled By: malfet
fbshipit-source-id: 489fbee0e2c1d6c5405c9bf7dfd85223ed981a44
Co-authored-by: hwangdeyu <dejack953@outlook.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64371
As of now, the "strip_doc_string" parameter was described as below:
strip_doc_string (bool, default True): do not include the field
doc_string``` from the exported model. Otherwise the field will mention the source code locations for model``.
This is usually useless to users who want to transform a PyTorch model to ONNX one. Only when the user wants to debug the export process, these source code locations could provide benefits.
To make the export() function more friendly by providing less parameters, we combined "strip_doc_string" into "verbose" parameter. If a user set verbose to True, it means the users need some log information for debugging the export process and this is similar with the purpose of strip_doc_string parameter.
But the usage of these 2 arguments are opposite: setting verbose to True means we want to print log information to help debug, which means strip_doc_string should be False. And this is how we replace strip_doc_string with verbose argument in this PR.
This PR will still keep it in torch.onnx.export() function for backward support while the usage of it has been combined with verbose argument.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D30905268
Pulled By: malfet
fbshipit-source-id: 2f06eb805c01fe15ff7a1b4f6595c937ba716d60
Co-authored-by: fatcat-z <zhang-ji@outlook.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64370
As of now, the "_retain_param_name" parameter has no description in PyTorch docs website. According to code, this argument determines if we keep the original parameter names of PyTorch model in the final ONNX graph. If this is False, those original parameter names will be replaced with a series of integers starting from 1.
Since setting numbers as parameter names make no sense to users, we remove this argument from the torch.onnx.export() function to increase user experience of calling this function.
This PR will still keep it in torch.onnx.export() function for backward support while all backend logic has been changed to work as _retain_param_name is set to True.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D30905270
Pulled By: malfet
fbshipit-source-id: ca60757ca17daaff937e9f08da42596086795f4a
Co-authored-by: fatcat-z <zhang-ji@outlook.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65513
The error in #65231 means some child threads were destructed before
being joined. I added some tracing and prints and found that, in the failed
tests, all `assertEqual` calls passed, but the `ProcessGroupGloo`
destructor wasn't called in one of the processes. This could be because
the only guarantee that Python makes is that garbage collection MAY
happen before the program exits. This commit adds an explicit
`destroy_process_group()` to alleviate the problem.
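A minimal sketch of the explicit-teardown pattern (single-process illustration with assumed env settings):
```python
# Illustrative sketch, not the test code touched by this PR.
import os
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

# ... test body using the process group ...

dist.destroy_process_group()  # explicit teardown instead of relying on interpreter-exit GC
```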
cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang gcramer23
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D31134174
Pulled By: mrshenli
fbshipit-source-id: 2e42fe93d3f16ce34681b591afc15a6ac0b9fab6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65484
This PR makes sure we only use FixedQParamFakeQuantize for the quint8 dtype and allows users
to use other dtypes for ops like sigmoid. This is useful for producing reference patterns for
these ops that can be used in other backends like TensorRT.
Test Plan:
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D31120377
fbshipit-source-id: 3b529d588e2b6ff0377a89c181f6237f8f0cc2f5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64515
For performance reasons, we would like to ensure that we can await
user collectives as part of custom buffer reduction in parallel with other work.
As a result, add support for returning futures from custom buffer hooks and awaiting
those futures at the end of the backward pass.
Also added some docs to clarify how to use these APIs.
ghstack-source-id: 138793803
Test Plan: I
Reviewed By: zhaojuanmao
Differential Revision: D30757761
fbshipit-source-id: e1a2ead9ca850cb345fbee079cf0614e91bece44
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).
New submodule commit: 0108d4f552
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65360
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: jspark1105
Differential Revision: D31061552
fbshipit-source-id: 8bce5157a281e38cad5d5d0e9dcd123beda39735
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65503
There are two reasons for this change:
- I don't think trunk jobs should have different behavior than their PR equivalents.
- Continuing through error makes it challenging to figure out what is
actually failing, especially given the poor UX of GitHub Actions when it
comes to reading logs
Example: https://github.com/pytorch/pytorch/runs/3680114581. Here, there
is a failure but the rendered test results tell me everything is
successful. I have no idea how to quickly tell what failed; the log is so long
and terms like "error", "failure", etc. are common enough that searching
it is very difficult.
Differential Revision: D31130478
Test Plan: Imported from OSS
Reviewed By: ezyang
Pulled By: suo
fbshipit-source-id: 15a80475ca4c49644c0f7b779f5c6c2ffeb946a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65558
This will temporarily be replaced by an FB-internal workflow that does
the exact same thing, pending a migration of this workflow to probot.
cc jeffdaily sunway513 jithunnair-amd ROCmSupport
Test Plan: Imported from OSS
Reviewed By: zhouzhuojie, driazati
Differential Revision: D31149105
Pulled By: suo
fbshipit-source-id: 2aa122820ae3b5286774501f5ecfe052bc949dea
Summary:
Refactor:
```
TORCH_CHECK ( key == a ||
key == b ||
key == c,
"expected key to be in ", a, " or ", b , " or ", c,
" but got ", key);
```
into
```
TORCH_CHECK( key_set.has(key),
"expected key to be in ", key_set,
" but got ", key );
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65535
Reviewed By: wconstab
Differential Revision: D31144239
Pulled By: malfet
fbshipit-source-id: 68a053041a38f043e688e491889dd7ee258f3db3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64823
We seem to spend noticeable time in vfprintf for this, and the number of arguments is almost always small enough to do this in just a few instructions.
ghstack-source-id: 138623354
Test Plan: Profile schema parsing, saw less time in vfprintf
Reviewed By: ezyang, dhruvbird
Differential Revision: D30860716
fbshipit-source-id: 09ef085cd6f93dc1eaa78790dde918ac60e67450
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64839
Resulted in some extra shared_ptr refcount bumps.
ghstack-source-id: 138623356
Test Plan: CI
Reviewed By: smessmer
Differential Revision: D30875749
fbshipit-source-id: 531f04c453f7410ed3d4ff054217f21a250be8e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65509
With this change, we can get dumps of the model graphs by setting the env variable `PYTORCH_JIT_LOG_LEVEL=">>impl"` while running the model.
Test Plan: buck test mode/opt-clang //caffe2/benchmarks/static_runtime:static_runtime_cpptest
Reviewed By: mikeiovine
Differential Revision: D31125797
fbshipit-source-id: d8979a4e138047518140e0eaecb46e012891b17c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65420
Context: In some FB use cases we need to map observer stats from a train model checkpoint to the inference model. We observed that some buffer names are different because the intermediate activation tensors
are generated differently across the train and inference models. More details in https://fb.quip.com/PtGcAR0S5CQP
Currently, for each observer (activation_post_process), the FQN of the inserted module is determined based on the FQN of the input tensor it is observing.
In this change we make the observer FQN include the FQN of the op/module it is observing, along with an "input"/"output" marker, rather than tensor/intermediate op names.
Before
```
def forward(self, x):
    x_activation_post_process_0 = self.x_activation_post_process_0(x); x = None
    mods1_w = self.mods1.w
    mods1_w_activation_post_process_0 = self.mods1_w_activation_post_process_0(mods1_w); mods1_w = None
    mods1_b = self.mods1.b
    linear = torch.nn.functional.linear(x_activation_post_process_0, mods1_w_activation_post_process_0, bias = mods1_b); x_activation_post_process_0 = mods1_w_activation_post_process_0 = mods1_b = None
    linear_activation_post_process_0 = self.linear_activation_post_process_0(linear); linear = None
    return linear_activation_post_process_0
```
After
```
def forward(self, x):
    mods1_input_activation_post_process_0 = self.mods1_input_activation_post_process_0(x); x = None
    mods1_w = self.mods1.w
    mods1_w_activation_post_process_0 = self.mods1_w_activation_post_process_0(mods1_w); mods1_w = None
    mods1_b = self.mods1.b
    linear = torch.nn.functional.linear(mods1_input_activation_post_process_0, mods1_w_activation_post_process_0, bias = mods1_b); mods1_input_activation_post_process_0 = mods1_w_activation_post_process_0 = mods1_b = None
    mods1_output_activation_post_process_0 = self.mods1_output_activation_post_process_0(linear); linear = None
    return mods1_output_activation_post_process_0
```
Test Plan:
python test/test_quantization.py test_observer_fqn
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D31088652
fbshipit-source-id: 2f1526f578a13000b34cfd30d11f16f402fd3447
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65422
hardsigmoid is used by mobile net v3 oss model.
This diff added hardsigmoid support in acc_tracer
Test Plan:
buck test glow/fb/fx/acc_tracer:test_acc_shape_inference
buck test glow/fb/fx/oss_acc_tracer:test_acc_tracer -- test_hardsigmoid
Reviewed By: jfix71
Differential Revision: D30950304
fbshipit-source-id: 8fe4b4c6df29c06a73850d32f59321a9311f94f5
Summary:
The source is shared across all threads running the torchscript
interpreter, so if several threads encounter errors at once, they will all race
to unpickle the source, leading to memory corruption.
Test Plan:
Model 217993215_0 is the problematic model; I wasn't able to repro
the crash with requests stored in Hive, but I could easily by adding my
devserver (SMC tier predictor.bertrand) as a shadow tier to the model's tier
(inference_platform.predictor_model.prod.bi.217993215_latest). (i.e., set
shadow_tier property to predictor.bertrand=1 to proxy 1% of traffic).
With this diff, the ASAN/TSAN errors go away.
Reviewed By: suo
Differential Revision: D31044009
fbshipit-source-id: 56f9ef3880e7cf09f334db71b4256e362b4de965
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65186
The FBGEMM JIT'ed EmbeddingSpMDM kernel just returns false when there's an error, delegating detailed error handling to the caller (since each framework, like PyTorch and Caffe2, wants to do error handling differently). Much of the PyTorch code was simply reporting that there was "an" error without pinpointing exactly why it happened. This diff introduces more informative error messages, following what Caffe2 was doing.
Test Plan: CI
Reviewed By: dskhudia
Differential Revision: D31008300
fbshipit-source-id: b8d069af0692dc86dc642b18a9c68f22deaffea3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65482
Currently we hardcode permute + bmm in a module and tag it as a leaf module during tracing. This diff introduces a pass to fuse permute + matmul into a single node.
TODO:
Fusion transformations of this kind share a lot of similar code, such as finding the fusion pattern and replacing the original nodes with the fused node. The current fx subgraph rewriter lets us specify the patterns we want to replace, but we would need to extend it to allow specifying constraints on nodes' kwargs; a generic sketch is shown below.
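A generic sketch of pattern-based fusion with the fx subgraph rewriter (illustrative; `fused_permute_matmul` is a hypothetical stand-in, the real fused op and pass live in the fx2trt code):
```python
# Illustrative sketch only; `fused_permute_matmul` is a hypothetical stand-in op.
import torch
import torch.fx
from torch.fx import symbolic_trace, replace_pattern

def fused_permute_matmul(x, y):
    return torch.matmul(x.permute(0, 2, 1), y)

# Keep the stand-in as a single call_function node instead of tracing into it.
torch.fx.wrap("fused_permute_matmul")

def pattern(x, y):
    return torch.matmul(x.permute(0, 2, 1), y)

def replacement(x, y):
    return fused_permute_matmul(x, y)

class M(torch.nn.Module):
    def forward(self, x, y):
        return torch.matmul(x.permute(0, 2, 1), y)

gm = symbolic_trace(M())
replace_pattern(gm, pattern, replacement)   # permute + matmul -> one fused node
print(gm.graph)
```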
Reviewed By: yinghai
Differential Revision: D31022055
fbshipit-source-id: 13d1f18d79b09d371897ecde840f582ccaf5713a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65481
Previously we had `acc_ops.transpose`, but after a recent diff `torch.transpose` is mapped to `acc_ops.permute`. Here we clean up the fx2trt unit test for transpose and add support for negative indices in permute.
Reviewed By: wushirong
Differential Revision: D31115280
fbshipit-source-id: 58e689e6dd14181aea5186f3bb5b8745a07d0e51
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65181
This PR changes `state_dict()` during sync to explicit `named_parameters` and `named_buffers` calls. The underlying motivation is that `state_dict()` doesn't necessarily equal "params + buffers" in all cases: state_dict is used mainly for checkpointing, while params/buffers are used for training, and we might have cases where params/buffers are in a different form than the state_dict (i.e. in the state_dict we might want to save small pieces of tensors, while in training we want to concat the tensors together for performance reasons).
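A rough sketch of the distinction (illustrative, not DDP's actual code):
```python
# Illustrative sketch only.
import torch

module = torch.nn.BatchNorm1d(4)

# What actually needs to be synced: the live training tensors.
to_sync = dict(module.named_parameters())
to_sync.update(dict(module.named_buffers()))

# state_dict() is a checkpointing view and may be organized differently
# (e.g. tensors saved in smaller pieces), so it is no longer used for syncing.
checkpoint_view = module.state_dict()
```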
ghstack-source-id: 138701159
Test Plan: wait for ci
Reviewed By: divchenko, rohan-varma
Differential Revision: D31007085
fbshipit-source-id: 4e1c4fbc07110163fb9b09b043ef7b4b75150f18
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65014
ghstack-source-id: 138656948
Test Plan:
```
(pytorch) [maxren@devvm3115.atn0 ~/pytorch] python3 test/test_jit.py TestPeephole
CUDA not available, skipping tests
monkeytype is not installed. Skipping tests for Profile-Directed Typing
........s......................
----------------------------------------------------------------------
Ran 31 tests in 0.393s
OK (skipped=1)
(pytorch) [maxren@devvm3115.atn0 ~/pytorch] python3 test/test_jit.py TestPeephole.test_normalized_rsub
CUDA not available, skipping tests
monkeytype is not installed. Skipping tests for Profile-Directed Typing
.
----------------------------------------------------------------------
Ran 1 test in 0.015s
OK
```
Reviewed By: eellison
Differential Revision: D30941389
fbshipit-source-id: 03f0416d99090845c9bfb1e5fcf771d5f1d7a050
Summary:
## 🐛 Bug
'CosineAnnealingWarmRestarts' object has no attribute 'T_cur'.
In the constructor of CosineAnnealingWarmRestarts, we call the constructor of the parent class (_LRScheduler), which in turn calls the step method of CosineAnnealingWarmRestarts.
The called method tries to update the object's attribute 'T_cur', which is not defined yet, so it raises the error.
This only happens when the last_epoch argument is given as 0 or greater while initializing CosineAnnealingWarmRestarts.
## To Reproduce
Steps to reproduce the behavior:
1. Give the value for the last_epoch argument as zero OR
1. Give the value for the last_epoch argument as a Positive integer.
## Expected behavior
I only expected the 'CosineAnnealingWarmRestarts' object to be initialized.
## Environment
PyTorch version: 1.9.0+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.2
Libc version: glibc-2.31
Python version: 3.8.10 [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.8.0-59-generic-x86_64-with-glibc2.29
Is CUDA available: False
CUDA runtime version: No CUDA
## Additional context
We can solve this bug by moving the line 'self.T_cur = self.last_epoch' above the 'super(CosineAnnealingWarmRestarts, self).__init__()' call, since that initializes 'self.T_cur' on the object before step() runs.
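A minimal repro sketch of the failure mode (hypothetical snippet; the initial_lr line works around an unrelated _LRScheduler requirement when last_epoch >= 0):
```python
# Illustrative repro sketch only.
import torch
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

opt = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.1)
opt.param_groups[0]["initial_lr"] = 0.1   # _LRScheduler expects this when last_epoch >= 0

# Before the fix this raised AttributeError: 'CosineAnnealingWarmRestarts' object has
# no attribute 'T_cur', because the parent __init__ calls step() before T_cur is set.
sched = CosineAnnealingWarmRestarts(opt, T_0=10, last_epoch=0)
```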
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64758
Reviewed By: ezyang
Differential Revision: D31113694
Pulled By: jbschlosser
fbshipit-source-id: 98c0e292291775895dc3566fda011f2d6696f721
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65387
Added a customized NNC implementation for signed log1p kernel and enabled the fusion pass that adds the fused signed log1p op.
Also, added a SR microbenchmark for this kernel which shows the performance improvement.
Without fusion:
```
--------------------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------------------
BM_signed_log1p/16 1953 ns 1953 ns 358746
BM_signed_log1p/64 2049 ns 2049 ns 342145
BM_signed_log1p/512 3291 ns 3291 ns 214342
BM_signed_log1p/4096 15559 ns 15559 ns 44420
BM_signed_log1p/32768 101936 ns 101935 ns 6843
BM_signed_log1p/65536 194792 ns 194789 ns 3615
```
With NNC fusion:
```
--------------------------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------------------------
BM_signed_log1p/16 369 ns 369 ns 1896179
BM_signed_log1p/64 497 ns 497 ns 1406995
BM_signed_log1p/512 1618 ns 1618 ns 430209
BM_signed_log1p/4096 11327 ns 11326 ns 61463
BM_signed_log1p/32768 84099 ns 84086 ns 8325
BM_signed_log1p/65536 166531 ns 166510 ns 4186
```
This clearly shows >15% improvement in performance of this kernel with NNC fusion.
On inline_cvr local model, there is a small improvement in terms of profiled time spent on ops:
without fusion: `0.9%` (computed by adding the % spent on all the 4 ops involved)
with NNC fusion: `0.55%`
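For reference, a sketch of the elementwise computation being fused, assuming the conventional signed-log1p definition (the abs/log1p/sign/mul chain referenced above):
```python
# Illustrative sketch; assumes the conventional signed-log1p definition.
import torch

def signed_log1p(x):
    # sign(x) * log1p(|x|): the chain of ops that the NNC kernel fuses into one.
    return torch.sign(x) * torch.log1p(torch.abs(x))

y = signed_log1p(torch.randn(16))
```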
Test Plan:
`buck test mode/opt-clang //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- SignedLog1p`
Also, did the accuracy test with inline_cvr as described here, https://fb.quip.com/qmdDAJzEmPtf, on the full size model (285298536_1)
```
get 57220 prediction values
get 57220 prediction values
max_error: 0 total: 0
```
Reviewed By: hlu1
Differential Revision: D30609492
fbshipit-source-id: d2e68df580569a30ee61abb0ef18d2c4c56827bd
Summary:
- Replace THCNumerics with `at::_isnan`
- Replace `contiguous` with `expect_contiguous`
- Don't use `contiguous` on output tensors. Instead skip the copy and
just create a new empty tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65350
Reviewed By: ezyang
Differential Revision: D31103501
Pulled By: ngimel
fbshipit-source-id: 9030869e28d6c570fad074fd0502076de8e2ab09
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64514
sync_params is a misnomer since we don't actually synchronize
parameters. While removing this I realized
`self._check_and_sync_module_buffers` does almost everything we need it to, so
just refactored that and made DDP forward call into it.
ghstack-source-id: 138684982
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D30751231
fbshipit-source-id: add7c684f5c6c71dad9e9597c7759849fa74f47a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65486
Adding this after observing jobs running for 6+ hours on `pytorch/pytorch-canary`. Still trying to debug why they happen there, but this should resolve jobs running forever.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
cc ezyang seemethere malfet pytorch/pytorch-dev-infra
Test Plan: Imported from OSS
Reviewed By: ezyang, malfet, janeyx99
Differential Revision: D31117497
Pulled By: seemethere
fbshipit-source-id: 126a10e844bdef77c2852cc5c392e5f37f130f7e
Summary:
Currently, the description of torch.any would be parsed like
```
param input
the input tensor.
```
However, it should be
```
Tests if any element in input evaluates to True.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65310
Reviewed By: ezyang
Differential Revision: D31102918
Pulled By: soulitzer
fbshipit-source-id: 678ade20ba16ad2643639fbd2420c8b36fcd8bd7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65340
I thought about a few possible ways of doing this. The main hazard is
that if I create a CPU tensor that doesn't have any real storage, the
moment I actually try to access the data on the tensor I will segfault.
So I don't want to use _make_subclass on a "cpu meta tensor" because
the CPU meta tensor (with no subclass) is radioactive: printing it
will immediately cause a segfault. So instead, I have to create
the CPU meta tensor AND subclass all in one go, and that means I need
another function for it. One downside to doing it this way is
I need another overload for explicit strides, and in general it is
difficult to get the view relationships to all work out properly;
tracked at https://github.com/pytorch/pytorch/issues/65339
Fixes https://github.com/pytorch/pytorch/issues/62972
Fixes https://github.com/pytorch/pytorch/issues/62730
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31057231
Pulled By: ezyang
fbshipit-source-id: 73522769e093ae8a1bf0c7f7e594659bfb827b28
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65235
1. Updated the legacy type checks in `torch/csrc/autograd/engine.cpp` to individually validate the dtype, device, and layout equality for grad and tensor.
2. Removed device field from `InputMetadata` since it's already stored via storing options. Also, added `dtype()` and `layout()` methods to `InputMetadata`. To make this change, some calls had to be updated due to the change in constructor.
3. To fix https://github.com/pytorch/pytorch/issues/65016:
a. Added a `is_tensor_subclass` field in `InputMetadata` to skip device checks for grad and tensor when the tensor has
python key set on it (tensor subclass).
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D31117318
Pulled By: anjali411
fbshipit-source-id: 825401df98695c48bf9b320be54585f6aff500bd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65033
1. Move the file:
```
hg mv caffe2/torch/quantization/fx caffe2/torch/ao/quantization/fx
hg mv caffe2/torch/quantization/quantize_fx.py caffe2/torch/ao/quantization/quantize_fx.py
```
2. Create new files
```
touch caffe2/torch/quantization/quantize_fx.py
touch caffe2/torch/quantization/fx/__init__.py
```
3. import things in the new files
4. add tests to test/quantization/ao_migration/test_quantization_fx.py
this is because we have some fx import in quantize_fx and fx/*.py
Test Plan: buck test mode/dev //caffe2/test:quantization
Reviewed By: vkuzo, z-a-f
Differential Revision: D30949749
fbshipit-source-id: 9e5d4d039c8a0a0820bc9040e224f0d2c26886d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65235
1. Updated the legacy type checks in `torch/csrc/autograd/engine.cpp` to individually validate the dtype, device, and layout equality for grad and tensor.
2. Removed device field from `InputMetadata` since it's already stored via storing options. Also, added `dtype()` and `layout()` methods to `InputMetadata`. To make this change, some calls had to be updated due to the change in constructor.
3. To fix https://github.com/pytorch/pytorch/issues/65016:
a. Added a `is_tensor_subclass` field in `InputMetadata` to skip device checks for grad and tensor when the tensor has
python key set on it (tensor subclass).
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D31082693
Pulled By: anjali411
fbshipit-source-id: cb551cd438c6ca40b0f18a4d0009e0861cf0fd4e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63010
This changes `test_neg_view` to call the operator with the same numeric values as the original sample input.
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D31082824
Pulled By: anjali411
fbshipit-source-id: 7d50f99dc0d1343247e366cbe9b0ca081bd0a9b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65376
Let's suppose there's a bug in PyTorch and python_error gets thrown
and never gets caught. Typically, you'll get a very useless error
message like this:
```
terminate called after throwing an instance of 'python_error'
what():
Aborted (core dumped)
```
Now, you'll get:
```
what(): unknown Python error (for more information, try rerunning with TORCH_SHOW_CPP_STACKTRACES=1)
```
and with TORCH_SHOW_CPP_STACKTRACES=1 you'll get:
```
what(): error message from Python object
```
If we're OK with making Python exceptions go even slower, we could
eagerly populate unconditionally. I'm also not so happy we don't get
a Python backtrace or the Python error name, that's worth improving
(this is a minimal diff to get the discussion going.)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31067632
Pulled By: ezyang
fbshipit-source-id: 9cfda47cafb349ee3d6853cdfb0f319073b87bff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65427
Previously we added an input_tensor_meta argument for the dequantize function. This is a bit hacky since it creates a dependency on
the arguments of dequantize: if a pass changes the input, then we would need to update the tensor meta as well.
Test Plan:
python torch/fx/experimental/fx2trt/example/quantized_resnet_test.py
Imported from OSS
Reviewed By: soulitzer
Differential Revision: D31094274
fbshipit-source-id: 5e40648d3081e2363f3a70bcc9745df4a8190ad3
Summary:
Resubmit of https://github.com/pytorch/pytorch/pull/62303.
Reverts the revert, and restores some diffs that were mysteriously missing from the reverted revert. I think some of the diffs I pushed to the original PR raced with its import or landing, such that the original PR's merge didn't pick up all the diffs I wanted. I don't know enough about the landing process to do more than speculate wildly, but hopefully this resubmit sorts things out.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62835
Reviewed By: zhouzhuojie, seemethere, janeyx99, heitorschueroff
Differential Revision: D30999982
Pulled By: malfet
fbshipit-source-id: 1f70ab4055208f1c6a80c9fc9fbe292ce68ecaa9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65119
PyTorch Quantization: allow prepare_qat to include custom modules by passing allow_list into prepare_qat.
When implementing a custom module and custom mapping for Quantization Aware Training (QAT), we need to add the custom module to the mappings and to the allow_list during prepare_qat. The allow_list needs to be surfaced to propagate_qconfig_.
Test Plan: relying on general unit test
Reviewed By: supriyar
Differential Revision: D30982060
fbshipit-source-id: 1114115b6a3b853238d33d72b5cbaafc60f463e0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64113
Since there is only one model replica per process, `replicas`
can be simplified from `std::vector<std::vector<at::Tensor>>` to
`std::vector<at::Tensor>` in the Reducer class.
Test Plan:
All tests are passing
`pytest test/distributed/test_c10d_gloo.py -vs`
Imported from OSS
Reviewed By: mrshenli
Differential Revision: D30615965
fbshipit-source-id: d2ec809d99b788c200b01411333e7dbad1269b51
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65220
Fixes #65221
- Remove deepcopy from Mapper to support file handles
- Convert `IterableWrapper` to deepcopy iterable instance within each iterator to prevent in-place modification (different data per epoch)
- Convert `IDP` to `IterableWrapper` in test_datapipe.py
- Refine the variable names (prevent using `dp` that is module reference)
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D31021886
Pulled By: ejguan
fbshipit-source-id: 72a9eee66c758e2717d591cd0942892bddedc223
Summary:
This PR enables Half, BFloat16, ComplexFloat, and ComplexDouble support for matrix-matrix multiplication of COO sparse matrices.
The change is applied only to CUDA 11+ builds.
`cusparseSpGEMM` also supports `CUDA_C_16F` (complex float16) and `CUDA_C_16BF` (complex bfloat16). PyTorch also supports the complex float16 dtype (`ScalarType::ComplexHalf`), but there is no convenient dispatch, so this dtype is omitted in this PR.
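A hedged usage sketch of what this enables (illustrative; requires a CUDA 11+ build per the note above):
```python
# Illustrative sketch only; needs a CUDA 11+ build of PyTorch.
import torch

a = torch.randn(8, 8, dtype=torch.complex64, device="cuda").to_sparse()
b = torch.randn(8, 8, dtype=torch.complex64, device="cuda").to_sparse()
c = torch.sparse.mm(a, b)   # sparse @ sparse matmul in one of the newly enabled dtypes
```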
cc nikitaved pearu cpuhrsch IvanYashchuk ezyang anjali411 dylanbespalko mruberry Lezcano
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59980
Reviewed By: ngimel
Differential Revision: D30994115
Pulled By: cpuhrsch
fbshipit-source-id: 4f55b99e8e25079d6273b4edf95ad6fa85aeaf24
Summary:
Fixes https://github.com/pytorch/pytorch/issues/58839
After discussing with albanD he proposed this simple design.
Let's iterate over the idea here :).
Thanks.
The main point of this PR is to use reparametrization that is reverted at the end of the functional call.
This leaves the original model with its state unchanged. Also, in this scenario the module is created without parameters, so the forward pass will hard-error if not all parameters are specified.
``` python
import torch
import torch.nn.utils._stateless
class MyModule(torch.nn.Module):
def __init__(self):
super().__init__()
self.l1 = torch.nn.Linear(1, 1)
def forward(self, x):
return self.l1(x)
mod = MyModule()
print('weight before', mod.l1.weight)
x = torch.rand((1, 1))
parameters = {"l1.weight": torch.nn.Parameter(torch.tensor([[1.0]])),
"l1.bias": torch.nn.Parameter(torch.tensor([0.0]))}
res = torch.nn.utils._stateless.functional_call(mod, parameters, x)
print('Functional call input ', x, ' and result ', res)
print('weight after', mod.l1.weight)
```
Output
```
weight before Parameter containing:
tensor([[-0.4419]], requires_grad=True)
Functional call input tensor([[0.3531]]) and result tensor([[0.3531]], grad_fn=<AddmmBackward>)
weight after Parameter containing:
tensor([[-0.4419]], requires_grad=True)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61447
Reviewed By: soulitzer
Differential Revision: D31082765
Pulled By: albanD
fbshipit-source-id: ba814d0f9162fb39c59989ca9a8efe160405ba76
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65391
TSAN tests are much slower than the usual dev/opt mode, about 5-10x
slower.
As a result, for TSAN build mode we use a much higher timeout for distributed
tests.
ghstack-source-id: 138584613
Test Plan: waitforbuildbot
Reviewed By: cbalioglu
Differential Revision: D31076575
fbshipit-source-id: 44a485f07101deac536470ceeff2a52cac4f9e0b
Summary:
Addresses https://github.com/facebookresearch/functorch/issues/78 and https://github.com/pytorch/pytorch/issues/54261.
* There exists `torch.batch_norm` but it takes an extra arg: `cudnn_enabled` (not there in functional variant). This is passed from the functional variant to `torch.batch_norm` here: https://github.com/pytorch/pytorch/blob/master/torch/nn/functional.py#L2282. `test_variant_consistency_jit` fails with an error: (when passed an alias)
```python
File "/home/krshrimali/Documents/Projects/Quansight/pytorch/test/test_ops.py", line 457, in _test_consistency_helper
variant_forward = variant(cloned,
TypeError: batch_norm() missing 1 required positional arguments: "cudnn_enabled"
```
* I'm not sure of a solution to this, as AFAIK there is no way to pass a lambda wrapper for an alias. Hence, I've skipped adding this as an alias there.
* On second thought, is this even an alias?
cc: mruberry zou3519 kshitij12345
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63218
Reviewed By: bdhirsh
Differential Revision: D31019785
Pulled By: zou3519
fbshipit-source-id: 2a834d05835da975289efc544a7ad7e98c99438f
Summary:
Part of migrating from Circle.
Once we get a successful force_on_cpu test, we can move it to trunk only.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65094
Reviewed By: seemethere
Differential Revision: D31086289
Pulled By: janeyx99
fbshipit-source-id: e1d135cc844d51f0b243b40efb49edca277d9de8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65175
More efficient use of map API, more efficient way to insert all pairs of inputs/outputs in liveness map
ghstack-source-id: 138547815
Test Plan: Time to enable static runtime down from ~8.7s to ~8.4s
Reviewed By: mikeiovine
Differential Revision: D30983897
fbshipit-source-id: fa6000bfd0fa0adfcd7c5922199ee32ada8c430e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65169
Previously these composite functions created a new tensor
using at::empty (or some other factory function) using TensorOptions
which doesn't preserve Python subclass. Making new_empty a
non-composite op and then routing everyone through it makes it
respect subclass. We could also make all of these non-composite
but this reduces the number of derivatives.yaml entries I have to
make and allows you to trace the fill calls.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D31003713
Pulled By: ezyang
fbshipit-source-id: 19f906f1404a6b724769c49f48d123f407a561ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65352
This can be a big win if it saves the virtual call to operator== and the cost is tiny.
ghstack-source-id: 138497657
Test Plan: Profiled ptvsc2_predictor_bench startup, inclusive time spent in EqualNode::operator() dropped from 0.8% to negligible
Reviewed By: hlu1
Differential Revision: D30974969
fbshipit-source-id: 9c3af36cffe709dfce477dcc49722536470264a0
description: Create a report to help us reproduce and fix the bug
body:
  - type: markdown
    attributes:
      value: >
        #### Before submitting a bug, please make sure the issue hasn't been already addressed by searching through [the existing and past issues](https://github.com/pytorch/pytorch/issues?q=is%3Aissue+sort%3Acreated-desc+).
  - type: textarea
    attributes:
      label: 🐛 Describe the bug
      description: |
        Please provide a clear and concise description of what the bug is.
        If relevant, add a minimal example so that we can reproduce the error by running the code. It is very important for the snippet to be as succinct (minimal) as possible, so please take time to trim down any irrelevant code to help us debug efficiently. We are going to copy-paste your code and we expect to get the same result as you did: avoid any external data, and include the relevant imports, etc. For example:
        ```python
        # All necessary imports at the beginning
        import torch
        # A succinct reproducing example trimmed down to the essential parts:
        t = torch.rand(5, 10)  # Note: the bug is here, we should pass requires_grad=True
        t.sum().backward()
        ```
        If the code is too long (hopefully, it isn't), feel free to put it in a public gist and link it in the issue: https://gist.github.com.
        Please also paste or describe the results you observe instead of the expected results. If you observe an error, please paste the error message including the **full** traceback of the exception. It may be relevant to wrap error messages in ```` ```triple quotes blocks``` ````.
      placeholder: |
        A clear and concise description of what the bug is.
        ```python
        # Sample code to reproduce the problem
        ```
        ```
        The error message you got, with the full traceback.
        ```
    validations:
      required: true
  - type: textarea
    attributes:
      label: Versions
      description: |
        Please run the following and paste the output below.
about: Report an issue related to https://pytorch.org/docs
---
## 📚 Documentation
<!-- A clear and concise description of what content in https://pytorch.org/docs is an issue. If this has to do with the general https://pytorch.org website, please file an issue at https://github.com/pytorch/pytorch.github.io/issues/new/choose instead. If this has to do with https://pytorch.org/tutorials, please file an issue at https://github.com/pytorch/tutorials/issues/new -->
description: Report an issue related to https://pytorch.org/docs/stable/index.html

body:
- type: textarea
  attributes:
    label: 📚 The doc issue
    description: >
      A clear and concise description of what content in https://pytorch.org/docs/stable/index.html is an issue. If this has to do with the general https://pytorch.org website, please file an issue at https://github.com/pytorch/pytorch.github.io/issues/new/choose instead. If this has to do with https://pytorch.org/tutorials, please file an issue at https://github.com/pytorch/tutorials/issues/new.
  validations:
    required: true
- type: textarea
  attributes:
    label: Suggest a potential alternative/fix
    description: >
      Tell us how we could improve the documentation in this regard.
about: Submit a proposal/request for a new PyTorch feature
---
## 🚀 Feature
<!-- A clear and concise description of the feature proposal -->
## Motivation
<!-- Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too -->
## Pitch
<!-- A clear and concise description of what you want to happen. -->
## Alternatives
<!-- A clear and concise description of any alternative solutions or features you've considered, if any. -->
## Additional context
<!-- Add any other context or screenshots about the feature request here. -->
description: Submit a proposal/request for a new pytorch feature

body:
- type: textarea
  attributes:
    label: 🚀 The feature, motivation and pitch
    description: >
      A clear and concise description of the feature proposal. Please outline the motivation for the proposal. Is your feature request related to a specific problem? e.g., *"I'm working on X and would like Y to be possible"*. If this is related to another GitHub issue, please link here too.
  validations:
    required: true
- type: textarea
  attributes:
    label: Alternatives
    description: >
      A description of any alternative solutions or features you've considered, if any.
- type: textarea
  attributes:
    label: Additional context
    description: >
      Add any other context or screenshots about the feature request.
{%- set squid_proxy = "http://internal-tf-lb-20210727220640487900000002-835786077.us-east-1.elb.amazonaws.com:3128" -%}
{# squid_no_proxy is a list of common set of fixed domains or IPs that we don't need to proxy. See https://docs.aws.amazon.com/AmazonECS/latest/developerguide/http_proxy_config.html#windows-proxy #}
{%- set squid_no_proxy = "localhost,127.0.0.1,github.com,amazonaws.com,s3.amazonaws.com,169.254.169.254,169.254.170.2,/var/run/docker.sock" -%}