Fixes a serialization problem caused by using the memory addresses of storages for mobile and torch.package models.
- https://github.com/pytorch/pytorch/pull/59642 holds references to storages during TorchScript serialization
Uses a StorageContext to hold a reference to every storage seen during TorchScript serialization, so that tensors can be created and destroyed while serialization is in progress. Tracking the storages solves the ABA memory problem.
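A toy Python illustration (not the actual C++ implementation) of why keying storages by memory address is fragile, and how holding references the way a StorageContext does sidesteps the ABA problem:

```python
class Storage:
    """Stand-in for a tensor storage object."""
    pass

class StorageContext:
    """Keeps every storage seen during serialization alive, so its
    address (id) can never be recycled for a different storage."""
    def __init__(self):
        self._storages = {}

    def get_or_add(self, storage):
        self._storages[id(storage)] = storage  # reference held -> kept alive
        return id(storage)

ctx = StorageContext()
s1 = Storage()
k1 = ctx.get_or_add(s1)
del s1            # ctx still holds a reference, so the address stays reserved
s2 = Storage()    # without the held reference, s2 could reuse s1's address
k2 = ctx.get_or_add(s2)
```

While the context holds references, two distinct storages can never collide on the same address key.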
* Move cublas dependency after CuDNN (#58287)
Summary:
Library linking order matters during static linking.
Not sure whether it's a bug or a feature, but if cublas is referenced
before CuDNN, it will be partially statically linked into the library,
even if it is not used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58287
Reviewed By: janeyx99
Differential Revision: D28433165
Pulled By: malfet
fbshipit-source-id: 8dffa0533075126dc383428f838f7d048074205c
* [CMake] Split caffe2::cudnn into public and private (#59721)
Summary:
This is only important for builds where cuDNN is linked statically into libtorch_cpu.
Before this PR, PyTorch wheels often accidentally contained several partial copies of the cudnn_static library.
Splitting the interface into header-only (cudnn-public) and library-plus-headers (cudnn-private) targets prevents that from happening.
Preliminary step towards enabling optional whole-archive linking of the cudnn library to work around the issue reported in https://github.com/pytorch/pytorch/issues/50153
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59721
Reviewed By: ngimel
Differential Revision: D29000967
Pulled By: malfet
fbshipit-source-id: f054df92b265e9494076ab16c247427b39da9336
* Add USE_WHOLE_CUDNN option (#59744)
Summary:
It is only enabled if USE_STATIC_CUDNN is enabled
Next step after https://github.com/pytorch/pytorch/pull/59721 towards resolving fast kernels stripping reported in https://github.com/pytorch/pytorch/issues/50153
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59744
Reviewed By: seemethere, ngimel
Differential Revision: D29007314
Pulled By: malfet
fbshipit-source-id: 7091e299c0c6cc2a8aa82fbf49312cecf3bb861a
* [Binary] Link whole CuDNN for CUDA-11.1 (#59802)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50153
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59802
Reviewed By: driazati, seemethere
Differential Revision: D29033537
Pulled By: malfet
fbshipit-source-id: e816fc71f273ae0b4ba8a0621d5368a2078561a1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59633
Fixes #59614
This fix isn't 100% correct, but it appears to stem the bleeding.
A better fix would be to understand how to detect when function
implementations don't uphold required invariants, leading to
refcount disaster.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D28962183
Pulled By: ezyang
fbshipit-source-id: 6ec71994666289dadef47bac363e6902df90b094
Summary:
After the change async error warnings look as follows:
```
$ python -c "import torch;torch.eye(3,3,device='cuda:777')"
Traceback (most recent call last):
File "<string>", line 1, in <module>
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59467
Reviewed By: ngimel
Differential Revision: D28904360
Pulled By: malfet
fbshipit-source-id: 2a8fa5affed5b4ffcaa602c8ab2669061cde7db0
Summary:
The default NEON-accelerated implementation of reciprocal uses vrecpeq_f32, which yields a Newton-Raphson approximation rather than the actual value.
Use regular NEON-accelerated division for the reciprocal and reciprocal-square-root operations instead.
This fixes `test_reference_numerics_hard_frac_cpu_float32`, `test_reference_numerics_normal_rsqrt_cpu_float32`, etc.
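To see the difference numerically, here is a small Python sketch of Newton-Raphson reciprocal refinement; an estimate of the kind vrecpeq_f32 produces is only the starting point, while exact division (as this change uses) corresponds to full convergence:

```python
def newton_reciprocal(x, y0, steps):
    """Refine an estimate of 1/x via Newton-Raphson:
    y_{n+1} = y_n * (2 - x * y_n)."""
    y = y0
    for _ in range(steps):
        y = y * (2.0 - x * y)
    return y

# A coarse initial estimate alone (steps=0) is nowhere near 1/3;
# refinement converges quadratically to the exact value.
coarse = newton_reciprocal(3.0, 0.3, 0)
refined = newton_reciprocal(3.0, 0.3, 5)
```

The unrefined estimate is the kind of error the affected tests caught.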
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59361
Reviewed By: mruberry
Differential Revision: D28870456
Pulled By: malfet
fbshipit-source-id: e634b0887cce7efb046ea1fd9b74424e0eceb164
Summary:
Before this change, only a dynamically linked OpenBLAS compiled with OpenMP could
be found.
Also gets rid of the hardcoded path to libgfortran.a in FindLAPACK.cmake.
Only affects aarch64 Linux builds.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59428
Reviewed By: agolynski
Differential Revision: D28891314
Pulled By: malfet
fbshipit-source-id: 5af55a14c85ac66551ad2805c5716bbefe8d55b2
Summary:
Context https://github.com/pytorch/pytorch/issues/58545
The logic keeps the behavior consistent between
torch.randperm and torch.randint:
1. Generators can have either a fully-specified or a non-fully-specified device.
2. As long as the device type matches that of the result, we don't error out.
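A minimal Python sketch of rule 2 (illustrative helper, not the actual ATen check) comparing only device *types*, so an index mismatch like `cuda` vs `cuda:1` is allowed:

```python
def check_generator_device(gen_device, result_device):
    """Error only when the generator's device *type* differs from the
    result's; device-index differences are allowed."""
    if gen_device is None:          # non-fully-specified generator
        return
    gen_type = gen_device.split(":")[0]
    res_type = result_device.split(":")[0]
    if gen_type != res_type:
        raise RuntimeError(
            f"Expected a '{res_type}' generator device, got '{gen_type}'")

check_generator_device("cuda", "cuda:1")  # ok: same device type
check_generator_device(None, "cpu")       # ok: device not specified
```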
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59352
Test Plan:
```
python test/test_tensor_creation_ops.py -k TestRandomTensorCreation
```
Reviewed By: ngimel
Differential Revision: D28855920
Pulled By: zhouzhuojie
fbshipit-source-id: f8141a2c4b2f177e1aa7baec6999b65916cba02c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59413
For CUDA 10.2 builds linked with the gold linker we were observing
crashes when exceptions were being raised
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D28888054
Pulled By: seemethere
fbshipit-source-id: f9b38147591721803ed3cac607510fe5bbc49d6d
(cherry picked from commit c7a3a13baba0d547c5c20579328b0b3d83b94656)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Summary:
Hello,
depending on the build environment you may encounter
```c++
error: reference to 'optional' is ambiguous
```
when using the Torch-C++-API.
This PR adds `c10::` to avoid possible ambiguities with **std::optional** and does not introduce any functional change.
Fixes https://discuss.pytorch.org/t/linker-failed-with-ambiguous-references/36255 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45736
Reviewed By: dzhulgakov
Differential Revision: D24125123
Pulled By: VitalyFedyunin
fbshipit-source-id: df21420f0a2d0270227c28976a7a4218315cc107
Co-authored-by: Johannes Czech <QueensGambit@users.noreply.github.com>
This is the combination of #59236 and #58685 which will enable <insert builder PR here> to land on the release branch. This enables breakpad for minidump collection (which is still opt-in) and debug builds for the release.
Co-authored-by: Your Name <driazati@users.noreply.github.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58573
Users can create invalid imports, like:
```
# in a top-level package
if False:
    from .. import foo
```
Since this code is never executed, it will not cause the module to fail to
load. But our dependency analysis walks every `import` statement in the AST,
and will attempt to resolve the (incorrectly formed) import, throwing an exception.
For posterity, the code that triggered this: https://git.io/JsCgM
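The situation can be reproduced with a few lines of the standard-library `ast` module (a sketch of the analysis, not torch.package's actual code): the walk finds the import even though it can never execute.

```python
import ast

# A relative import guarded by `if False` parses fine and never runs,
# but a naive AST walk still visits it and would try to resolve it.
src = "if False:\n    from .. import foo\n"
tree = ast.parse(src)
imports = [n for n in ast.walk(tree)
           if isinstance(n, (ast.Import, ast.ImportFrom))]
```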
Differential Revision: D28543980
Test Plan: Added a unit test
Reviewed By: Chillee
Pulled By: suo
fbshipit-source-id: 03b7e274633945b186500fab6f974973ef8c7c7d
Co-authored-by: Michael Suo <suo@fb.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58572
Right now, we have three categories of error (broken, denied, unhandled). This
PR unifies them into a single "error" field in the node, with optional context.
It also generalizes how formatting of the error in PackagingError occurs.
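A hypothetical sketch of the unified shape (field names are illustrative, not the actual torch.package internals):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    name: str
    error: Optional[str] = None          # e.g. "broken", "denied", "unhandled"
    error_context: Optional[str] = None  # optional extra detail

def format_error(node: Node) -> Optional[str]:
    """One generic formatter replacing three per-category ones."""
    if node.error is None:
        return None
    msg = f"{node.name}: {node.error}"
    if node.error_context:
        msg += f" ({node.error_context})"
    return msg

msg = format_error(Node("foo.bar", "denied", "matched deny pattern"))
```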
Differential Revision: D28543982
Test Plan: sandcastle
Reviewed By: Chillee
Pulled By: suo
fbshipit-source-id: d99d37699ec2e172e3798763e60aafe9a66ed6f4
Co-authored-by: Michael Suo <suo@fb.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58702
Fixes an off-by-one error when determining whether some ranks failed with
`wait_all_ranks=True`. This wasn't caught by tests because they only
exercised failure scenarios, not success scenarios with `wait_all_ranks=True`.
ghstack-source-id: 129559840
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D28583235
fbshipit-source-id: a8f376efb13a3f36c788667acab86543c80aff59
Summary:
The `factory_kwargs` kwarg was previously undocumented in `nn.Quantize`. Further, the `Attributes` section of the docs was improperly filled in, resulting in bad formatting. This section doesn't apply since `nn.Quantize` doesn't have parameters, so it has been removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59025
Reviewed By: anjali411
Differential Revision: D28723889
Pulled By: jbschlosser
fbshipit-source-id: ba86429f66d511ac35042ebd9c6cc3da7b6b5805
Co-authored-by: Joel Schlosser <jbschlosser@fb.com>
Summary:
Move all cuFFT related parts to SpectralOps.cpp
Leave only _fft_fill_with_conjugate_symmetry_cuda_ in SpectralOps.cu
Keep `CUDAHooks.cpp` in torch_cuda_cpp by introducing an `at::cuda::detail::THCMagma_init` functor and registering it from a global constructor in `THCTensorMathMagma.cu`
Move the entire detail folder to the torch_cuda_cpp library.
This is a no-op that greatly reduces binary size for CUDA-11.x builds by avoiding cuFFT/cuDNN symbol duplication between torch_cuda_cpp (which makes most of the cuFFT calls) and torch_cuda_cu (which only needed them to compile SpectralOps.cu)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58459
Reviewed By: ngimel
Differential Revision: D28499001
Pulled By: malfet
fbshipit-source-id: 425a981beb383c18a79d4fbd9b49ddb4e5133291
Summary:
`makeDeviceForHostname` and `makeDeviceForInterface` are near-duplicates
that differ only in their default argument values.
Create a generic `makeGlooDevice` anonymous function that takes both the host
name and the interface name, and call it from both
makeDeviceFor[Hostname|Interface].
Also solve two other minor issues:
- do not call `getenv("GLOO_DEVICE_TRANSPORT")` at library load time
- raise an exception rather than crash if GLOO_DEVICE_TRANSPORT is set to an unknown value
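A Python sketch of the deduplication pattern (the real code is C++; names and return values here are illustrative):

```python
def _make_gloo_device(hostname=None, interface=None):
    """Generic factory; the two public helpers only differ in which
    argument they forward."""
    if interface is not None:
        return ("iface", interface)
    return ("host", hostname if hostname is not None else "localhost")

def make_device_for_hostname(hostname):
    return _make_gloo_device(hostname=hostname)

def make_device_for_interface(interface):
    return _make_gloo_device(interface=interface)
```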
Fixes #{issue number}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58996
Reviewed By: pbelevich
Differential Revision: D28713324
Pulled By: malfet
fbshipit-source-id: cb33b438078d163e3ec6f047f2e5247b07d94f8d
Summary:
Fixes upcoming changes that are part of ROCm 4.2 and affect PyTorch JIT.
- ROCM_VERSION macro must be available to both device and host compilation passes.
- Unifies some of CUDA and HIP differences in the code generated.
- NAN / POS_INFINITY / NEG_INFINITY
- Do not hipify `extern __shared__` -> `HIP_DYNAMIC_SHARED()` macro [deprecated]
- Differentiates bf16 codegen for HIP.
- Optionally provides missing macros when using hiprtc precompiled header feature.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57400
Reviewed By: ejguan
Differential Revision: D28421065
Pulled By: malfet
fbshipit-source-id: 215f476773c61d8b0d9d148a4e5f5d016f863074
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Summary:
torch_cuda_cu depends on torch_cuda_cpp, so it should be linked first.
Otherwise the linker keeps lots of cudnn symbols for no good reason.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58437
Reviewed By: janeyx99
Differential Revision: D28496472
Pulled By: malfet
fbshipit-source-id: 338605ff755591476070c172a6ea0a0dcd0beb23
* Add underscores to some internal names
Summary:
Add underscores to some of the internal names
Test Plan:
python test/test_profiler.py -v
Reviewers: anjali411
[ghstack-poisoned]
* Add underscores to some internal names
Summary:
Add underscores to some of the internal names
Test Plan:
python test/test_profiler.py -v
Reviewers: anjali411
[ghstack-poisoned]
Co-authored-by: ilia-cher <iliacher@fb.com>
Summary:
The `UninitializedBuffer` class was previously left out of `nn.rst`, so it was not included in the generated documentation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59021
Reviewed By: anjali411
Differential Revision: D28723044
Pulled By: jbschlosser
fbshipit-source-id: 71e15b0c7fabaf57e8fbdf7fbd09ef2adbdb36ad
Co-authored-by: Joel Schlosser <jbschlosser@fb.com>
* Underscore prefix sparse_csr_tensor and to_sparse_csr
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
* fix lint
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Summary:
Adds a note explaining the difference between several often conflated mechanisms in the autograd note
Also adds a link to this note from the docs in `grad_mode` and `nn.module`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58513
Reviewed By: gchanan
Differential Revision: D28651129
Pulled By: soulitzer
fbshipit-source-id: af9eb1749b641fc1b632815634eea36bf7979156
Summary:
Do not put quotes around values that contain no spaces in add_to_env_file.
The ENV file is used both by bash and by docker, and docker does not strip
quotes when they are present.
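A sketch of the resulting rule (hypothetical helper, not the actual CI script): quote only when the value contains a space, since bash strips the quotes but docker's env-file parser keeps them literally.

```python
def add_to_env_file(name, value):
    """Quote only values containing spaces."""
    if " " in value:
        return f'{name}="{value}"'
    return f"{name}={value}"

line1 = add_to_env_file("IMAGE", "pytorch-linux")  # no quotes needed
line2 = add_to_env_file("EXTRA", "a b")            # quoted for bash
```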
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58637
Reviewed By: wconstab
Differential Revision: D28561159
Pulled By: malfet
fbshipit-source-id: 0843aad22703b6c3adebeb76175de1cfc1a974b5
Summary:
Since v1.7, oneDNN (MKL-DNN) has supported the use of the Compute Library
for the Arm architecture to provide optimised convolution primitives
on AArch64.
This change enables the use of Compute Library in the PyTorch build.
Following the approach used to enable the use of CBLAS in MKLDNN,
it is enabled by setting the env vars USE_MKLDNN and USE_MKLDNN_ACL.
The location of the Compute Library build must be set using `ACL_ROOT_DIR`.
This is an extension of the work in https://github.com/pytorch/pytorch/pull/50400
which added support for the oneDNN/MKL-DNN backend on AArch64.
_Note: this assumes that Compute Library has been built and installed at
ACL_ROOT_DIR. Compute library can be downloaded here:
`https://github.com/ARM-software/ComputeLibrary`_
Fixes #{issue number}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55913
Reviewed By: ailzhang
Differential Revision: D28559516
Pulled By: malfet
fbshipit-source-id: 29d24996097d0a54efc9ab754fb3f0bded290005
* [PyTorch Edge] bytecode version bump to v5 and enable share constant table
* [Pytorch] Build lite interpreter as default for iOS
* [Pytorch] Build lite interpreter as default for Android
torch.vmap is a prototype feature and should not be in the stable
binary. This PR:
- Removes the torch.vmap API
- Removes the documentation entry for torch.vmap
- Changes the vmap tests to use an internal API instead of torch.vmap.
Test Plan:
- Tested locally (test_torch, test_autograd, test_type_hints, test_vmap),
but also wait for CI.
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58363
The previous implementation relied on us directly writing the YAML instead of
just having a conditional block; this gives us better readability for
pull-request triggers.
Signed-off-by: Eli Uriegas <seemethere101@gmail.com>
Test Plan: Imported from OSS
Reviewed By: walterddr
Differential Revision: D28465271
Pulled By: seemethere
fbshipit-source-id: fd556bb6bac4954fcddb4a2b0383e996f292a794
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58408
It'd be nice to have a version of bundled inputs that didn't mutate the original class/object. So now there is!
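A minimal sketch of the non-mutating variant (illustrative names, not the actual bundled-inputs API): work on a deep copy and return it, leaving the original untouched.

```python
import copy

class Model:
    """Stand-in for a scripted module."""
    pass

def bundle_inputs(model, inputs):
    """Return an augmented copy; the original object is not mutated."""
    bundled = copy.deepcopy(model)
    bundled._bundled_inputs = inputs
    return bundled

m = Model()
b = bundle_inputs(m, [(1, 2)])
```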
ghstack-source-id: 129127316
Test Plan: The new unittests
Reviewed By: dhruvbird
Differential Revision: D28460231
fbshipit-source-id: f6f7a19e264bddfaa177304cbde40336060a237a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58384
When the caller sends tensors within a request, it does so on fresh streams it obtained from the caching allocator. However, it wasn't recording those tensors with the caching allocator. This carried the risk that, if those tensors were deleted before the async CUDA ops were done, the caching allocator could reuse the storage and thus overwrite the previous data while it was still being used.
ghstack-source-id: 129107582
Test Plan: eyes
Reviewed By: mrshenli
Differential Revision: D28473429
fbshipit-source-id: 3f2617048d984cec7a270858d282cecf1140ecf0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58391
An additional (and hopefully more robust) way of fixing the same problem https://github.com/pytorch/pytorch/pull/58382 fixed.
ghstack-source-id: 129110325
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28474154
fbshipit-source-id: 625ebe782e380c60b3ead4c4ed8a51d4bc917153
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58382
Calling markCompleted on a Future now first acquires the Future's mutex (as usual) but then sometimes tries to acquire the GIL during the DataPtr extraction while still holding the Future's mutex. (This happens when the value passed to markCompleted is a Python object). This can cause a deadlock if someone else calls any of the other methods of Future while holding the GIL.
There are two solutions to this: avoid holding the Future's mutex when extracting DataPtrs, and avoid holding the GIL while invoking the Future's method. In this PR I'm going for the latter, because it's a very simple immediate fix, but I believe this is brittle and that we should probably also consider the former fix.
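The deadlock is a classic lock-order inversion; a pure-Python sketch of the fix chosen here (stand-in locks, not the real GIL or Future), where the caller releases the GIL analogue before invoking the Future, so both locks are always taken in the same order:

```python
import threading

gil = threading.Lock()           # stand-in for the GIL
future_mutex = threading.Lock()  # stand-in for the Future's mutex
order = []

def mark_completed():
    # May take the "GIL" internally (e.g. while extracting DataPtrs from
    # a Python value) while holding the Future's mutex.
    with future_mutex:
        with gil:
            order.append("markCompleted")

def caller_fixed():
    # The fix: do NOT hold the "GIL" while calling Future methods.
    # The lock order is then always future_mutex -> gil, never inverted.
    mark_completed()
    order.append("caller done")

t = threading.Thread(target=caller_fixed)
t.start()
t.join(timeout=5)
```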
ghstack-source-id: 129105358
Test Plan: The repro in https://github.com/pytorch/pytorch/issues/58239 now doesn't deadlock.
Reviewed By: mrshenli
Differential Revision: D28472816
fbshipit-source-id: 1bc9bca426dd004f9eb2568db1ffd38f014450e2
Summary:
Deprecation warning reported by cmake:
```
CMake Deprecation Warning at CMakeLists.txt (cmake_minimum_required):
Compatibility with CMake < 2.8.12 will be removed from a future version of CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
```
This is the only place that requires bumping the min version. There are two others, but they are only in the `third_party` folder.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58306
Reviewed By: bdhirsh
Differential Revision: D28446097
Pulled By: zhouzhuojie
fbshipit-source-id: af5ef50e61bd57dc36089ebe62db70ba0081864c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58192
Exceptions thrown by deploy internals need to be sanitized
for application safety.
See the comment in deploy.h for a detailed explanation.
Test Plan: Added unit test
Reviewed By: suo
Differential Revision: D28371127
fbshipit-source-id: c0ced2f194424a394c5852bd4ab5cb41b0f4e87b
Summary:
Previously only the **branch** was specified when triggering the multi-GPU pipeline, which could result in the incorrect commit being targeted, because by the time the pipeline actually runs there could be a newer commit on the specified branch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58219
Reviewed By: malfet, bdhirsh
Differential Revision: D28446453
Pulled By: seemethere
fbshipit-source-id: 680c0b3a9f3f20b61787cc90fda73b87d66e6af8
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).
New submodule commit: 9ed4fb12a4
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57613
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: lw
Differential Revision: D28220987
fbshipit-source-id: 4ecd2589d01f91678194d9e3ac309ad6f6df3e70
Summary:
Include the jobs that master-only jobs depend on in the workflow
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58335
Reviewed By: walterddr
Differential Revision: D28458406
Pulled By: malfet
fbshipit-source-id: 217a8996daacd494af1bbc54e725bbcacc0c7784
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58339
The operator was present as part of the full_jit ops but wasn't included for mobile. This diff copies it for mobile.
Test Plan: buck run xplat/langtech/mobile:giga5_bin -- --voice /data/users/prabhavag/experiments/embedded_new_stateful_conv_may6/nicole_batch.giga5 --frontend /data/users/prabhavag/experiments/tools_pkg/en_US.embedded.frontend.titan --icudata xplat/third-party/icu/stubdata/reduced/icudt55l.dat --text "haha"
Reviewed By: iseeyuan
Differential Revision: D28452179
fbshipit-source-id: ef7a929f1a6d40573438785a4959c1c1e39762f0
Summary:
Freezing is a pass that partially evaluates your model and applies generic optimizations that should speed it up. Optimize-for-inference is a counterpart to those optimizations that runs build- and server-specific optimizations. The interaction with the existing `optimize_frozen_module` is not great; I guess we could just deprecate that API entirely? It was never officially released but just existed to document the `optimize_numerics` keyword.
Eventually, I would like to add a way of providing example inputs, but I didn't add that here because they are not being used at all yet. I also have not yet included a way to blacklist individual optimizations, and would like to wait until we move this to Beta and have a little more clarity on how everything will fit together. I also think blacklisting will be an uncommon use case for the current optimizations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58193
Reviewed By: bertmaher, navahgar
Differential Revision: D28443714
Pulled By: eellison
fbshipit-source-id: b032355bb2585720a6d2f00c89d0d9a7ef60e649
Summary:
tl;dr; rewrites the FX graph mode quantization observer insertion to be easier to understand and extend.
The key conceptual difference from before is:
* before: for each node, observers are always inserted to the output of the current node, even if they are needed for the next node. This is hard to reason about.
* after: for each node, observers are inserted to the inputs (if needed, as calculated by the dtype of the argument and dtype of current node) and to the output (if needed for the type of pattern and qconfig). There is no knowledge of future nodes needed to insert observers for the current node.
This allows us to significantly simplify various things:
* all new observers needed for a node are inserted together. This makes it easier to understand and debug things. We add an invariant that node X will never change any observers inserted by any preceding or subsequent node, so to debug an issue the user can just understand what is happening for node X, without having to understand what happens before or after it.
* all the state tracking of activation_post_process_map and activation_post_process_indices are removed, instead observers are looked up by graph traversals
* since there is no longer a need for overlapping graph passes that mutate each other's intermediate state, it is easier to understand the rules for inserting observers, and to create new rules in the future.
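A toy model of the per-node scheme (hypothetical helpers, not the real FX pass): every node's observers are decided from purely local information, with no lookahead to future nodes.

```python
def insert_observers(nodes, needs_input_obs, needs_output_obs):
    """Emit a linear plan: input observers, the node, output observer.
    Each node is processed in isolation."""
    plan = []
    for node in nodes:
        for arg in node["args"]:
            if needs_input_obs(node, arg):
                plan.append(("obs_in", node["name"], arg))
        plan.append(("node", node["name"]))
        if needs_output_obs(node):
            plan.append(("obs_out", node["name"]))
    return plan

nodes = [{"name": "conv", "args": ["x"]},
         {"name": "relu", "args": ["conv"]}]
plan = insert_observers(
    nodes,
    needs_input_obs=lambda n, a: a == "x",       # observe graph inputs
    needs_output_obs=lambda n: n["name"] == "conv")
```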
Test Plan:
```
# all OSS tests pass
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Imported from OSS
Differential Revision: D28241864
Reviewed By: jerryzh168
Pulled By: vkuzo
fbshipit-source-id: 950d58972d26362808564cc0a2dfb30413a3734d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58345
1. Add a sanity check to make sure any new attribute added to the constructor is also added to either `_REMOTE_MODULE_PICKLED_ATTRIBUTES` or `_REMOTE_MODULE_ATTRIBUTES_IGNORE_FOR_PICKLING`.
2. Update some comments and warnings -- now if a new attribute is added after construction, it will not be pickled. Previously it would trigger a runtime error, which is hard to unit test (one worker hits the runtime error, but the other worker causes a timeout).
Context: https://github.com/pytorch/pytorch/pull/58019#discussion_r632322083
ghstack-source-id: 129070358
Test Plan: unit test
Reviewed By: rohan-varma
Differential Revision: D28460744
fbshipit-source-id: 8028186fc447c88fbf2bf57f5c5d321f42ba54ed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57483
Pull Request resolved: https://github.com/pytorch/glow/pull/5622
Quantized linear has packed parameters. We want to unpack it so that it would be easier for graph optimization and importer to deal with the weight and bias. A customized remapping function is used to unpack quantized linear and map it to acc_op.linear.
Test Plan: `buck test glow/fb/fx/nnpi_importer:test_importer`
Reviewed By: gcatron, jfix71, khabinov
Differential Revision: D27451237
fbshipit-source-id: e46e961734788fd5333e227ca6143fd37c33204e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58182
As title, the v5 model format will be
```
(base) chenlai@chenlai-mp reuse_constant % zipinfo /Users/chenlai/Documents/pytorch/reuse_constant/tmp/zip/script_module_v5_unify.ptl
Archive: /Users/chenlai/Documents/pytorch/reuse_constant/tmp/zip/script_module_v5_unify.ptl
Zip file size: 3120 bytes, number of entries: 7
-rw---- 0.0 fat 77 bl stor 80-000-00 00:00 script_module_v4_unify/data.pkl
-rw---- 0.0 fat 240 bl defN 80-000-00 00:00 script_module_v4_unify/code/__torch__/___torch_mangle_5.py
-rw---- 0.0 fat 422 bl defN 80-000-00 00:00 script_module_v4_unify/code/__torch__/___torch_mangle_5.py.debug_pkl
-rw---- 0.0 fat 64 bl stor 80-000-00 00:00 script_module_v4_unify/constants/140245072983168.storage
-rw---- 0.0 fat 172 bl stor 80-000-00 00:00 script_module_v4_unify/constants.pkl
-rw---- 0.0 fat 678 bl stor 80-000-00 00:00 script_module_v4_unify/bytecode.pkl
-rw---- 0.0 fat 2 bl stor 80-000-00 00:00 script_module_v4_unify/version
7 files, 1655 bytes uncompressed, 1453 bytes compressed: 12.2%
```
bytecode.pkl is:
```
(5,
('__torch__.___torch_mangle_5.TestModule.forward',
(('instructions',
(('STOREN', 1, 2),
('DROPR', 1, 0),
('LOADC', 0, 0),
('LOADC', 1, 0),
('MOVE', 2, 0),
('OP', 0, 0),
('LOADC', 1, 0),
('OP', 1, 0),
('RET', 0, 0))),
('operators', (('aten::add', 'int'), ('aten::add', 'Scalar'))),
('constants',
(torch._utils._rebuild_tensor_v2(pers.obj(('storage',
torch.DoubleStorage,
'140245072983168.storage',
'cpu',
8),),
0,
(2, 4),
(4, 1),
False,
collections.OrderedDict()),
1)),
('types', ()),
('register_size', 2)),
(('arguments',
((('name', 'self'),
('type', '__torch__.___torch_mangle_5.TestModule'),
('default_value', None)),
(('name', 'y'), ('type', 'int'), ('default_value', None)))),
('returns',
((('name', ''), ('type', 'Tensor'), ('default_value', None)),)))))
```
constants.pkl is:
```
(torch._utils._rebuild_tensor_v2(pers.obj(('storage', torch.DoubleStorage, '140245072983168.storage', 'cpu', 8),),
0,
(2, 4),
(4, 1),
False,
collections.OrderedDict()),)
```
Both tensors will refer to the tensor in at the path `script_module_v4_unify/constants/140245072983168.storage`.
## Note
According to the unified format, all tensors should be written to the `.data` folder; however, torch.jit.load() can't handle the unified format at the moment, so this change writes tensors to the `constants` folder, and mobile will write/read tensors from the `constants` folder, so that the model can be interpreted by both JIT and mobile.
ghstack-source-id: 129010347
Test Plan: buck test mode/dev //caffe2/test/cpp/jit:jit
Reviewed By: raziel, iseeyuan
Differential Revision: D28375257
fbshipit-source-id: 6544472db4c957c5ea037e0bb5112b637dd15897
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58202
This unit test was testing the wrong target. It should test the sampler under jit::mobile. This diff fixes it.
Test Plan: run unit tests
Reviewed By: shreyanb98
Differential Revision: D28384839
fbshipit-source-id: 35cc63be2e73ca9b1a7d30d6f67fffcfe5021fa2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58205
It's worth moving train-related files into their own folder since we are adding more code under the mobile directory.
This diff does that.
Test Plan: run unit tests and ci
Reviewed By: iseeyuan
Differential Revision: D28402432
fbshipit-source-id: cd76a1c4f8ff06508cdc3aad8a169fbf34bb4995
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48919
move data indexing utils
parallel inference contiguous path
parallel inference channels last path
add dim apply
optimize update stats
add channels last support for backward
Revert "add channels last support for backward"
This reverts commit cc5e29dce44395250f8e2abf9772f0b99f4bcf3a.
Revert "optimize update stats"
This reverts commit 7cc6540701448b9cfd5833e36c745b5015ae7643.
Revert "add dim apply"
This reverts commit b043786d8ef72dee5cf85b5818fcb25028896ecd.
bug fix
add batchnorm nhwc test for cpu, including C=1 and HW=1
Test Plan: Imported from OSS
Reviewed By: glaringlee
Differential Revision: D25399468
Pulled By: VitalyFedyunin
fbshipit-source-id: a4cd7a09cd4e1a8f5cdd79c7c32c696d0db386bd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57830
This PR aims to support the tensor.index_add_() method in a symbolic function. We leverage scatter_add() to implement it, since ONNX doesn't have a corresponding operator.
Notes:
1. 4 tests have been added for some scenarios.
2. If there are duplicated values in the 'index' parameter, the export will still execute successfully but the results will be wrong, so a warning is added for every call to this symbolic function. And if we detect that the rank of 'index' is greater than the size of the 'dim' dimension, an exception is raised to stop exporting an incorrect ONNX file.
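A 1-D pure-Python sketch of the semantics being lowered (duplicate indices accumulate, which is exactly scatter-add behavior and the source of the duplicate-index caveat):

```python
def index_add(dest, index, src):
    """dest[index[i]] += src[i] for each i; duplicates accumulate."""
    out = list(dest)
    for i, idx in enumerate(index):
        out[idx] += src[i]
    return out

result = index_add([0, 0, 0], [0, 2, 0], [1, 2, 3])  # -> [4, 0, 2]
```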
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D28393518
Pulled By: SplitInfinity
fbshipit-source-id: f487ca2c63fec47c6ab74f1a7783dae7f3b8d1ef
Co-authored-by: Jay Zhang <jiz@microsoft.com>
Summary:
Added a simple section indicating distributed profiling is expected to work similar to other torch operators, and is supported for all communication backends out of the box.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58286
Reviewed By: bdhirsh
Differential Revision: D28436489
Pulled By: rohan-varma
fbshipit-source-id: ce1905a987c0ede8011e8086a2c30edc777b4a38
Summary:
Currently, our test stats [uploaded to S3](fee7e8b91d/&showversions=false) by GitHub Actions are missing the reports from `test/custom_backend/test_custom_backend.py` and `test/custom_operator/test_custom_ops.py`. From [this log](https://github.com/pytorch/pytorch/runs/2573747177), we know that those tests are indeed being run, but the artifact on that workflow run shows that the XML files are currently not being uploaded for use in the render-test-results job. This PR makes the regex for that artifact upload more permissive.
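For illustration, a pattern along these lines (the exact regex in the workflow may differ) matches both the top-level and the nested `test-reports` directories:

```python
import re

# Permissive: match "test-reports/" at the start of the path or after
# any directory prefix such as custom_backend/ or custom_operator/.
pattern = re.compile(r"(^|/)test-reports/")
paths = [
    "test-reports/python-unittest/result.xml",
    "custom_backend/test-reports/python-unittest/result.xml",
    "custom_operator/test-reports/python-unittest/result.xml",
    "src/other.xml",
]
matched = [p for p in paths if pattern.search(p)]
```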
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58250
Test Plan:
For context, before this PR, the test-reports artifact of Linux CI (pytorch-linux-xenial-py3.6-gcc5.4) before this PR looks like this:
- `test-reports`
- `cpp-rpc`
- ...
- `cpp-unittest`
- ...
- `dist-gloo`
- ...
- `python-unittest`
- ...
Wait for Linux CI (pytorch-linux-xenial-py3.6-gcc5.4) to run on this PR, then download and unzip the test-reports artifact and check that its directory structure looks like this:
- `custom_backend`
- `test-reports`
- `python-unittest`
- ...
- `custom_operator`
- `test-reports`
- `python-unittest`
- ...
- `test-reports`
- `cpp-rpc`
- ...
- `cpp-unittest`
- ...
- `dist-gloo`
- ...
- `python-unittest`
- ...
Also, [this run](https://github.com/pytorch/pytorch/runs/2579875947) shows the following line of output, which is exactly what we would expect to see if this PR correctly adds the 9 tests across `custom_backend` and `custom_operator`:
> ```
> Added (across 2 suites) 9 tests, totaling + 0.10s
> ```
Reviewed By: walterddr
Differential Revision: D28442396
Pulled By: samestep
fbshipit-source-id: 893a397a8e701e4180e1812d6f83352b5920ced6
Summary:
Some machines don't have a versionless `python` on their PATH, which breaks these existing shebangs.
I'm assuming that all the existing versionless `python` shebangs are meant to be `python3` and not `python2`; please let me know if my assumption was incorrect for any of these.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58275
Test Plan: CI.
Reviewed By: zhouzhuojie
Differential Revision: D28428143
Pulled By: samestep
fbshipit-source-id: 6562be3d12924db72a92a0207b060ef740f61ebf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54894
Test cases to test torch.Package's handling of TorchScript objects.
TODO: test mapping storages to different device
Test Plan: Imported from OSS
Reviewed By: suo
Differential Revision: D27832544
Pulled By: Lilyjjo
fbshipit-source-id: 6a67938a428b57520fead698da1412623ece9dbd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55958
This PR refactors the existing ScriptModuleSerializer to be exposed to the public. Most of the code is the same; git just reports it as different because it was shifted over by a whitespace. I commented on the actual changes that weren't due to the whitespace shifting.
Test Plan: Imported from OSS
Reviewed By: suo
Differential Revision: D27832546
Pulled By: Lilyjjo
fbshipit-source-id: c73e33211e46fca56053aa45ea2b9a2803eab82c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58262
When broadcasting, it can be fine for input tensors to have a different number of dims. Fix the checks in arithmetic ops to accept these cases.
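As a sketch of the broadcasting rule involved (plain PyTorch semantics, not the Metal code path itself), trailing dimensions are aligned and a missing leading dim is treated as size 1:

```python
import torch

a = torch.rand(2, 3, 4)
b = torch.rand(3, 4)   # one fewer dim than a
c = a + b              # b is broadcast as if it were (1, 3, 4)
assert c.shape == (2, 3, 4)
```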
Test Plan:
Test on device:
```
arc focus2 pp-ios
```
Test on mac
```
buck test pp-macos
```
Reviewed By: xta0
Differential Revision: D27093367
fbshipit-source-id: 797eeffa1864291cb0e40277372842dca145c9c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58263
Add the `reflection_pad2d` op in preparation for newer xirp models.
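For reference, the eager-mode semantics of the op being added (shown here via `torch.nn.functional.pad`, not the Metal implementation itself):

```python
import torch
import torch.nn.functional as F

x = torch.arange(9.0).reshape(1, 1, 3, 3)   # NCHW
y = F.pad(x, (1, 1, 1, 1), mode="reflect")  # reflection_pad2d with padding 1
assert y.shape == (1, 1, 5, 5)
# the border rows mirror the interior: padded row 0 reflects original row 1,
# which is the same as padded row 2
assert torch.equal(y[0, 0, 0], y[0, 0, 2])
```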
Test Plan:
Test on device:
```
arc focus2 pp-ios
```
Test on mac
```
buck test pp-macos
```
Reviewed By: xta0
Differential Revision: D27047892
fbshipit-source-id: 815856e19e4885c352f5d7174866480db7641cdf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55172
Description:
This is part 1 of a series of PRs for supporting torch.jit.ignore as a context manager. The following features are implemented in this PR:
- A unique name for the registered function under the torch.jit.frontend module. The unique name is generated based on the file name and line number of the context manager.
- Forcing the user to explicitly annotate the inputs and outputs.
- No side effects are considered.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D27895283
Pulled By: tugsbayasgalan
fbshipit-source-id: 5d36d9aa5d457055a6bb1676f264647a745ec36a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58256
Size-1 dims mess up our output restriding logic, because they're
technically "dense" no matter what stride the dimension has. In this example a
size-1 dim has stride 1, which causes all the indices to be taken mod 1 (i.e.,
all indices become 0). We work around this peculiar case by skipping size-1 in
our layout logic, since it has no impact on the rest of the tensor's indexing.
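The same peculiarity is visible in eager mode: contiguity checks also skip size-1 dims, since their stride never affects indexing (the restriding fix applies that reasoning):

```python
import torch

# two tensors with identical shape; one has an arbitrary stride on its size-1 dim
t = torch.empty_strided((4, 1), (1, 1))
u = torch.empty_strided((4, 1), (1, 7))  # stride 7 on the size-1 dim
# both count as contiguous: a size-1 dim is "dense" no matter its stride
assert t.is_contiguous() and u.is_contiguous()
```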
ghstack-source-id: 128932739
Test Plan:
new unit test, plus
```
buck test mode/dev //langtech/mobile/audio_stream_processor:audio_stream_processor_test -- --exact 'langtech/mobile/audio_stream_processor:audio_stream_processor_test - AudioStreamProcessorTest.DemucsReadWriteFloat'
```
Reviewed By: eellison
Differential Revision: D28424388
fbshipit-source-id: e33e39eef2a5bf2797bee78a5987558308b6d110
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58201
Add a light version of RandomSampler which can be used with torch mobile.
Test Plan: run unit test
Reviewed By: iseeyuan
Differential Revision: D28364467
fbshipit-source-id: 3148129fa56533f5f4b76b63b60e8778eeaf815f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56152
Currently, the Bundled Inputs API mutates the module in-place. It adds class methods and not instance methods. This results in a small problem that one can't re-run an already executed cell in Bento if the class has already been subject to bundled inputs.
In addition, there is no way to add bundled inputs to a module that already has bundled inputs added. This API solves that problem as well: the implementation of bundled inputs passes the methods it is about to add as `ignored_methods` to the call to `clone()`, so that when it does add those methods, it is able to do so successfully.
We'll have to be careful when ignoring those methods during the call to `torch.jit._clone_module_with_class` since any bundled input that relies on a user-provided method will need to be preserved and not ignored during the clone.
Looking for feedback on whether this is an acceptable direction.
ghstack-source-id: 128908360
Test Plan:
Added unit test and ran it as `buck test //caffe2/test:mobile`
Also see this Bento Notebook: https://www.internalfb.com/intern/anp/view/?id=550829
Reviewed By: gmagogsfm
Differential Revision: D27788394
fbshipit-source-id: 48109cd4583506d4efdb345e4ba31385db23a273
Summary:
This adds the methods `Tensor.cfloat()` and `Tensor.cdouble()`.
I was not able to find the tests for `.float()` functions. I'd be happy to add similar tests for these functions once someone points me to them.
Fixes https://github.com/pytorch/pytorch/issues/56014
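A quick illustration of the two new conversion methods (mirroring how `.float()`/`.double()` behave for real dtypes):

```python
import torch

t = torch.tensor([1.0, 2.0])
assert t.cfloat().dtype == torch.complex64    # analogous to .float()
assert t.cdouble().dtype == torch.complex128  # analogous to .double()
assert torch.equal(t.cfloat().real, t)        # real part preserved
```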
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58137
Reviewed By: ejguan
Differential Revision: D28412288
Pulled By: anjali411
fbshipit-source-id: ff3653cb3516bcb3d26a97b9ec3d314f1f42f83d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58221
- Use expect_contiguous to avoid Tensor refcount bumps if input tensor is already contiguous
- Use Tensor::sizes()[i] in place of Tensor::size(i) which goes through the dispatcher
- Use at::Dimvector in place of std::vector to avoid heap allocation
Since the qnnpack version needs on device testing, I'll skip that one for now.
Test Plan: CI
Reviewed By: swolchok
Differential Revision: D28406942
fbshipit-source-id: 3c1bdfd1c859fe71869d4daec22158be5c2719d4
Summary:
Currently only supports native ops that have all tensor arguments, an out variant, and no kwargs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58118
Reviewed By: ejguan
Differential Revision: D28421323
Pulled By: Chillee
fbshipit-source-id: 1c75c900415deca63fcc0e496e3bac126f21bf49
Summary:
Both the CPU and CUDA versions of PowKernel reimplement functionality that
already exists in UnaryOps, such as sqrt, rsqrt and reciprocal.
Found this out while looking at the sluggish compilation of PowKernel.cu:
- Before the change it took 11m5s and resulted in 7.6Mb .o file
- After the change compilation finished in 10m20s, and 6.4Mb .o file
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57873
Reviewed By: ezyang
Differential Revision: D28304929
Pulled By: malfet
fbshipit-source-id: ac499476280de55a92044b1b041b1246eea74c64
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57749
Add an FX test.
Test Plan: Imported from OSS
Reviewed By: huiguoo
Differential Revision: D28425974
fbshipit-source-id: 195c7a1944decb7a2a99c2831cab38485f32be17
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58254
Don't use CUDA synchronize when profiling in CPU only mode.
minor fixes (a clarification for a doc string, fix spammy logging)
(Note: this ignores all push blocking failures!)
Test Plan: manual + CI
Reviewed By: gdankel, chaekit
Differential Revision: D28423667
Pulled By: ilia-cher
fbshipit-source-id: 04c71727f528ae8e2e0ff90e88271608d291bc69
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48918
enable test case on AvgPool2d channels last for CPU
Test Plan: Imported from OSS
Reviewed By: glaringlee
Differential Revision: D25399466
Pulled By: VitalyFedyunin
fbshipit-source-id: 9477b0c281c0de5ed981a97e2dcbe6072d7f0aef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58020
Previously there was no RPC pickler for `RecursiveScriptModule`. Although it is a subclass of `ScriptModule`, the reducer of `ScriptModule` is not triggered for `RecursiveScriptModule` when a script remote module is sent over RPC.
This PR checkpoints the investigation of #58274 and makes sure that an RPC pickler is invoked here. It still cannot fix `test_send_remote_module_over_the_wire_script`; will revisit this bug once there is a feature request from users.
ghstack-source-id: 128949642
Test Plan:
TODO: re-enable these tests
buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- test_send_remote_module_over_the_wire_script
buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- test_remote_module_py_pickle_not_supported_script
Reviewed By: rohan-varma
Differential Revision: D28346758
fbshipit-source-id: 3cff84ca665da03da6ed6acb094a1f594fcd945e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58191
There are two clamp overloads: clamp.Scalar and clamp.Tensor. SR needs to support both or have checks in place to avoid runtime errors. Supporting both is not too hard, so here we are.
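The two overloads correspond to scalar vs. tensor-valued bounds in the Python API (shown here in eager mode; the PR adds the equivalent handling to Static Runtime):

```python
import torch

x = torch.tensor([-2.0, 0.5, 3.0])
# clamp.Scalar: scalar min/max bounds
assert torch.equal(torch.clamp(x, min=0.0, max=1.0), torch.tensor([0.0, 0.5, 1.0]))
# clamp.Tensor: elementwise tensor bounds
lo = torch.tensor([0.0, 1.0, 0.0])
assert torch.equal(torch.clamp(x, min=lo), torch.tensor([0.0, 1.0, 3.0]))
```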
Reviewed By: edvgha
Differential Revision: D28371949
fbshipit-source-id: 0ec6b8a0b8c6277e50d8e51e4e7a45aa62211e22
Summary:
This PR:
- Renames symeig_backward to eigh_backward
- Improves the stability and speed of the gradient computation by doing `V(A + B)Vh` instead of `VAVh + VBVh` when both the gradients of the eigenvectors and eigenvalues are defined.
- Updates the comments of the function to make them arguably clearer
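The rewrite relies on the identity `V A Vᴴ + V B Vᴴ = V (A + B) Vᴴ`, which saves two matrix products and the addition of large intermediates; a quick numerical check:

```python
import torch

V = torch.randn(4, 4, dtype=torch.complex128)
A = torch.randn(4, 4, dtype=torch.complex128)
B = torch.randn(4, 4, dtype=torch.complex128)
Vh = V.conj().T

fused = V @ (A + B) @ Vh             # the new formulation
separate = V @ A @ Vh + V @ B @ Vh   # the old formulation
assert torch.allclose(fused, separate)
```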
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55049
Reviewed By: ngimel
Differential Revision: D28396823
Pulled By: mruberry
fbshipit-source-id: a144482bfb1054e281b58ae1fe3cf1015bab505d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56613
Replace linalg_solve_helper with `lu_stub` + `lu_solve_stub`.
Once `lu_stub` and `lu_solve_stub` have cuSOLVER-based codepath,
`torch.linalg.solve` will have it as well.
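The decomposition being routed through is the standard LU-based solve. Using the public factor/solve wrappers from later releases (`torch.linalg.lu_factor`/`lu_solve`, used here only as a stand-in for the internal stubs), the equivalence looks like:

```python
import torch

A = torch.randn(3, 3, dtype=torch.float64) + 3 * torch.eye(3, dtype=torch.float64)
b = torch.randn(3, 1, dtype=torch.float64)

x = torch.linalg.solve(A, b)               # one-shot solve
LU, pivots = torch.linalg.lu_factor(A)     # factor once...
x2 = torch.linalg.lu_solve(LU, pivots, b)  # ...then solve against the factorization
assert torch.allclose(x, x2)
```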
Test Plan: Imported from OSS
Reviewed By: agolynski
Differential Revision: D28379394
Pulled By: mruberry
fbshipit-source-id: b47f66bc1ee12715da11dcffc92e31e67fa8c8f6
Summary:
# Changes
This PR migrates `pytorch_python_doc_build` from circleci to github actions.
Noticeable changes
- Refactor `docker cp` into a single `docker run` with a volume mount, because in CircleCI the volume is not accessible from its remote Docker engine
- `pytorch_python_doc_push` job will have a race condition with circleci, which will be migrated in separate PRs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57371
Reviewed By: samestep
Differential Revision: D28416289
Pulled By: zhouzhuojie
fbshipit-source-id: 04caccccf3d7eb7e2225846a406a53ccda356d44
Summary:
Fixes a few problems with `torch.norm` (incorrect behavior for empty inputs and negative p, https://github.com/pytorch/pytorch/issues/52783, and incorrect imaginary part for complex).
Most importantly, makes linalg_norm and vector_norm use the same kernels, reducing compile time and binary size.
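For reference, `torch.linalg.vector_norm` computes `(Σ|xᵢ|^p)^(1/p)`, including for negative `p`:

```python
import torch

t = torch.tensor([3.0, 4.0])
assert torch.isclose(torch.linalg.vector_norm(t, ord=2), torch.tensor(5.0))
# negative ord follows the same formula: (3**-1 + 4**-1)**-1 == 12/7
assert torch.isclose(torch.linalg.vector_norm(t, ord=-1), torch.tensor(12.0 / 7.0))
```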
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58214
Reviewed By: ejguan
Differential Revision: D28422439
Pulled By: ngimel
fbshipit-source-id: afe088a866963068e8c85eb9c3b2218a21ff2d48
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56845
Handle forward/backward compatibility caused by added default arguments in mobile. As an example,
In older version, operator aten::foo's schema is
```
foo(Tensor a, Tensor b) -> Tensor
```
In the new version, the schema is updated to
```
foo(Tensor a, Tensor b, int groups=1) -> Tensor
```
## Model file
Serialize the number of specified arguments to each operator into the bytecode operator table. Before, the operator table contains only the operator name and overload name:
```
('operators', (('aten::foo', ''),))
```
Now the number of specified arguments is added:
```
# bytecode version 6
('operators', (('aten::foo', '', 2),))
```
where "2" means the number of specified arguments.
Since there's bytecode schema change, the bytecode version number is bumped. This PR is to be landed after #56002 , where the version number is bumped from 4 to 5. This PR bumps the version number from 5 to 6.
## Runtime and backward compatibility
When the operator is found (either jit or c10), we have the OperatorHandle, where the operator schema can be accessed by
```
op.value().schema().arguments()
```
Adaptation is implemented to handle backward compatibility. For the example above, the new runtime holds the updated schema:
```
foo(Tensor a, Tensor b, int groups=1) -> Tensor
```
Whereas the model file carries
```
(('aten::foo', ''), 2)
```
We can implement a wrapper around the original function pointer to push the default argument to the stack.
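A hypothetical Python sketch of that adaptation (the real implementation wraps the C++ function pointer; `adapt_call` and its argument layout are invented here purely for illustration):

```python
def adapt_call(fn, schema_defaults, num_specified_args):
    """Wrap fn so a call recorded with fewer args gets the missing defaults.

    schema_defaults: ordered (name, default) pairs from the current schema;
    entries before num_specified_args are never consulted.
    """
    trailing = [default for _, default in schema_defaults[num_specified_args:]]

    def wrapper(*stack):
        assert len(stack) == num_specified_args
        return fn(*stack, *trailing)  # push the defaults the old model omitted

    return wrapper

def foo(a, b, groups=1):  # new schema: foo(Tensor a, Tensor b, int groups=1)
    return a + b + groups

# old model recorded ('aten::foo', '', 2): only 2 arguments specified
old_foo = adapt_call(foo, [("a", None), ("b", None), ("groups", 1)], 2)
assert old_foo(10, 20) == 31  # groups default (1) pushed automatically
```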
## Deliver time and forward compatibility
At model delivery time, two checks can be done:
### Operator check
Two APIs to be provided:
* Runtime: An API to get a runtime’s ops and their schemas (i.e. the # of args). D27920185(WIP)
* Model: An API to get a model’s ops and their schema requirements (i.e. the # of args required).
The APIs can be used to check
* runtime.ops() is a superset of model.ops()
* for each op in model.ops() validate their schemas are compatible with those in runtime.ops() -- i.e. the # args required in a model op are <= # args in the runtime op.
Note that only root ops in the model needs to be checked here. For transient ops it's not necessary. For example, if a root op, "aten::root" calls "aten::foo", it's "aten::root"'s responsibility to adapt to "aten::foo"'s change, or "aten::root" itself needs to be updated too.
### Bytecode version backport (PR coming)
When delivering a model with bytecode v6, if the runtime only works with bytecode v5 and lower, backport is needed.
* The number of arguments is removed from the operator table
* The bytecode version is changed from 6 to 5
Note that this backport is a pure format change; it does not guarantee that the backported model always runs in the old runtime. The operator check mentioned above should be done first, before the model is backported to v5.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D27986544
Pulled By: iseeyuan
fbshipit-source-id: 143e19d4798cfb96b65095538dd648eead4e3fda
Summary:
Adds a new file under `torch/nn/utils/parametrizations.py` which should contain all the parametrization implementations
For spectral_norm we add the `SpectralNorm` module which can be registered using `torch.nn.utils.parametrize.register_parametrization` or using a wrapper: `spectral_norm`, the same API the old implementation provided.
Most of the logic is borrowed from the old implementation:
- Just like the old implementation, there are cases when retrieving the weight should perform another power iteration (thus updating the weight) and cases where it shouldn't. For example in eval mode (`self.training=False`), we do not perform power iteration.
There are also some differences/difficulties with the new implementation:
- Using the new parametrization functionality as-is, there doesn't seem to be a good way to tell whether a 'forward' call is the result of parametrizations being unregistered (with leave_parametrizations=True) or of the injected property's getter being invoked. The issue is that we want to perform power iteration in the latter case but not the former, but we don't have this control as-is. So, in this PR I modified the parametrization functionality to switch the module to eval mode before triggering its forward call
- Updates the vectors based on the weight on initialization to fix https://github.com/pytorch/pytorch/issues/51800 (this avoids silently updating weights in eval mode). This also means that we perform twice as many power iterations by the first forward.
- right_inverse is just the identity for now, but maybe it should assert that the passed value already satisfies the constraints
- So far, all the old spectral_norm tests have been cloned, but maybe we don't need so much testing now that the core functionality is already well tested
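The wrapper keeps the old call signature while registering a parametrization under the hood; a minimal usage sketch:

```python
import torch
from torch.nn.utils.parametrizations import spectral_norm

m = spectral_norm(torch.nn.Linear(4, 4))
# the weight is now produced by a registered parametrization
assert torch.nn.utils.parametrize.is_parametrized(m, "weight")
# its largest singular value is (approximately) normalized to 1
sigma = torch.linalg.matrix_norm(m.weight.detach(), ord=2)
assert 0.8 <= sigma.item() <= 1.2
```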
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57784
Reviewed By: ejguan
Differential Revision: D28413201
Pulled By: soulitzer
fbshipit-source-id: e8f1140f7924ca43ae4244c98b152c3c554668f2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57603
With explicit list-unpack code from the user, it is possible to observe `prim::ListUnpack` applied to an `ONNX::Sequence` object.
This PR supports the conversion.
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D28393527
Pulled By: SplitInfinity
fbshipit-source-id: 1e6234d349b94c97c6ff20880a801433a9a428e9
Co-authored-by: BowenBao <bowbao@microsoft.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57599
Currently, if we call the tensor.to() method and pass a device as the parameter, it will fail, because the symbolic function for to() doesn't handle that case.
So we add a check at the beginning of the symbolic function: if this is a device cast, we return self directly. A test has also been added.
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D28393523
Pulled By: SplitInfinity
fbshipit-source-id: c41e3c0293932fc90dedb544daadd9c5d4b48792
Co-authored-by: Jay Zhang <jiz@microsoft.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57598
Add a doc string to explain what it does and how to use it.
Remove a hack around a bug in Python 2's functools.wraps().
Python 2 is no longer supported.
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D28393519
Pulled By: SplitInfinity
fbshipit-source-id: aae8c5e7b49e2ad2d24a0e86f8ba47f1cd080e46
Co-authored-by: Gary Miguel <garymiguel@microsoft.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57597
* Special post process for onnx::Cast and onnx::ConstantOfShape
* Update `test_pytorch_onnx_shape_inference.py` to be unit test over shape inference patterns.
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D28393529
Pulled By: SplitInfinity
fbshipit-source-id: fc26032ddb842d4e299447da39564b28049752ed
Co-authored-by: BowenBao <bowbao@microsoft.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57596
Add the corresponding symbolic function and test for fill_() function.
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D28393520
Pulled By: SplitInfinity
fbshipit-source-id: 3e177f88d3776d0d4a9d5e7ec7df4e6629738799
Co-authored-by: Jay Zhang <jiz@microsoft.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58235
This is to make the OpInfo change Python-only.
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D28412937
Pulled By: albanD
fbshipit-source-id: 1d6eb1e4baaa837c300ee8aa00b57986ba3e3eb2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57570
Move runtime ops compatibility api to OSS and introduce schema information
ghstack-source-id: 128789159
Test Plan: unit test and manually ran it for a runtime with all (non custom) ops, and the bixray models unittest {P412728176}
Reviewed By: raziel
Differential Revision: D28203104
fbshipit-source-id: 432a7d0247bccfb2e1ce90e8d41f81596efa3d67
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58041
The shape of the returned result was different for NumPy and PyTorch for
`ord={-2, 2, None}`. Now it's fixed.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D28405147
Pulled By: mruberry
fbshipit-source-id: 30293a017a0c0a7e9e3aabd470386235fef7b6a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58040
This PR uses `torch.linalg.inv_ex` to determine the non-invertible
inputs and return the condition number of infinity for such inputs.
Added OpInfo entry for `torch.linalg.cond`.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D28405146
Pulled By: mruberry
fbshipit-source-id: 524b9a38309851fa6461cb787ef3fba5aa7d5328
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58039
The new function has the following signature
`inv_ex(Tensor input, *, bool check_errors=False) -> (Tensor inverse, Tensor info)`.
When `check_errors=True`, an error is thrown if the matrix is not invertible; when `check_errors=False`, responsibility for checking the result is on the user.
`linalg_inv` is implemented using calls to `linalg_inv_ex` now.
Resolves https://github.com/pytorch/pytorch/issues/25095
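A short illustration of the new API:

```python
import torch

A = 2.0 * torch.eye(3, dtype=torch.float64)
inverse, info = torch.linalg.inv_ex(A)
assert info.item() == 0  # 0 means success
assert torch.allclose(inverse, 0.5 * torch.eye(3, dtype=torch.float64))

# a singular input reports failure through `info` instead of raising
_, bad_info = torch.linalg.inv_ex(torch.zeros(3, 3, dtype=torch.float64))
assert bad_info.item() != 0
```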
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D28405148
Pulled By: mruberry
fbshipit-source-id: b8563a6c59048cb81e206932eb2f6cf489fd8531
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58019
In order to support sending `RemoteModule` over RPC, previously the pickling/unpickling of `RemoteModule` was implemented based on `__setstate__` and `__getstate__`. However, this means that the user can call the regular Python pickler/unpickler to invoke the same logic, which should not be allowed.
This PR ensures that the pickling can only happen over RPC and not via regular python pickle.
Additionally, when a new attribute is added to `RemoteModule`, if it's not added to either `_REMOTE_MODULE_PICKLED_ATTRIBUTES` or `_REMOTE_MODULE_ATTRIBUTES_IGNORE_FOR_PICKLING`, this attribute will be ignored and an error message will be printed to std.err. However, it will not raise an exception like before, because such exception raised at the RPC layer will somehow cause timeout.
Closes: https://github.com/pytorch/pytorch/issues/57516
ghstack-source-id: 128868501
Test Plan:
buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- test_send_remote_module_over_the_wire
buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- test_remote_module_py_pickle_not_supported
buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- test_send_remote_module_with_a_new_attribute_ignored_over_the_wire
buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- RemoteModule
buck test mode/dev-nosan //caffe2/torch/fb/csrc/concurrency/test:atomic_int_interprocess_test -- --exact 'caffe2/torch/fb/csrc/concurrency/test:atomic_int_interprocess_test - test_multiple_processes (caffe2.torch.fb.csrc.concurrency.test.atomic_int_interprocess_test.ForkMultipleProcessTest)'
buck test mode/dev //caffe2/torch/distributed/fb/test:app_test -- --exact 'caffe2/torch/distributed/fb/test:app_test - test_custom_init_rpc (caffe2.torch.distributed.fb.test.app_test.TestRpc)'
Reviewed By: mrshenli
Differential Revision: D28318270
fbshipit-source-id: 7e7df2a6690f0860c4531a244d38789db424496f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58207
We probably don't even know what these tests check and there are no
plans on re-enabling them - let's just nuke them to keep the code clean.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D28403251
Pulled By: ZolotukhinM
fbshipit-source-id: fe12e978636a74f309f57e3408ab78d459fe4d29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58206
Tested on CUDA with and without `PYTORCH_TENSOREXPR_DONT_USE_LLVM=1`.
Closes #48053.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D28403250
Pulled By: ZolotukhinM
fbshipit-source-id: 1ae1cfed691e0077a37db646937e580fbd32b23f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58069
We want to tell the user that 5821 means ONNXIFI_EVENT_STATE_NONSIGNALLED in the error message.
Added that status code to the mapping and the error message output.
Reviewed By: hl475
Differential Revision: D28359864
fbshipit-source-id: 87f50ddd4ded9ced03ec6af6a1a4ef85bd2195d6
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56608
- Adds binding to the `c10::InferenceMode` RAII class in `torch._C._autograd.InferenceMode` through pybind. Also binds the `torch.is_inference_mode` function.
- Adds context manager `torch.inference_mode` to manage an instance of `c10::InferenceMode` (global). Implemented in `torch.autograd.grad_mode.py` to reuse the `_DecoratorContextManager` class.
- Adds some tests based on those linked in the issue + several more for just the context manager
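Basic usage of the new context manager:

```python
import torch

x = torch.ones(3, requires_grad=True)
with torch.inference_mode():
    y = x * 2  # computed without autograd tracking
assert not y.requires_grad
assert y.is_inference()  # the result is an inference tensor
```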
Issues/todos (not necessarily for this PR):
- Improve short inference mode description
- Small example
- Improved testing since there is no direct way of checking TLS/dispatch keys
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58045
Reviewed By: agolynski
Differential Revision: D28390595
Pulled By: soulitzer
fbshipit-source-id: ae98fa036c6a2cf7f56e0fd4c352ff804904752c
Summary:
Port addmm to a structured kernel.
Follow-ups:
- migrate `mm` and `addbmm` to structured kernels
- move the TORCH_CHECKs currently in `addmm_cpu_impl_` and `addmm_out_cuda_impl` to meta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57417
Reviewed By: bdhirsh
Differential Revision: D28291001
Pulled By: walterddr
fbshipit-source-id: 4eafaa30a465e225fbb4d2a69a36f1e037df9122
Summary:
…evice.
Previously, it was possible for torch.Tensor(tensor, device) or Tensor.new(tensor, device) to map to IntArrayRef or PyObject*.
PyObject* was not a problem because that would error out later.
But IntArrayRef would create an uninitialized tensor, which is confusing.
Fixes https://github.com/pytorch/pytorch/issues/47112
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58108
Reviewed By: agolynski, mruberry
Differential Revision: D28372426
Pulled By: gchanan
fbshipit-source-id: 795ab4f0561939d002a661c5cc14c6cdb579f31a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58133
Adding CUDA event fallback for cases when CUPTI tracing is not
available, this corresponds to the legacy profiler GPU profiling
Test Plan: python test/test_profiler.py -v
Reviewed By: gdankel
Differential Revision: D28379596
Pulled By: ilia-cher
fbshipit-source-id: 2db3b2cd8c1c3e6e596784ab00a226c69db2ef27
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49352
In this PR, we replace all definitions of slice to take None parameters for the start, end, and step. This will simplify the compiler logic
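In TorchScript, a Python slice like `x[1:]` already carries `None` for the omitted end and step, so normalizing the internal `slice` definition to accept `None` everywhere matches what user code produces:

```python
import torch

@torch.jit.script
def tail(x: torch.Tensor) -> torch.Tensor:
    return x[1:]  # start=1, end=None, step=None

assert tail(torch.arange(5)).tolist() == [1, 2, 3, 4]
```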
Test Plan:
test_jit test cases
Imported from OSS
Reviewed By: jamesr66a, nikithamalgifb
Differential Revision: D25929903
fbshipit-source-id: 5bfc6bad514a8aafbef2dacc706f95f867fe85f1
Summary:
There were almost no libtorch specific regressions recently
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58183
Reviewed By: janeyx99
Differential Revision: D28393091
Pulled By: malfet
fbshipit-source-id: 6dadd915ba574294afa6a95eaa759564af3154d4
Summary:
This is actually something I discovered a while ago with the wall of serotonin. It was really easy for large scale runs to get bottlenecked on disk access. I have a hack in the working files of that machine to use `/dev/shm`, but I figured I should formalize and actually make a respectable utility.
I also added a param to tweak the run cadence and print when a CorePool is created; these are just to make the CI logs a bit nicer. (A printout each second on a 40 minute CI job is a bit much...)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56711
Reviewed By: agolynski
Differential Revision: D28392248
Pulled By: robieta
fbshipit-source-id: b6aa7445c488d8e4ab9d4b31ab18df4e12783d8f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58169
This PR adds logging to the `_sanitize()` function of `RendezvousStateHolder` to output the nodes that had no recent heartbeat and are considered "dead".
ghstack-source-id: 128798389
Test Plan: Run the existing tests.
Reviewed By: tierex
Differential Revision: D28333394
fbshipit-source-id: ba0a398a759815e4224b58323c0e743eb383f723
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57701
The new OpInfo flag has the following semantic:
- If it says that it supports forward AD, we run gradcheck with forward AD to ensure it is correct
- If it says that it does not support it, we check that the corresponding error is raised
All the added tests take 3s to run for CPU builds and 1min for GPU builds which should be pretty negligible compared to the test_ops runtime for each of these arch.
Test Plan: Imported from OSS
Reviewed By: agolynski
Differential Revision: D28387767
Pulled By: albanD
fbshipit-source-id: 369d76921c8460aa4548f9b5159b7297994672f5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58160
This PR updates the Torch Distributed Elastic documentation with references to the new `c10d` backend.
ghstack-source-id: 128783809
Test Plan: Visually verified the correct rendering of the docs.
Reviewed By: tierex
Differential Revision: D28384996
fbshipit-source-id: a40b0c37989ce67963322565368403e2be5d2592
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58159
This PR includes the following changes:
- The `--standalone` option of `torch.distributed.run` now uses the `c10d` backend instead of `etcd` backend.
- The `import` statement for `EtcdServer` has been removed from the run script.
- The docstrings and parameter descriptions of the run script have been revised and improved.
- The default port number of `EtcdRendezvousBackend` has been changed from 29500 to 29400 to improve the user experience when used along with the run script which uses the port 29500 for the distributed job store (a.k.a. `MASTER_PORT`) by default.
ghstack-source-id: 128782267
Test Plan:
- Run existing tests.
- Visually verified the correct rendering of the docs.
Reviewed By: tierex
Differential Revision: D28383681
fbshipit-source-id: a4098f7c23c97a2376a9c4023e81f82fedd04b10
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58089
make ddp logging api to be private
ghstack-source-id: 128796419
Test Plan: unit test
Reviewed By: rohan-varma
Differential Revision: D28365412
fbshipit-source-id: 374c01d443ffb47a3706f59e296d6e47eb5f4c85
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58144
reland D28291041 (14badd9929), which was reverted due to a type error from Tuple[torch.Tensor]; it seems that mypy requires Tuple[torch.Tensor, torch.Tensor, torch.Tensor]
Test Plan:
buck test mode/opt //caffe2/test:torch_cuda -- test_index_copy_deterministic
✓ ListingSuccess: caffe2/test:torch_cuda - main (9.229)
✓ Pass: caffe2/test:torch_cuda - test_index_copy_deterministic_cuda (test_torch.TestTorchDeviceTypeCUDA) (25.750)
✓ Pass: caffe2/test:torch_cuda - main (25.750)
Reviewed By: ngimel
Differential Revision: D28383178
fbshipit-source-id: 38896fd6ddd670cfcce36e079aee7ad52adc2a28
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57341
Require that users be explicit about what they are going to be
interning. There are a lot of changes that are enabled by this. The new
overall scheme is:
PackageExporter maintains a dependency graph. Users can add to it,
either explicitly (by issuing a `save_*` call) or implicitly (through
dependency resolution). Users can also specify what action to take when
PackageExporter encounters a module (deny, intern, mock, extern).
Nothing (except pickles, though that can be changed with a small amount
of work) is written to the zip archive until we are finalizing the
package. At that point, we consult the dependency graph and write out
the package exactly as it tells us to.
This accomplishes two things:
1. We can gather up *all* packaging errors instead of showing them one at a time.
2. We require that users be explicit about what's going in packages, which is a common request.
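A minimal sketch of the resulting explicit API (the pattern names used here are just examples):

```python
import io
import torch
from torch.package import PackageExporter, PackageImporter

buf = io.BytesIO()
with PackageExporter(buf) as exporter:
    exporter.extern("sys")  # patterns are matched in registration order
    exporter.intern("**")   # everything else pulled in gets interned
    exporter.save_pickle("data", "obj.pkl", {"answer": 42})
    # nothing is written until the exporter is finalized on exit

buf.seek(0)
importer = PackageImporter(buf)
assert importer.load_pickle("data", "obj.pkl")["answer"] == 42
```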
Differential Revision: D28114185
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Pulled By: suo
fbshipit-source-id: fa1abf1c26be42b14c7e7cf3403ecf336ad4fc12
Summary:
In contrast to the initial opinion in https://github.com/pytorch/pytorch/issues/55385, there are legitimate use cases for nested containers. One such example is the [output of `LSTM`'s](https://pytorch.org/docs/stable/generated/torch.nn.LSTM):
```python
output: Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]] = torch.nn.LSTM()(input)
assert_close(output, expected)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57270
Reviewed By: albanD
Differential Revision: D28249303
Pulled By: mruberry
fbshipit-source-id: 75caa4414cc184ff0ce4cfc0dd5aafddfad42bcf
Summary:
Support adding type annotations for class methods and nn.Module methods which are not invoked under the hood of MonkeyType
** Changes **
* This PR involves a slight change in how the example inputs are passed while scripting `class` and `nn.Module` objects.
* The example inputs passed to `_script_pdt` is of the following format:
- example_inputs= [(obj.method1, (arg_list)), (obj.method2, (arg_list)),]
* For nn.Modules, to infer types for `forward` methods, example_inputs can be passed in two ways:
- example_inputs= [(obj.forward, (arg_list, ))]
- example_inputs = [(obj, (arg_list, ) )]
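The idea of inferring parameter types from example inputs can be illustrated with a small stand-in; this is not the actual MonkeyType/`_script_pdt` machinery, just a sketch of how (callable, args) pairs map to observed types:

```python
def infer_types(example_inputs):
    """Map each (callable, args) pair to the argument types observed,
    mimicking profile-directed typing at a toy scale."""
    inferred = {}
    for fn, args in example_inputs:
        inferred[fn.__name__] = tuple(type(a).__name__ for a in args)
    return inferred

class M:
    def forward(self, x, scale):
        return x * scale

m = M()
print(infer_types([(m.forward, (3, 2.0))]))  # {'forward': ('int', 'float')}
```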
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57202
Reviewed By: desertfire
Differential Revision: D28382827
Pulled By: nikithamalgifb
fbshipit-source-id: 5481467f3e909493bf3f439ee312056943508534
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57079
Testing onnx 1.9 release, we see that the old bug is triggered for the caffe2 test:
`pytest test/onnx/test_pytorch_onnx_caffe2_quantized.py::TestQuantizedOps::test_small_model`
This is because the graph inputs
```python
graph(%x.1 : Tensor,
%conv1._packed_params : __torch__.torch.classes.quantized.Conv2dPackedParamsBase,
%conv2._packed_params : __torch__.torch.classes.quantized.Conv2dPackedParamsBase,
%fc.bias : Float(10, strides=[1], requires_grad=0, device=cpu),
%fc.weight : Float(10, 72, strides=[72, 1], requires_grad=0, device=cpu)):
```
contains `Conv2dPackedParamsBase` which is a PackedParams.
When we flatten it, it is flattened into several tensors, so the shape inference for the inputs becomes misaligned.
This PR records how many tensors got flattened from PackedParams, and skips by that number rather than 1; with this change the UT passes.
Note that tuple case should still follow the original logic.
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D28393949
Pulled By: malfet
fbshipit-source-id: 98d48aad27e5ca03fb10d260f8e625478d996ee2
Co-authored-by: David <jiafa@microsoft.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58048
It's never used, and it is also a bit dangerous, because a move
typically destroys the source location, but there may be other owning
references to the original location!
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D28390241
Pulled By: ezyang
fbshipit-source-id: 68f22756ac066a7a0fc8baedd2b7834c01c2c534
Summary:
TODOs:
- [x] generate a temporary new token on this repo for testing purposes
- [x] change the name of the S3 secret used in the workflow YAML definitions
- [x] check the test plan
- [x] replace the temporary token with a more permanent one
- [x] check the test plan again
- [x] uncomment the `if` statement that guards against uploading PR test stats
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58156
Test Plan: Check the [ossci-metrics bucket](https://s3.console.aws.amazon.com/s3/buckets/ossci-metrics) after CI runs on this PR. Specifically, [this prefix](a3445bfbd7/pytorch-linux-xenial-py3.6-gcc5.4/&showversions=false) has two objects under it.
Reviewed By: janeyx99
Differential Revision: D28393138
Pulled By: samestep
fbshipit-source-id: 2c39c102652d471afa016cfc4942bb1e5bbb4163
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58170
Comm hooks are now supported on the MPI and GLOO backends besides NCCL, so these warnings and checks are no longer needed.
ghstack-source-id: 128799123
Test Plan: N/A
Reviewed By: agolynski
Differential Revision: D28388861
fbshipit-source-id: f56a7b9f42bfae1e904f58cdeccf7ceefcbb0850
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58028
We were trying to translate the device argument and thus throwing an unsupported-dtype error.
ghstack-source-id: 128748658
Test Plan: predictor models
Reviewed By: navahgar
Differential Revision: D28347704
fbshipit-source-id: 331a5786339e01f9df1b1878970b0c5983a92980
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58026
Cat-without-conditionals is a valuable optimization on CPU but on GPU
it can generate invalid code since it may introduce allocations (i.e. extra
kernel launches)
ghstack-source-id: 128748630
Test Plan: predictor
Reviewed By: navahgar
Differential Revision: D28347703
fbshipit-source-id: f9e68cd7bcf5d316082ce8378ddf99f2d33fcc07
Summary:
This PR adds Azure DevOps support for running custom PyTorch unit tests on PyTorch PR and Nightly builds.
PR Builds on Azure DevOps:
- Ensures that the wheel artifacts for a given PR build is ready
- Once the wheels are ready, PyTorch custom tests are run on torch installation from build wheels
Nightly Builds on Azure DevOps:
- Cues 4 builds {Win,Linux}*{cpu, CUDA} to run PyTorch custom unit tests on nightly PyTorch builds.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58007
Reviewed By: seemethere, mruberry
Differential Revision: D28342428
Pulled By: malfet
fbshipit-source-id: a454accf69163f9ba77845eeb54831ef91437981
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55836
Rename construct_time_validation to argument_validation, since we should give users the flexibility to apply this decorator to any function that requires type validation.
It can still work as construct-time validation:
```py
class ExampleDataPipe(IterDataPipe):
    @argument_validation
    def __init__(self, dp: IterDataPipe[int]):
        self.dp = dp
        ...
```
Notebook is also updated.
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D27743478
Pulled By: ejguan
fbshipit-source-id: 49743152d121028cd7d72d89dc7df5c7c7b94c41
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58105
When find_unused_parameters=True but static_graph is also set, static graph handles unused parameter accounting, so this code path is not needed
ghstack-source-id: 128736289
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D28371954
fbshipit-source-id: 0b42a9c0fd2bba26a0de288436e9c7139e292578
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57719.
This PR fixes `torch.Tensor{__rsub__, __rdiv__, __rtruediv__, __pow__, __rmatmul__}` to return `NotImplemented` instead of raising a `TypeError`.
cc/ mruberry: The first commit of this PR is the same as 1d209db1cc except the commit message.
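Returning `NotImplemented` matters because it lets Python consult the other operand's implementation, and only raise the standard `TypeError` once no implementation handles the operands. A minimal illustration, unrelated to the actual tensor code:

```python
class A:
    def __sub__(self, other):
        if isinstance(other, B):
            return "A.__sub__"
        return NotImplemented

class B:
    def __rsub__(self, other):
        # Returning NotImplemented (instead of raising TypeError directly)
        # lets Python fall back through the protocol and produce a
        # standard TypeError only when nothing handles the operands.
        return NotImplemented

print(A() - B())  # A's implementation handles it

try:
    1 - B()       # int.__sub__ and B.__rsub__ both decline
except TypeError:
    print("standard TypeError raised")
```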
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57934
Reviewed By: mruberry
Differential Revision: D28351931
Pulled By: albanD
fbshipit-source-id: 985457a44dba24d2496794dfb8c1661cbcd4ff8f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58058
Don't save the output node in the `node_map`, because the result of the output node could be a list of proxies, which would throw an error when used as a key.
Test Plan: CI
Reviewed By: mikekgfb
Differential Revision: D28329580
fbshipit-source-id: a29f3ef1763930faa20cb20eb9ffd04ef7e52dc1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57925
1. adds test_scripts.py that will run added scripts and verify that there are no errors
2. adds local ddp_nccl_allreduce experiment script
test with command `pytest test_scripts.py`
Test Plan: Imported from OSS
Reviewed By: agolynski
Differential Revision: D28382452
Pulled By: gcramer23
fbshipit-source-id: 21028a990ebfedf1aad6b007a723c02403e8bea8
Summary:
Enabled BFloat16 for `nan_to_num` on CUDA. For comparison with numpy, a [workaround suggested](https://github.com/pytorch/pytorch/issues/57982#issuecomment-839150556) by ngimel is being used - the OpInfo's `sample.kwargs` is used to set two `numpy.kwargs`, viz. `posinf` & `neginf` for `BFloat16`.
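The semantics of `torch.nan_to_num` and its `posinf`/`neginf` kwargs (which the OpInfo workaround maps onto numpy) can be sketched for a single float in plain Python; the `finfo_max`/`finfo_min` defaults below merely stand in for a dtype's finite range (the numbers approximate bfloat16 and are illustrative only):

```python
import math

def nan_to_num(x, nan=0.0, posinf=None, neginf=None,
               finfo_max=3.39e38, finfo_min=-3.39e38):
    # finfo_max/min play the role of the dtype's largest/smallest finite value.
    if math.isnan(x):
        return nan
    if x == math.inf:
        return posinf if posinf is not None else finfo_max
    if x == -math.inf:
        return neginf if neginf is not None else finfo_min
    return x

print(nan_to_num(float("inf"), posinf=100.0))  # 100.0
```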
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58063
Reviewed By: mruberry
Differential Revision: D28373478
Pulled By: ngimel
fbshipit-source-id: 6493b560d83632a8519c1d3bfc5c54be9b935fb9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57073
Enhances use of DDPSink to work for all output types DDP supports as per https://github.com/pytorch/pytorch/issues/55876.
TODO: Add additional testing for tuple, list, dict return types
ghstack-source-id: 128726768
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D27756985
fbshipit-source-id: 2e0408649fb2d6a46d6c33155a24c4c1723dd799
Summary:
These were added to help debug a flaky test, the flaky test has since been resolved.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58095
Reviewed By: SciPioneer
Differential Revision: D28368077
Pulled By: rohan-varma
fbshipit-source-id: 9618f64de2b7015401bb8cb7816b09e1a44e0fef
Summary:
This one had a tricky usage of `torch.symeig` that had to be replaced. I tested the replacement locally though.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57732
Reviewed By: bdhirsh
Differential Revision: D28328189
Pulled By: mruberry
fbshipit-source-id: 7f000fcbf2b029beabc76e5a89ff158b47977474
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58013
Add a test case and a fix (legacy profiler) for empty trace handling
Test Plan: python test/test_profiler.py
Reviewed By: gdankel
Differential Revision: D28345388
Pulled By: ilia-cher
fbshipit-source-id: 4727589ab83367ac8b506cc0f186e5292d974671
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58023
Clearly state that some features of RPC aren't yet compatible with CUDA.
ghstack-source-id: 128688856
Test Plan: None
Reviewed By: agolynski
Differential Revision: D28347605
fbshipit-source-id: e8df9a4696c61a1a05f7d2147be84d41aeeb3b48
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58067
- Use expect_contiguous in layer_norm to avoid unnecessary refcount bumps when the tensors are contiguous
- Clean up some leftovers from the hacky wrappers removal cleanup: use c10::MaybeOwned<Tensor> for bias tensors
- Skip dispatcher for at::empty in the layer_norm impl in Static Runtime
Test Plan: CI
Reviewed By: swolchok
Differential Revision: D28214298
fbshipit-source-id: 73150fa62d5c18f41a2264f8e56bbe5e377ad045
Summary:
Backward methods for `torch.lu` and `torch.lu_solve` require the `torch.lu_unpack` method.
However, while `torch.lu` is a Python wrapper over a native function, so its gradient is implemented via `autograd.Function`,
`torch.lu_solve` is a native function, so it cannot access `torch.lu_unpack` as it is implemented in Python.
Hence this PR presents a native (ATen) `lu_unpack` version. It is also possible to update the gradients for `torch.lu` so that backward+JIT is supported (no JIT for `autograd.Function`) with this function.
~~The interface for this method is different from the original `torch.lu_unpack`, so it is decided to keep it hidden.~~
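At the heart of `lu_unpack` is turning LAPACK-style pivot indices into a permutation; a dependency-free sketch of just that step (the function name is illustrative):

```python
def pivots_to_permutation(pivots, n):
    """Convert LAPACK-style 1-based pivot swaps ("row i was interchanged
    with row pivots[i]") into a permutation of row indices."""
    perm = list(range(n))
    for i, p in enumerate(pivots):
        j = p - 1  # LAPACK pivots are 1-based
        perm[i], perm[j] = perm[j], perm[i]
    return perm

print(pivots_to_permutation([2, 3, 3], 3))  # [1, 2, 0]
```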
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46913
Reviewed By: albanD
Differential Revision: D28355725
Pulled By: mruberry
fbshipit-source-id: 281260f3b6e93c15b08b2ba66d5a221314b00e78
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58100
aten::clone has a second arg, memory_format, which was not previously supported.
Reviewed By: ajyu
Differential Revision: D28347171
fbshipit-source-id: e083cc24c3228048429bba3497326415bc3d1f5a
Summary:
https://github.com/pytorch/pytorch/issues/55070
There are a few places where `const_cast` is used with utility functions shared with unstructured operators.
The RFC says that assigning to the `out` tensor doesn't work, but that seems to be what e.g., `_allocate_or_resize_output_with_indices` seems to do. Does assignment "work" when the tensors are not allocated?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57790
Reviewed By: bdhirsh
Differential Revision: D28289685
Pulled By: ezyang
fbshipit-source-id: 7027f162581af0bc0f5b750ff4439b0ecb01ec7b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56365
Follow-up to https://github.com/pytorch/pytorch/pull/54784#discussion_r614156172. Instead of having one large testcase where most methods are decorated with `onlyCPU`, this factors out all tests that actually need another device into a separate test case.
Test Plan: Imported from OSS
Reviewed By: walterddr, albanD
Differential Revision: D28247529
Pulled By: mruberry
fbshipit-source-id: 946e7694b70e736941565f29b5dd459ed7fbca4e
Summary:
This PR adds a note to the documentation that torch.svd is deprecated together with an upgrade guide on how to use `torch.linalg.svd` and `torch.linalg.svdvals` (Lezcano's instructions from https://github.com/pytorch/pytorch/issues/57549).
In addition, all usage of the old svd function is replaced with a new one from torch.linalg module, except for the `at::linalg_pinv` function, that fails the XLA CI build (https://github.com/pytorch/xla/issues/2755, see failure in draft PR https://github.com/pytorch/pytorch/pull/57772).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57981
Reviewed By: ngimel
Differential Revision: D28345558
Pulled By: mruberry
fbshipit-source-id: 02dd9ae6efe975026e80ca128e9b91dfc65d7213
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58047
This error ALWAYS gets picked up by Dr. CI and IT DRIVES ME NUTS.
Consign it to the /dev/null bin.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D28352658
Pulled By: ezyang
fbshipit-source-id: a55f99ed76728d46f02d6a61a45c7691e8be7a47
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58000
Directly overriding save_extern and save_mock may mess with our
invariants in weird ways. This is less pronounced now, but once we
switch to graph-based dependency management things will get broken
subtly if people fail to call `super()`.
Better to add hook support to reflect that really you can only do a side
effect. Also has the bonus that people are likely familiar with it from
`nn.Module` hooks.
Differential Revision: D28339191
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Pulled By: suo
fbshipit-source-id: 63ffd39d2dcb1a7524f3c2c6a23bd399e754cc44
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58018
- Add checks for the number of input args and return nullptr if it doesn't match. This is intended to make Static Runtime more robust so that op schema change is less likely to break things. Imagine that a new arg is added to an op or a new overload is added that has this added arg, SR would simply ignore this added arg. If this arg has a default value, SR would run the model with the default value and give you wrong results, which can be hard to track down.
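The arity-guard idea can be sketched outside of Static Runtime: look up an op implementation only if the node's input count matches what the implementation expects, and bail out (here, `None` in place of `nullptr`) otherwise. All names are hypothetical:

```python
import inspect

OP_REGISTRY = {}

def register_op(name):
    def deco(fn):
        OP_REGISTRY[name] = fn
        return fn
    return deco

@register_op("aten::clone")
def clone_impl(self_tensor, memory_format):
    return list(self_tensor)  # stand-in for a real clone

def resolve_op(name, num_node_inputs):
    impl = OP_REGISTRY.get(name)
    if impl is None:
        return None
    expected = len(inspect.signature(impl).parameters)
    if expected != num_node_inputs:
        # Schema drift: refuse to run rather than silently fill in defaults
        # and produce hard-to-track wrong results.
        return None
    return impl

print(resolve_op("aten::clone", 2) is clone_impl)  # True
print(resolve_op("aten::clone", 1))                # None
```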
Reviewed By: ajyu
Differential Revision: D28047955
fbshipit-source-id: 01067059edd5cfea80c4ee121829f7733b11f601
Summary:
The function name and the return type are both called `class_`, so the name is ambiguous; this is UB and does not work with NVCC. See the tests for the failure case.
Thanks to Thibaut Lutz from NVIDIA's compiler team for the help.
cc: yueyericardo ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57962
Reviewed By: mruberry
Differential Revision: D28359400
Pulled By: ezyang
fbshipit-source-id: c64ec89203f99f656611aba34f7424eed7bc9e7c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57566
Fix the problem that the tempfile is never deleted, even after `torch_shm_manager` is destroyed.
- The previous implementation used the wrong path length for the Unix domain socket. When binding the pathname to the socket, we lose the last character of the tempfile's name, so we cannot delete the file afterwards due to the unexpected file name.
- After solving the racing problem by introducing a temporary directory, this becomes more dangerous, since the lingering tempfile prevents `torch_shm_manager` from deleting the temporary directory.
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D28202866
Pulled By: ejguan
fbshipit-source-id: 912cfd8fec0cc309d47df223b2b0faa599c60799
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58057
This PR refactors the store initialization logic and moves it to the `create_backend` function for both C10d and etcd backends.
ghstack-source-id: 128671579
Test Plan: Run the existing and revised tests.
Reviewed By: tierex
Differential Revision: D28356587
fbshipit-source-id: caf9416ab811eefe4834268d8a11a48f2236ed5b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57824
Implement a type check for types given as strings (forward references). Re-raise a detailed exception at compile time.
```py
>>> class InvalidData(Generic[T_co], NamedTuple):  # Invalid generic namedtuple in Python typing
...     name: str
...     data: T_co
>>> class DP(IterDataPipe['InvalidData[int]']):
...     pass
TypeError: InvalidData[int] is not supported by Python typing
```
Add a `__type_class__` attribute to classes, which optimizes the static checking flow by reducing the number of checks.
```py
>>> class DP1(IterDataPipe[Union[int, str]]):
...     pass
>>> class DP2(DP1[int]):
...     pass
>>> list((cls, getattr(cls, '__type_class__', None)) for cls in DP2.__mro__)
[(<class '__main__.DP2'>, False), (<class 'abc.DP1[int]'>, True), (<class '__main__.DP1'>, False), (<class 'abc.IterableDataset[typing.Union[int, str]]'>, True), (<class 'torch.utils.data.dataset.IterableDataset'>, False), (<class 'torch.utils.data.dataset.Dataset'>, None), (<class 'typing.Generic'>, None), (<class 'object'>, None)]
```
Among the classes in `DP2`'s MRO, only `DP2` and `DP1` will be statically checked, since their `__type_class__` is `False`. `abc.DP1[int]` and `abc.IterableDataset[typing.Union[int, str]]` will be ignored, since they are just classes generated by typing.
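The shortened static check over the MRO can be sketched as follows (illustrative classes, not the actual DataPipe code):

```python
def classes_to_check(cls):
    # Only check real user classes; skip typing-generated aliases,
    # which are flagged with __type_class__ = True, and classes
    # that carry no flag at all (e.g. object).
    return [c for c in cls.__mro__
            if getattr(c, "__type_class__", None) is False]

class Base:
    __type_class__ = False

class Alias(Base):
    __type_class__ = True  # stands in for a typing alias like abc.DP1[int]

class DP(Alias):
    __type_class__ = False

print([c.__name__ for c in classes_to_check(DP)])  # ['DP', 'Base']
```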
## Future
When Python 3.6 is deprecated, using TypeAlias rather than TypeMeta can eliminate the usage of the `__type_class__` attribute.
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D28289104
Pulled By: ejguan
fbshipit-source-id: 1da97460c8bfc48cea7396033fde484a24caba7c
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).
New submodule commit: 566d74c27c
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57983
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: jspark1105
Differential Revision: D28334558
fbshipit-source-id: fcc41aae7c8309e8baccbf71442436a1ebb42378
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57972
Allow static runtime to be on when glow is on. This should be fine as long as glow AOT has already been run.
Test Plan: Test on replayer with remote_other net. D28291326 fixes remaining issue removing loops from the remote_other model. Need to test on regenerated model.
Reviewed By: hlu1
Differential Revision: D28275514
fbshipit-source-id: ee78972660dfdc3fcfb9af2cf7ebb19ee745a4f1
Summary:
Normalize `__is__` to `eq` and `__isnot__` to `ne` in the case of bools.
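The transformation can be sketched as a tiny peephole pass over an expression node (a plain tuple here, not actual JIT IR):

```python
def peephole_bool_is(node):
    """Rewrite ('is', a, b) -> ('eq', a, b) and ('isnot', a, b) -> ('ne', a, b)
    when both operands are known bools; identity and equality agree for the
    interned True/False singletons, so the rewrite is safe."""
    op, a, b = node
    if isinstance(a, bool) and isinstance(b, bool):
        if op == "is":
            return ("eq", a, b)
        if op == "isnot":
            return ("ne", a, b)
    return node

print(peephole_bool_is(("is", True, False)))  # ('eq', True, False)
print(peephole_bool_is(("is", [1], [1])))     # unchanged: operands aren't bools
```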
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57862
Test Plan:
```
python test/test_jit.py TestPeephole
```
11 Tests, 1 skipped, no failures
Fixes https://github.com/pytorch/pytorch/issues/57387
Reviewed By: eellison
Differential Revision: D28335646
Pulled By: Gamrix
fbshipit-source-id: c9f885044b32897ba35483091bcf7037759b7517
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58071
There's an environment variable that mypy will use to force color output, so turn that on if the runner detects a terminal.
Test Plan: Imported from OSS
Reviewed By: samestep
Differential Revision: D28360742
Pulled By: driazati
fbshipit-source-id: c0dc372a44ab3a16e67115ce54784f4d5a4833ee
Summary:
**BC-breaking note**
This PR updates the deprecation notice for torch.norm to point users to the new torch.linalg.vector_norm and torch.linalg.matrix_norm functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57986
Reviewed By: nikithamalgifb
Differential Revision: D28353625
Pulled By: heitorschueroff
fbshipit-source-id: 5de77d89f0e84945baa5fea91f73918dc7eeafd4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58001
Adds a script so that devs can generate a commit (at the base of a stack) that removes all CI jobs but the set that they care about. See CONTRIBUTING.md changes for usage
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D28359910
Pulled By: driazati
fbshipit-source-id: 2741570f2bab2c28f4a9d7aef727b1b2399d0ce1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57999
make ddp logging api to be private
ghstack-source-id: 128607185
Test Plan: unit test
Reviewed By: rohan-varma
Differential Revision: D28338485
fbshipit-source-id: bd2ae7c78904e93eed88be91876f5a832b5b7886
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57951
As pmeier suggested in another PR, just remove all redundant checks for the prior DataPipe.
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D28325414
Pulled By: ejguan
fbshipit-source-id: 17497745fef1647c24a25f4ca08082dd4df6f4a7
Summary:
This PR enables the usage of cusolver potrf batched as the backend of Cholesky decomposition (`torch.linalg.cholesky` and `torch.linalg.cholesky_ex`) when the CUDA version is greater than or equal to 11.3.
Benchmark available at https://github.com/xwang233/code-snippet/tree/master/linalg/cholesky-new. It is seen that cusolver potrf batched performs better than magma potrf batched in most cases.
## cholesky dispatch heuristics:
### before:
- batch size == 1: cusolver potrf
- batch size > 1: magma xpotrf batched
### after:
cuda >= 11.3:
- batch size == 1: cusolver potrf
- batch size > 1: cusolver potrf batched
cuda < 11.3 (not changed):
- batch size == 1: cusolver potrf
- batch size > 1: magma xpotrf batched
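The before/after heuristics reduce to a small decision function (the backend names are descriptive labels, not real identifiers):

```python
def cholesky_backend(cuda_version, batch_size):
    """Pick a Cholesky backend per the post-PR dispatch heuristics."""
    if batch_size == 1:
        return "cusolver potrf"
    # batch size > 1: the CUDA >= 11.3 cutoff introduced by this PR
    if cuda_version >= (11, 3):
        return "cusolver potrf batched"
    return "magma xpotrf batched"

print(cholesky_backend((11, 3), 8))  # cusolver potrf batched
print(cholesky_backend((11, 1), 8))  # magma xpotrf batched
```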
---
See also https://github.com/pytorch/pytorch/issues/42666#47953 and https://github.com/pytorch/pytorch/issues/53104#53879
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57788
Reviewed By: ngimel
Differential Revision: D28345530
Pulled By: mruberry
fbshipit-source-id: 3022cf73b2750e1953c0e00a9e8b093dfc551f61
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55248
This PR enables static graph training when users call _set_static_graph(). This can help support more use cases in DDP without performance regression, and can potentially improve performance when there are unused parameters in the graph.
1. The first iteration records graph states, such as how many times a grad is calculated and whether the grad is used or not. It then queues a delay_all_reduce callback to all-reduce the grads.
2. Since an autograd callback is associated with the current target graph task, the delay_all_reduce callback should be associated with the outermost backward graph task. A DDP sink layer is added in the DDP forward loop so that we can queue the delay_all_reduce callback in the sink layer.
3. After the first iteration, DDP uses the saved graph states to determine whether a grad is used or not and whether a grad is ready for communication.
4. Bucket rebuilding happens in the second iteration, after graph states are recorded in the first iteration.
5. If the graph states change, DDP will throw errors.
ghstack-source-id: 128599464
Test Plan: unit tests. adding more tests
Reviewed By: rohan-varma
Differential Revision: D27539964
fbshipit-source-id: 74de1ad2719465be67bab8688d6e293cd6e3a246
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56674
`torch.nn.functional.multi_head_attention_forward` supports a long tail of options and variations of the multihead attention computation. Its complexity is mostly due to arbitrating among options, preparing values in multiple ways, and so on - the attention computation itself is a small fraction of the implementation logic, which is relatively simple but can be hard to pick out.
The goal of this PR is to
- make the internal logic of `multi_head_attention_forward` less entangled and more readable, with the attention computation steps easily discernible from their surroundings.
- factor out simple helpers to perform the actual attention steps, with the aim of making them available to other attention-computing contexts.
Note that these changes should leave the signature and output of `multi_head_attention_forward` completely unchanged, so not BC-breaking. Later PRs should present new multihead attention entry points, but deprecating this one is out of scope for now.
Changes are in two parts:
- the implementation of `multi_head_attention_forward` has been extensively resequenced, which makes the rewrite look more total than it actually is. Changes to argument-processing logic are largely confined to a) minor perf tweaks/control flow tightening, b) error message improvements, and c) argument prep changes due to helper function factoring (e.g. merging `key_padding_mask` with `attn_mask` rather than applying them separately)
- factored helper functions are defined just above `multi_head_attention_forward`, with names prefixed with `_`. (A future PR may pair them with corresponding modules, but for now they're private.)
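The attention computation that the refactor makes discernible is, at its core, only a few steps. A dependency-free sketch for tiny matrices (the helper names are illustrative, not the private helpers this PR factors out):

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def scaled_dot_product_attention(Q, K, V):
    d = len(Q[0])
    K_t = [list(col) for col in zip(*K)]          # transpose K
    scores = [[s / math.sqrt(d) for s in row]     # Q @ K^T / sqrt(d)
              for row in matmul(Q, K_t)]
    weights = [softmax(row) for row in scores]    # row-wise softmax
    return matmul(weights, V)                     # weights @ V

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(scaled_dot_product_attention(Q, K, V))
```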
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D28344707
Pulled By: bhosmer
fbshipit-source-id: 3bd8beec515182c3c4c339efc3bec79c0865cb9a
Summary:
Enabled `dot` for BFloat16 on CUDA (version 11+).
It also enabled `matmul` & `vdot` for BFloat16.
Backward for `matmul` isn't supported for `BFloat16`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57903
Reviewed By: mruberry
Differential Revision: D28346031
Pulled By: ngimel
fbshipit-source-id: 0917e9e0d6cf3694f45fe1c7e76370581502036a
Summary:
Right now** there's a bug in libcuda.so that triggers sometimes when graphs with certain topologies are replayed back to back without a sync in between. Replays that hit this bug turn into spaghetti: kernels reordered ignoring dependencies, kernels elided, corrupted results. Currently, the only workaround I know that fixes all our repros is a manual sync between replays.
I'll remove the sync (or special case it based on cuda version) in a later PR, as soon as a fixed libcuda.so is available.
The only substantive change is the cudaDeviceSynchronize, other lines changed are de-indenting an unneeded scope.
** The bug is in current and semi-recent public versions of libcuda.so. We discovered the bug recently and we're not sure yet which public release was first affected. The version that ships with 11.3 is definitely affected, versions that shipped with 11.1 and earlier are likely not affected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57556
Reviewed By: mruberry
Differential Revision: D28343043
Pulled By: ngimel
fbshipit-source-id: 3b907241aebdb8ad47ae96a6314a8b02de7bfa77
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58022
Caffe2 Int8FC + rowwise quantization was not handling bias correctly.
Test Plan: The example in D28347336 doesn't show bigger error with rowwise quantization any more
Reviewed By: hx89, janeyx99
Differential Revision: D28347336
fbshipit-source-id: 3ac95fd2f29ef6e52705c3a2361b605813c2bcc5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50048
To reflect the many changes introduced recently.
In my mind, CUDAFuture should be considered a "private" subclass, which in practice should always be returned as a downcast pointer to an ivalue::Future. Hence, we should document the CUDA behavior in the superclass, even if it's CUDA-agnostic, since that's the interface the users will see also for CUDA-enabled futures.
ghstack-source-id: 128640983
Test Plan: Built locally and looked at them.
Reviewed By: mrshenli
Differential Revision: D25757474
fbshipit-source-id: c6f66ba88fa6c4fc33601f31136422d6cf147203
Summary:
This one's easy. I also included a bugfix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57735
Reviewed By: bdhirsh
Differential Revision: D28318277
Pulled By: mruberry
fbshipit-source-id: c3c4546a11ba5b555b99ee79b1ce6c0649fa7323
Summary:
This one's straightforward
**BC-breaking Note**
This PR deprecates matrix_rank in favor of linalg.matrix_rank. An upgrade guide from matrix_rank to linalg.matrix_rank is provided in the documentation of matrix_rank.
It DOES NOT remove matrix_rank.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57734
Reviewed By: bdhirsh
Differential Revision: D28318301
Pulled By: mruberry
fbshipit-source-id: b9a27f58fdad72f408ca8b83a70c9b1fc2ef28e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55189
Currently EmbeddingBag and its variants support either int32 or int64 indices/offsets. We have use cases with a mix of int32 and int64 indices, which are not supported yet. To avoid introducing too many branches, we simply cast the offsets type to the indices type when they are not the same.
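The promotion rule can be sketched with Python's `array` module standing in for tensor dtypes (illustrative only):

```python
import array

def match_offsets_to_indices(indices, offsets):
    # Cast offsets to the indices' element type when they differ,
    # instead of branching on every (indices, offsets) dtype pair.
    if offsets.typecode != indices.typecode:
        offsets = array.array(indices.typecode, offsets)
    return offsets

indices = array.array("q", [0, 2, 4])   # int64-like
offsets = array.array("i", [0, 2])      # int32-like
print(match_offsets_to_indices(indices, offsets).typecode)  # 'q'
```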
Test Plan: unit tests
Reviewed By: allwu
Differential Revision: D27482738
fbshipit-source-id: deeadd391d49ff65d17d016092df1839b82806cc
Summary:
**BC-breaking note:**
This PR deprecates torch.cholesky in favor of torch.linalg.cholesky. An upgrade guide is added to the documentation for torch.cholesky.
Note this PR DOES NOT remove torch.cholesky.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57725
Reviewed By: bdhirsh
Differential Revision: D28318260
Pulled By: mruberry
fbshipit-source-id: e7ba049321810e70f4de08e6ac37ff800e576152
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57666
Got a 5% improvement on mobilenetv2 and Unet:
1. `std::unordered_map` is faster than `NSMutableDictionary`
2. `std::string` is cheaper than `NSString`
ghstack-source-id: 128338531
Test Plan: CI
Reviewed By: kimishpatel, SS-JIA
Differential Revision: D28048992
fbshipit-source-id: fc4f7e41928c524acde48947d2cd6b9f6ef7cbc8
Summary:
When doing this, I realised that `torch.linalg.pinv` did not have a note on the problems of its derivative (`torch.pinverse` did have it), so I added that.
While I was at it, I made the recommendation for some functions in `torch.linalg` to prefer other functions a bit more explicit. I also changed the mentions of "stable" to "numerically stable" as discussed with IvanYashchuk and mruberry
If it seems like too much, I'm happy to move the recommendations part of `torch.linalg` to a different PR, but it was such a small thing that I figured it wouldn't be that big a deal if it was here.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57821
Reviewed By: bdhirsh
Differential Revision: D28317959
Pulled By: mruberry
fbshipit-source-id: 6b116561bf3cba46fadc5ac14448e5d28ea88039
Summary:
**BC-breaking note:**
This PR deprecates torch.lstsq; it adds an upgrade guide for how to use torch.linalg.lstsq instead.
It DOES NOT remove torch.lstsq, but warns once when it's called
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57743
Reviewed By: bdhirsh
Differential Revision: D28318196
Pulled By: mruberry
fbshipit-source-id: 0d6df29648a91a44c7d0ac58062c1099fcb61fb8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57974
We see this error quite a bit in internal workflows, would be useful
to have this additional logging information here.
ghstack-source-id: 128602199
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D28331693
fbshipit-source-id: 25398c6a3420a2b594d79aa8f46936cd0addd426
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57711
Seeing some hangs/issues around store based barrier internally, would
be good to have this log to indicate whether store based barrier has completed
successfully or not for a particular rank to debug further.
ghstack-source-id: 128605600
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D28249087
fbshipit-source-id: 644e5780519017ae780c3bc78bbe5def322db3f8
Summary:
Redo of https://github.com/pytorch/pytorch/issues/57135 out of stack
---
Currently all values are used for the reported absolute and relative differences. This usually works fine, but breaks down for the extremals:
```python
torch.testing.assert_close(torch.tensor([1.0, 0.0]), torch.tensor([2.0, 0.0]))
```
```
[...]
Greatest absolute difference: 1.0 at 0 (up to 1e-05 allowed)
Greatest relative difference: nan at 1 (up to 1.3e-06 allowed)
```
Although the second element matches, it is listed as the offender for the greatest relative difference. The `NaN` stems from the `0 / 0` division.
To overcome this, we should only use the values that were considered a mismatch for the reported stats.
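A minimal sketch of the intended behavior (a hypothetical helper, not the actual `assert_close` internals): compute the stats only over elements flagged as mismatches, so a matching `0 / 0` pair can no longer contribute a `NaN` relative difference.

```python
import torch

def mismatch_stats(actual, expected, rtol=1.3e-6, atol=1e-5):
    """Report greatest abs/rel difference over mismatched elements only."""
    abs_diff = (actual - expected).abs()
    rel_diff = abs_diff / expected.abs()
    # same tolerance rule as torch.testing: |a - e| > atol + rtol * |e|
    mismatches = abs_diff > atol + rtol * expected.abs()
    if not mismatches.any():
        return None
    return abs_diff[mismatches].max().item(), rel_diff[mismatches].max().item()
```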
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57923
Reviewed By: ngimel
Differential Revision: D28317316
Pulled By: mruberry
fbshipit-source-id: 4c604493bbe13b37f41225ea9af9e839a7304161
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57798
Our instruction sequence was just plain wrong, instead of `fcmp une %x, +0.0`
(unordered equal 0.0) we were doing `fcmp uno`, which is just an unordered check
(i.e., is either side NaN).
ghstack-source-id: 128586464
Test Plan: New unit test against the full cross-product of dtypes.
Reviewed By: navahgar
Differential Revision: D28276269
fbshipit-source-id: ba5e59778e07770fb78ef02309f10edde333a800
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57965
The bold effect does not work under quotes, so move it out.
ghstack-source-id: 128570357
Test Plan:
locally view
{F614715259}
Reviewed By: rohan-varma
Differential Revision: D28329694
fbshipit-source-id: 299b427f4c0701ba70c84148f65203a6e2d6ac61
Summary:
The call stacks in the profiler result JSON file lack surrounding double quotes, resulting in a JSON parse error.
This PR adds them.
ilia-cher
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57099
Reviewed By: gdankel
Differential Revision: D28324182
Pulled By: ilia-cher
fbshipit-source-id: dc479a023bb25de27c414629a27d624d64457c3e
Summary:
Judging from https://github.com/pytorch/pytorch/issues/57584, it seems like the test-reports artifact was originally intended to be downloaded to `$PWD/test-reports` instead of just directly into `$PWD`. To minimize confusion, this PR changes it to download into `test/test-reports`, which should match where the files came from in the `test` step anyway.
TODOs:
- [x] change the extract path for test-reports
- [x] install Python dependencies
- [x] call `tools/print_test_stats.py`
- [x] use deep clone to allow `git` commands
- [x] correctly set `CIRCLE_*` environment variables
- [x] set Scribe credentials
- [x] set AWS credentials
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57647
Test Plan: CI.
Reviewed By: seemethere
Differential Revision: D28325833
Pulled By: samestep
fbshipit-source-id: cc322bad76747f59b764a1a0a863153bb26095e7
Summary:
Fixes https://github.com/pytorch/pytorch/issues/42163.
## 🔥 Pitch
Currently, the binary outputs produced by `torch.save()` are non-deterministic (as pointed out in https://github.com/pytorch/pytorch/issues/42163). This means that running a simple snippet that creates a tensor (or a model) twice will produce output files with a different `md5` sum.
**Why does this occur?**
The cause of this behavior lies in the fact that the `obj._cdata` is used to identify a tensor and is written to a file, but the `_cdata` attribute is of course non-deterministic:
a80b215a9a/torch/serialization.py (L416)
**Why does this matter?**
Reproducibility is essential for many Machine Learning projects.
For instance, when using [`dvc`](https://dvc.org/) you would expect that if none of the dependencies of a stage of a ML pipeline has changed, then running the same stage another time will produce the same binary output. For the reasons explained above, with `torch` this was not the case, so this PR tries to fix this issue.
## 📌 Content of this PR
### What changes?
- The `persistent_id()` function now returns a deterministic value, rather than `obj._cdata` (which depends on runtime).
- As a consequence, `torch.save(obj, "output.pt")` produces a deterministic output, i.e. the `md5` hash of `output.pt` is deterministic. See **Test 1** and **Test 2** below.
### What does not change?
- If an `obj` contains several tensors that share the same underlying data (e.g. they are views of the same tensor), the `obj_key` returned by `persistent_id()` is still going to be the same for all of them
- As a consequence, serialization optimizes disk storage by storing only necessary tensors, rather than writing one tensor per view. See **Test 3** below.
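The idea can be illustrated with a small sketch (a hypothetical helper, not the actual `persistent_id()` code): keys depend only on first-seen order rather than memory addresses, so repeated runs that serialize objects in the same order produce identical keys, while shared storages still collapse to a single key.

```python
class DeterministicIds:
    """Assign stable, order-based keys to objects instead of memory
    addresses (illustrative sketch of the idea behind the fix)."""

    def __init__(self):
        self._keys = {}

    def key_for(self, obj):
        # shared objects (e.g. storages shared by views) map to the same key;
        # the key depends only on the order of first appearance
        if id(obj) not in self._keys:
            self._keys[id(obj)] = str(len(self._keys))
        return self._keys[id(obj)]
```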
## How to test
### Test 1: snipped from https://github.com/pytorch/pytorch/issues/42163
Consider the following `snippet_1.py` (from https://github.com/pytorch/pytorch/issues/42163).
```python
import hashlib
import torch
def get_sha256_hash(file: str, chunk_size: int = 4096) -> str:
hasher = hashlib.sha256()
with open(file, "rb") as fh:
for chunk in iter(lambda: fh.read(chunk_size), b""):
hasher.update(chunk)
return hasher.hexdigest()
file = "tensor.pt"
hashes = []
for _ in range(5):
obj = torch.ones(1)
torch.save(obj, file)
hashes.append(get_sha256_hash(file)[:8])
del obj
hash = hashes[0]
assert all(other == hash for other in hashes[1:])
print(hash)
```
On `master` you obtain an error
```bash
$ python snippet_1.py
Traceback (most recent call last):
File "save_tensor.py", line 84, in <module>
assert all(other == hash for other in hashes[1:])
AssertionError
```
while on this PR branch you should get the following consistent behaviour:
```bash
$ for run in {1..2}; do python snippet_1.py; done
600a83cb
600a83cb
```
### Test 2: Deterministic save of `Tensor` and `nn.Module` instances
Consider the following `snippet_2.py`
```python
import torch
torch.manual_seed(0)
x = torch.tensor([8., 8., 5., 0.])
torch.save(x, "out_tensor.pt")
model = torch.nn.Sequential(
torch.nn.Linear(3, 1),
torch.nn.Flatten(0, 1)
)
torch.save(model, "out_model.pt")
```
On `master` branch, the `md5` hashes of `out_tensor.pt` and `out_model.pt` are non-deterministic, for instance you may get
```bash
$ for run in {1..2}; do python snippet_2.py; md5 out_*pt; done
MD5 (bc9e8af218) (out_model.pt) = 92dca4a310b691e893f3cb41d64d5af1
MD5 (bc9e8af218) (out_tensor.pt) = a4ef290583f50a9c203a42d0cfc078af
MD5 (bc9e8af218) (out_model.pt) = de3cb9791a66af8aed77ed7224bd1d5c
MD5 (bc9e8af218) (out_tensor.pt) = 3b8a6009d3a0be5b9dd94152dcc0c7cb
```
while on this PR branch you should get the following consistent behaviour:
```bash
$ for run in {1..2}; do python snippet_2.py; md5 out_*pt; done
MD5 (bc9e8af218) (out_model.pt) = dba75fd50a190e4e7fa89b7a2477bab7
MD5 (bc9e8af218) (out_tensor.pt) = 029f52f0706d6c813cc796d3cdcd3eb0
MD5 (bc9e8af218) (out_model.pt) = dba75fd50a190e4e7fa89b7a2477bab7
MD5 (bc9e8af218) (out_tensor.pt) = 029f52f0706d6c813cc796d3cdcd3eb0
```
### Test 3: Views of the same tensor are not re-written to file
Consider the following `snippet_3.py`.
```python
import torch
torch.manual_seed(0)
x = torch.rand(1_000, 1_000)
y = x.T
z = x.view(1_000_000, 1)
torch.save({"x": x}, "out_tensor_x.pt")
torch.save({"x": x, "y": y, "z": z}, "out_tensor_xyz.pt")
```
Both on `master` branch and on this PR branch you should get two output files with same size:
```bash
$ python snippet_3.py && du -sh out_tensor*pt && md5 out_*pt
3.8M out_tensor_x.pt
3.8M out_tensor_xyz.pt
MD5 (bc9e8af218) (out_tensor_x.pt) = eda516d9156177b27bdc2a75c9064d9b
MD5 (bc9e8af218) (out_tensor_xyz.pt) = 333b869f5b93ced7b8649ab1571eb8e3
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57536
Reviewed By: bdhirsh
Differential Revision: D28304728
Pulled By: ailzhang
fbshipit-source-id: 49788e566a3cd2c6c36dc801e6bdd8f42c9459cb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57771
This mapping didn't work properly when certain parameters didn't
require grad. Fixed that and added a test.
ghstack-source-id: 128527537
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D28265636
fbshipit-source-id: 7b342ce012b2b7e33058b4c619ffb98992ed05b7
Summary:
Downloading slow_test list on SC causes timeout, this is even a bigger issue since `common_utils.py` is reused in many internal projects/modules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57953
Test Plan: CI
Reviewed By: janeyx99
Differential Revision: D28325527
fbshipit-source-id: ae47c9e43ad6f416008005bb26ceb2f3d6966f2e
Summary:
Fixes https://github.com/pytorch/pytorch/issues/30696
### Release Notes
Instantiating a custom autograd function is now deprecated. Users should call `.apply()` on the class itself because it is a static method.
--end release notes--
- There are a couple error messages that we can't entirely remove because accessing these attributes of the autograd function instance may segfault (due to cdata being nullptr). Also added a TORCH_CHECK for the name attribute which previously segfaulted.
- Error message updated to convey 1) old-style functions have been deprecated 2) this access pattern was once valid
- Updates variable -> Tensor for some error messages
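For reference, the supported pattern: static `forward`/`backward` invoked via `.apply()` on the class itself.

```python
import torch

class Square(torch.autograd.Function):
    # New-style custom function: static methods, no instantiation
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return 2 * x * grad_out

x = torch.tensor(3.0, requires_grad=True)
y = Square.apply(x)  # call .apply() on the class, not on an instance
y.backward()
```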
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57357
Reviewed By: mrshenli
Differential Revision: D28193095
Pulled By: soulitzer
fbshipit-source-id: f021b105e9a3fd4a20d6ee3dfb6a06a8c34b10ca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57733
I'm going to be modifying the APIs here, so the less API surface
covering these functions the better.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D28289082
Pulled By: ezyang
fbshipit-source-id: 4b71270bb82e0d6baa4dfed2f2e4ee8831f590b5
Summary:
Relates to https://github.com/pytorch/pytorch/issues/56210. Initial attempt to make support for `list`, `tuple` and `dict` type in PEP-585.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57363
Test Plan:
- newly added `test_pep585_type`
- CI
Reviewed By: ngimel
Differential Revision: D28128230
Pulled By: walterddr
fbshipit-source-id: e5ba487dfd8c42e89f851d22b3aebfa56dd419bf
Summary:
Automatically generate this workflow by filtering all jobs that have the *filters:branches:only:master* restriction
Add probot config to schedule this workflow if `ci/master` label is set on PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57899
Reviewed By: walterddr
Differential Revision: D28311838
Pulled By: malfet
fbshipit-source-id: 63df81212279f5edd8463d1f6b22f37253c53a98
Summary:
This PR adds a new pass in JIT that optimizes `aten::cat` ops.
Specifically, here are optimizations performed:
* Eliminate redundant `cat` inputs by performing CSE on the list of inputs.
- This includes eliminating fully redundant `cat` ops when all the inputs are the same, as well as the case when "all but one" of the inputs have already been concatenated.
* Expand `cat` into multiple copies and eliminate redundancies.
- This also includes eliminating redundancies in the underlying buffers used for `cat`.
These optimizations are not enabled in any compilation flow at this point.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55474
Reviewed By: albanD
Differential Revision: D27624511
Pulled By: navahgar
fbshipit-source-id: d509289fafc23e73b02f64a90219148896817339
Summary:
Backward methods for `torch.lu` and `torch.lu_solve` require the `torch.lu_unpack` method.
However, while `torch.lu` is a Python wrapper over a native function (so its gradient can be implemented via `autograd.Function`),
`torch.lu_solve` is a native function, so it cannot access `torch.lu_unpack`, which is implemented in Python.
Hence this PR presents a native (ATen) `lu_unpack` version. It is also possible to update the gradients for `torch.lu` so that backward+JIT is supported (no JIT for `autograd.Function`) with this function.
~~The interface for this method is different from the original `torch.lu_unpack`, so it is decided to keep it hidden.~~
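Example usage of the unpacking (assuming a torch version that exposes `torch.linalg.lu_factor`; the values and shapes here are illustrative):

```python
import torch

A = torch.tensor([[4., 3.], [6., 3.]])

# factorize, then unpack into permutation and triangular factors
LU, pivots = torch.linalg.lu_factor(A)
P, L, U = torch.lu_unpack(LU, pivots)
# P @ L @ U reconstructs A
```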
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46913
Reviewed By: astaff
Differential Revision: D28117714
Pulled By: mruberry
fbshipit-source-id: befd33db12ecc147afacac792418b6f4948fa4a4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57792
There are two problems when using CUDA RPC with distributed autograd
and distributed optimizer:
1) In local autograd engine, all autograd functions/nodes, including
AccumualteGrad will use the forward stream for backward computation.
But distributed autograd skips AccumulateGrad autograd function/node
and directly calls into `AccumulateGrad::accumulateGrad`. As the
result, it will use the default stream to accumulate gradients
instead of the forward stream. This commit changes that and uses the
forward stream to accumulate gradients, matching forward behavior.
2) Distributed optimizer and distributed autograd backward are
separate RPC calls, and CUDA streams are not synchronized across
different RPC calls. As a result, distributed optimizer might
consume gradients before they are ready. This commit uses CUDA
events to record the completion of gradient computation, and use
those events to block current streams when getGradients() are called.
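The stream/event pattern described in (2) can be sketched as follows (illustrative only, not the distributed autograd code; the helper name is made up):

```python
import torch

def wait_for_gradients():
    """Record completion of gradient work on a side stream, then make the
    current stream wait on it before the optimizer reads the gradients."""
    if not torch.cuda.is_available():
        return  # nothing to synchronize on a CPU-only build
    side = torch.cuda.Stream()
    done = torch.cuda.Event()
    with torch.cuda.stream(side):
        # ... gradient accumulation kernels would run here ...
        done.record(side)
    # later, e.g. when gradients are fetched: block the current stream
    torch.cuda.current_stream().wait_event(done)
```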
Test Plan: Imported from OSS
Reviewed By: pritamdamania87
Differential Revision: D28274876
Pulled By: mrshenli
fbshipit-source-id: 22e607152324ae918084066cde8c5dbb418bba7c
Summary:
Currently, the test code is not testing unknown types correctly because `op` is overwritten in the for-loop (i.e., currently only `__ior__` is tested).
This PR fixes the test `generate_not_implemented_tests` to bind the operator name to each method, and removes currently unsupported operators (`__rand__`, …).
cc/ mruberry This fix is needed to add tests for the operators we are going to introduce (e.g., `__rand__`)
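The underlying Python pitfall and its fix, in isolation: a loop variable captured by reference makes every generated closure see the final value; binding it as a default argument captures it at definition time.

```python
def make_tests_buggy(ops):
    tests = []
    for op in ops:
        tests.append(lambda: op)        # all closures see the final `op`
    return [t() for t in tests]

def make_tests_fixed(ops):
    tests = []
    for op in ops:
        tests.append(lambda op=op: op)  # default arg binds `op` now
    return [t() for t in tests]
```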
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56997
Reviewed By: astaff
Differential Revision: D28118465
Pulled By: mruberry
fbshipit-source-id: c5a466a7604262ed5490862300d47043aff63d0b
Summary:
This PR is focused on the API for `linalg.matrix_norm` and delegates computations to `linalg.norm` for the moment.
The main difference between the norms is when `dim=None`. In this case
- `linalg.norm` will compute a vector norm on the flattened input if `ord=None`, otherwise it requires the input to be either 1D or 2D in order to disambiguate between vector and matrix norm
- `linalg.vector_norm` will flatten the input
- `linalg.matrix_norm` will compute the norm over the last two dimensions, treating the input as batch of matrices
In future PRs, the computations will be moved to `torch.linalg.matrix_norm` and `torch.norm` and `torch.linalg.norm` will delegate computations to either `linalg.vector_norm` or `linalg.matrix_norm` based on the arguments provided.
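A small sketch of the `dim=None` behaviors (assuming a torch build where all three functions exist):

```python
import torch

A = torch.tensor([[3., 0.], [0., 4.]])

# vector_norm flattens the input: sqrt(3**2 + 4**2) = 5.0
v = torch.linalg.vector_norm(A)

# matrix_norm works over the last two dims (Frobenius by default): also 5.0
m = torch.linalg.matrix_norm(A)

# leading dimensions are treated as a batch of matrices
batched = torch.linalg.matrix_norm(A.expand(3, 2, 2))  # shape (3,)
```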
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57127
Reviewed By: mrshenli
Differential Revision: D28186736
Pulled By: mruberry
fbshipit-source-id: 99ce2da9d1c4df3d9dd82c0a312c9570da5caf25
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57906
I think it was accidentally flipped in #56875.
Test Plan: Imported from OSS
Reviewed By: Chillee
Differential Revision: D28312947
Pulled By: ZolotukhinM
fbshipit-source-id: 8d0f45e540f47daefbc270f5a2ade87f2171b958
Summary:
## Note:
**This change will include the feature, but the feature is not on. It will be enabled and bytecode version will be bumped in D27844651 (8c04593c0a).**
JIT generates constant tensors, which are located in the constants folder (visible after unzipping model.ptl). Bytecode generated by the lite interpreter also includes constant tensors, which are almost the same as the constant tensor values from JIT. This PR lets the lite interpreter reuse the constant tensors from JIT instead of reproducing similar tensor values. The reading and writing process is as follows.
More details and background can be found in [Lite Interpreter Model Size Issue](https://fb.quip.com/OSidAcjhL9LS).
Data size comparison can be found in [Model size analysis](https://fb.quip.com/oEm6A4bhbo06)
### Write
1. In `export_module.cpp`, store all constant tensor values from JIT in an `unordered_map` `constants_from_jit`, using the tensor's string representation as the hash key. `constants_from_jit` is a map: (tensor) => (archive_name, index). When writing the bytecode archive in `writeByteCode()`, the map `constants_from_jit` is passed all the way down to its pickler.
2. In `pickler.cpp`, a new map tensors_archive_table_ is added. It is also a map: (tensor) => (archive_name, index). The corresponding function to update the map is `updateTensorsArchiveTable`. When pushing the storage of a tensor, if the tensor exists in `tensors_archive_table_`, the root key will be `{archive_name}/{index}`, instead of `{index}`. For example, the tensor
```
torch._utils._rebuild_tensor_v2(pers.obj(('storage', torch.FloatStorage, '0', 'cpu', 90944),),
0,
(1, 116, 28, 28),
(90944, 784, 28, 1),
False,
collections.OrderedDict()),
```
will be like following instead
```
torch._utils._rebuild_tensor_v2(pers.obj(('storage', torch.FloatStorage, 'constants/0', 'cpu', 90944),),
0,
(1, 116, 28, 28),
(90944, 784, 28, 1),
False,
collections.OrderedDict()),
```
**Note**: Only tensors in the bytecode archive are affected. Tensors in other archives remain the same, because `updateTensorsArchiveTable()` is only called when `use_tensors_archive_table_` is `true`, and `use_tensors_archive_table_` is only set to `true` when `bytecode_version` is a valid number.
### Read
1. In `import.cpp`, the function `read_record` passed to Unpickler is updated. The argument of `read_record` is the root key. In version 4, the root key will just be index, and `archive_name_plus_slash` + `name` will be used to get the tensor. With this change (version 5+), `read_record` will check if slash exists in the argument `name`. If it does, it means the argument is `archive_name/index`, and it can be used to get tensor directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56002
ghstack-source-id: 128498244
Test Plan:
### Verify the new model generated from this pr can reuse constant table and the numerical result is the same.
1. Build pytorch locally. `MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ USE_CUDA=0 DEBUG=1 MAX_JOBS=16 python setup.py develop`
2. Run `python save_lite.py`
```
import torch
# ~/Documents/pytorch/data/dog.jpg
model = torch.hub.load('pytorch/vision:v0.6.0', 'shufflenet_v2_x1_0', pretrained=True)
model.eval()
# sample execution (requires torchvision)
from PIL import Image
from torchvision import transforms
import pathlib
import tempfile
import torch.utils.mobile_optimizer
input_image = Image.open('~/Documents/pytorch/data/dog.jpg')
preprocess = transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model
# move the input and model to GPU for speed if available
if torch.cuda.is_available():
input_batch = input_batch.to('cuda')
model.to('cuda')
with torch.no_grad():
output = model(input_batch)
# Tensor of shape 1000, with confidence scores over Imagenet's 1000 classes
print(output[0])
# The output has unnormalized scores. To get probabilities, you can run a softmax on it.
print(torch.nn.functional.softmax(output[0], dim=0))
traced = torch.jit.trace(model, input_batch)
sum(p.numel() * p.element_size() for p in traced.parameters())
tf = pathlib.Path('~/Documents/pytorch/data/data/example_debug_map_with_tensorkey.ptl')
torch.jit.save(traced, tf.name)
print(pathlib.Path(tf.name).stat().st_size)
traced._save_for_lite_interpreter(tf.name)
print(pathlib.Path(tf.name).stat().st_size)
print(tf.name)
```
3. Run `python test_lite.py`
```
import torch
from torch.jit.mobile import _load_for_lite_interpreter
# sample execution (requires torchvision)
from PIL import Image
from torchvision import transforms
input_image = Image.open('~/Documents/pytorch/data/dog.jpg')
preprocess = transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model
reload_lite_model = _load_for_lite_interpreter('~/Documents/pytorch/experiment/example_debug_map_with_tensorkey.ptl')
with torch.no_grad():
output_lite = reload_lite_model(input_batch)
# Tensor of shape 1000, with confidence scores over Imagenet's 1000 classes
print(output_lite[0])
# The output has unnormalized scores. To get probabilities, you can run a softmax on it.
print(torch.nn.functional.softmax(output_lite[0], dim=0))
```
4. Compare the result with pytorch in master and pytorch built locally with this change, and see the same output.
5. The model size was 16.1 MB and becomes 12.9 MB with this change.
Size comparison in production models:
{F603127047}
Reviewed By: iseeyuan
Differential Revision: D27759891
fbshipit-source-id: 34e0cb8149011c46c1910165b545c137d7a0b855
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57879
_save_data() and _load_data() were designed as a protocol of data serialization for the trainer client. As confirmed with kwanmacher and dreiss, they are not used. In addition, there's no plan to use them in the Federated Learning flow. Remove them for now.
Test Plan: Imported from OSS
Reviewed By: kwanmacher
Differential Revision: D28306682
Pulled By: iseeyuan
fbshipit-source-id: 1b993ce4d78e372ae9b83bcbe496a196f9269d47
Summary:
Add an API to backport a model from version n to version i. It accepts an input model (file or buffer) and outputs a model (file or buffer) with the expected bytecode version.
In this change, the input model can come from a file or buffer, and the output model can be either a file path or a buffer.
When the backport fails, the function returns false with a warning message:
```
/Users/chenlai/pytorch/cmake-build-debug/bin/test_jit --gtest_filter=LiteInterpreterTest.BackPortByteCodeModelV4:LiteInterpreterTest/*.BackPortByteCodeModelV4:*/LiteInterpreterTest.BackPortByteCodeModelV4/*:*/LiteInterpreterTest/*.BackPortByteCodeModelV4 --gtest_color=no
Testing started at 2:32 PM ...
CUDA not available. Disabling CUDA and MultiCUDA tests
[W backport.cpp:419] Warning: Backport doesn't support backport to version3 (function _backport_for_mobile_impl)
Process finished with exit code 0
```
## Test
1. Run both `caffe2/test/cpp/jit/test_lite_interpreter.cpp` and `caffe2/test/mobile/test_bytecode.py`.
2. Run all prod models with backport api.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56802
ghstack-source-id: 128425510
Test Plan: CI
Reviewed By: raziel, iseeyuan
Differential Revision: D27844651
fbshipit-source-id: 8a803cf6c76433ee0a3049b1a5570585d569f8d6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57523
`_test_barrier_timeout` would run a barrier on rank 1 and sleep for
`timeout` on other ranks. In some cases if the other ranks would be faster,
they would enter the sleep call much earlier than rank 0 would enter barrier.
As a result, they would exit before the timeout is up and rank 0 would receive
a connection closed error instead of a timeout error. This would result in the
barrier call exiting before the timeout and the subsequent assertion failing.
Closes: https://github.com/pytorch/pytorch/issues/57176
ghstack-source-id: 128278775
Test Plan:
1) waitforbuildbot
2) Tested synthetically by forcing a rank to exit earlier.
Reviewed By: rohan-varma
Differential Revision: D28170821
fbshipit-source-id: a67456a1784dd0657f264c4f5498638e0aa00de2
Summary:
This makes detach both forward and backward non-differentiable by default.
You can pass the `only_backward_mode=True` argument to make it forward differentiable but backward non-differentiable.
The important side effect of this change is that, by default, detach is not tracking any view information.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57820
Reviewed By: ezyang
Differential Revision: D28287633
Pulled By: albanD
fbshipit-source-id: bdc4726fcd05889f6ac84e5a3a3ef71b2ec41015
Summary:
Previously `make quicklint` would lint all changed files for both mypy `ini`s, regardless of whether that file was actually supposed to be run under that configuration. This PR fixes that so we are using `tools/mypy_wrapper.py` to check if files should be included.
There's a similar change for `flake8` so that it now only outputs errors once and correctly excludes the paths in `.flake8`.
This also adds a bunch of tests to ensure that `make lint` and `make quicklint` both work and that `make quicklint` is excluding and including what it should.
Fixes #57644
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57752
Pulled By: driazati
Reviewed By: samestep
Differential Revision: D28259692
fbshipit-source-id: 233d355781230f11f98a6f61e2c07e9f5e737e24
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57181
Documentation for torch.linalg.svd says:
> The returned decomposition is a named tuple `(U, S, Vh)`
The documentation is correct while the implementation was wrong.
Renamed `V` -> `Vh`. `h` stands for hermitian.
This is a BC-breaking change but our linalg module is beta, therefore we can do it without a deprecation notice or aliases.
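For reference, the renamed factor is already transposed, so no extra `.T` is needed to reconstruct the input:

```python
import torch

A = torch.randn(3, 2)
U, S, Vh = torch.linalg.svd(A, full_matrices=False)

# Vh is the (conjugate-)transposed factor: A = U @ diag(S) @ Vh
recon = U @ torch.diag(S) @ Vh
```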
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D28142162
Pulled By: mruberry
fbshipit-source-id: 5e6e0ae5a63300f2db1575ca3259df381f8e1a7e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57180
We have now a separate function for computing only the singular values.
`compute_uv` argument is not needed and it was decided in the
offline discussion to remove it. This is a BC-breaking change but our
linalg module is beta, therefore we can do it without a deprecation
notice.
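With the argument removed, the singular-values-only path is a separate function (a sketch, assuming a torch build that exposes `torch.linalg.svdvals`):

```python
import torch

A = torch.randn(4, 3)

# singular values only -- no U/Vh are computed
s = torch.linalg.svdvals(A)

# equivalent values from the full decomposition, at higher cost
_, s_full, _ = torch.linalg.svd(A, full_matrices=False)
```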
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D28142163
Pulled By: mruberry
fbshipit-source-id: 3fac1fcae414307ad5748c9d5ff50e0aa4e1b853
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57822
* `AsyncSparseAllreduceWork` can avoid copying output tensors, since we keep all the results alive by means of modifying input vector directly
* `AsyncSparseAllreduceWork` now returns inputs back to user instead of former behavior where it returned copies of inputs. This is consistent with other operations and process group implementations
* `AsyncSparseAllreduceCUDAWork` now copies tensors directly from CPU into the input tensors, avoiding the extra copies `output` -> `outputs` -> `inputs`. Inputs are returned back to the user. This is consistent with other operations and process group implementations.
Overall, `AsyncSparseAllreduceCUDAWork` now avoids two extra copies (since it uses `AsyncSparseAllreduceWork`'s implementation).
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D28298325
Pulled By: agolynski
fbshipit-source-id: 18e2104413cdf5e73a01aad464e2613807779297
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50903
First part of #50010. Also fixes #51127.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D27911345
Pulled By: mruberry
fbshipit-source-id: 7138fddc935802918ab9ff19f4bc1b9f4d745d41
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56492
These are documented as internal-only and aren't called.
ghstack-source-id: 128354112
Test Plan: CI
Reviewed By: ilia-cher
Differential Revision: D27834789
fbshipit-source-id: 4a1aa320f952249db51945ff77563558fa884266
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56115
No reason to size it wrong and then resize it. Also, no reason to unconditionally go through the dispatcher.
ghstack-source-id: 128354110
Test Plan: Existing CI
Reviewed By: ngimel
Differential Revision: D27768757
fbshipit-source-id: 5dcb1fed5c5fa6707ee15359a26fde2a9a888b7f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57199
Reduces the size of compiled shaders, and also potentially adds some performance boost.
Test Plan: Imported from OSS
Reviewed By: xta0
Differential Revision: D28293816
Pulled By: SS-JIA
fbshipit-source-id: 424dc0bce24d6115ba2bf8405027e967f6cb9497
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57723
Updated the note section of `RendezvousHandler`:
- Removed the experimental API warning.
- Recommended using the C10d Store instead of etcd for most users.
Test Plan: N/A
Reviewed By: kiukchung
Differential Revision: D28253828
fbshipit-source-id: c4f34dffd1a3cc132977029fe449b6d63ddc877b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57823
Some models use the `NaiveSyncBatchNorm` instead of `BatchNorm2d`, but during inference they behave the same. This change is to ensure that `NaiveSyncBatchNorm` gets folded into convs during optimization passes, particularly `FoldConvBatchNorm`.
Test Plan: Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D28291709
Pulled By: SS-JIA
fbshipit-source-id: c494dc7698c3fa536146038808fedbb46c17a63b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57703
The .bzl files didn't have registerQuantizedCUDA listed for some reason, but upon adding it, the previously broken commands (on CUDA) now work.
note: these build files didn't affect OSS builds which was working throughout.
the test_qtensor test was potentially misleading since it would pass even if CUDA support wasn't working as long as the build system wasn't CUDA enabled. I broke this out into independent tests for each device so at least a skip would be produced rather than a pass for systems without CUDA enabled.
Test Plan:
buck test mode/dbg //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_qtensor_cpu (quantization.test_quantized_tensor.TestQuantizedTensor)'
buck test mode/dbg //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_qtensor_cuda (quantization.test_quantized_tensor.TestQuantizedTensor)'
Reviewed By: jerryzh168
Differential Revision: D28242797
fbshipit-source-id: 938ae86dcd605aedf26ac0bace9db77deaaf9c0f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57410
FP16 gradient compression may run into an 'inf' issue. Switching to division before allreduce can avoid this problem.
ghstack-source-id: 127877083
Test Plan:
before chage
f268909897
after change:
f270950609
If you still see 'grad_norm = inf' after enabling the FP16 hook, you can resume the training with the hook turned off.
Reviewed By: SciPioneer
Differential Revision: D28128628
fbshipit-source-id: 0b6648637713e4f321e39c9ccb645a6b6f1750a0
Summary:
Redo of https://github.com/pytorch/pytorch/issues/56373 out of stack.
---
To reviewers: **please be nitpicky**. I've read this so often that I probably missed some typos and inconsistencies.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57247
Reviewed By: albanD
Differential Revision: D28247402
Pulled By: mruberry
fbshipit-source-id: 71142678ee5c82cc8c0ecc1dad6a0b2b9236d3e6
Summary:
Here is why another move of this single line is needed:
- Regardless of whether test-run failed or succeeded it's good to
report number of tests executed
- `docker cp || echo` always succeeds so could safely be executed
before any other step in "Report test results"
- This command should not be part of the "Run tests" step, otherwise it would not get executed if any of the tests failed (if it must be part of the "Run tests" step, it should be prefixed with the [trap](https://tldp.org/LDP/Bash-Beginners-Guide/html/sect_12_02.html) command and defined before the `docker exec` step)
This fixes the "regression" introduced by https://github.com/pytorch/pytorch/pull/56725, although the real culprit here is lack of documentation.
Here is an example of PR where test results are not reported back due to
the failure: https://app.circleci.com/pipelines/github/pytorch/pytorch/317199/workflows/584a658b-c742-4cbb-8f81-6bb4718a0c04/jobs/13209736/steps
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57795
Reviewed By: samestep
Differential Revision: D28275510
Pulled By: malfet
fbshipit-source-id: 622f3bfca96a1ee9b8959590b28a26046eb37ea3
Summary:
```
class Foo(nn.Module):
def __init__(self):
super().__init__()
def forward(self, y, x):
for k in x:
for v in x[k]:
v += y
return x
example_dict = {'x': {'a': [fx.HOLE], 'z': [fx.HOLE, fx.HOLE]}}
new_f = fx.symbolic_trace(Foo(), concrete_args=example_dict)
print(new_f.code)
new_f(torch.randn(5), {'x': {'a': [torch.randn(5)], 'z': [torch.randn(5), torch.randn(5)]}})
fx.symbolic_trace(new_f, concrete_args=example_dict)
```
prints out
```
def forward(self, y, x):
y, tree_2, tree_3, tree_4 = pytree.tree_flatten([y, x])[0]
add = tree_2 + y
add_1 = tree_3 + y
add_2 = tree_4 + y; y = None
return {'a': [tree_2], 'z': [tree_3, tree_4]}
```
Currently, I store `in_spec` as an extra attribute on `fx.Graph`, and then include it when we do the codegen. I'm not sure if this is the right approach - it introduces a divergence between what's in `fx.Graph` and what's in the python code.
Perhaps the best API is something explicit like `fx.Graph.flatten_args`, but that does make calling things a bit ... more verbose.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55888
Reviewed By: jamesr66a
Differential Revision: D27884694
Pulled By: Chillee
fbshipit-source-id: f9e8a70c63a8df63c9f9bd0a6459255daa5a8df8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57636
The "preferred" pointer holder for Future is `intrusive_ptr` (e.g., `then` returns an `intrusive_ptr`, `toFuture` returns `intrusive_ptr`, ...). However in RPC we often wrap it with `shared_ptr`. This probably dates back to when we had a separate Future type, before the merge.
At the boundary between RPC and JIT this difference becomes a bit annoying, as conversions between the pointer types are needed. I think it would be simpler and more consistent to always use `intrusive_ptr`, also in RPC.
This PR was produced mainly by find-and-replace, plus a couple of manual fixes.
ghstack-source-id: 128296581
Test Plan: CI
Reviewed By: pritamdamania87
Differential Revision: D28187972
fbshipit-source-id: d4609273a1550b4921910e85d2198e02f31c905b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57635
Note: this PR looks massive, but it's just one simple change, codemodded many times.
In many cases, a callback needs to access the value/error produced by the parent future. In Python this was easy because the callback was invoked with the parent future as argument, and could thus inspect it. In C++ the callbacks didn't take any arguments, thus in many cases we worked around this by capturing the future in its own callback. This is risky (leads to reference cycle and thus memory leak) and must be done carefully (spoiler: sometimes we weren't).
ghstack-source-id: 128296580
Test Plan: CI
Reviewed By: wanchaol
Differential Revision: D28178783
fbshipit-source-id: 6de02c4568be42123372edc008f630d5ddae0081
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57634
`wrapPropagateTLSState` was restricting its argument to be an argument-less function, and I need to relax this for later work.
Also, it was requiring its argument to be converted to `std::function`, and also returned a `std::function`. Each creation of a `std::function` could cause a heap allocation. It's not particularly expensive, but here we can easily avoid it by having `wrapPropagateTLSState` directly operate on generic callables (thus, possibly, raw lambdas).
ghstack-source-id: 128295264
Test Plan: CI
Reviewed By: ilia-cher
Differential Revision: D28178782
fbshipit-source-id: d657f5751514974518606dd4fc4175e805dcb90a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56613
Replace linalg_solve_helper with `lu_stub` + `lu_solve_stub`.
Once `lu_stub` and `lu_solve_stub` have cuSOLVER-based codepath,
`torch.linalg.solve` will have it as well.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D28248766
Pulled By: mruberry
fbshipit-source-id: 3003666056533d097d0ad659e0603f59fbfda9aa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56612
The goal of this refactoring is to make the `torch.linalg.solve`
to be a composition of calls to `lu_stub` and `lu_solve_stub`.
Once `lu_stub` and `lu_solve_stub` have cuSOLVER-based codepath,
`torch.linalg.solve` will have it as well.
Replaced `lu_with_info_{cpu, cuda}` with one function that calls
to `lu_stub`.
Split MAGMA-based `apply_lu` into `apply_lu_looped_magma`
and `apply_lu_batched_magma`. This simplifies the future switch to
cuSOLVER and cuBLAS libraries.
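The composition can be sketched in pure Python (a minimal Doolittle factorization with partial pivoting standing in for the LAPACK/MAGMA-backed stubs; illustrative only):

```python
def lu_factor(A):
    """LU with partial pivoting. Returns (LU, pivots); LU packs L (unit
    diagonal, below) and U (on/above the diagonal), as getrf-style stubs do."""
    n = len(A)
    LU = [row[:] for row in A]
    piv = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        LU[k], LU[p] = LU[p], LU[k]
        piv[k], piv[p] = piv[p], piv[k]
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU, piv

def lu_solve(LU, piv, b):
    """Solve Ax = b from the packed factorization (forward + back substitution)."""
    n = len(b)
    x = [b[p] for p in piv]            # apply the row permutation to b
    for i in range(n):                 # Ly = Pb (L has unit diagonal)
        x[i] -= sum(LU[i][j] * x[j] for j in range(i))
    for i in reversed(range(n)):       # Ux = y
        x[i] = (x[i] - sum(LU[i][j] * x[j] for j in range(i + 1, n))) / LU[i][i]
    return x

def solve(A, b):
    # torch.linalg.solve as the two-step composition: lu_stub + lu_solve_stub
    return lu_solve(*lu_factor(A), b)
```

For example, `solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])` returns approximately `[0.8, 1.4]`.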
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D28248756
Pulled By: mruberry
fbshipit-source-id: 40e02b5be4ff5f78885bcc95685aba581043e096
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56820
The test only fails for inverse n-dim functions with `norm="forward"`. The relative error isn't actually any bigger than for other norm modes, though. It's just that the magnitude of the result is bigger, so the absolute tolerance contributes less relative to each element. So, I just increase the relative tolerance to compensate.
This `precisionOverride` is already applied to `fftn` and `rfftn` for exactly the same reason.
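The effect can be sketched with the standard closeness rule (the same acceptance formula `torch.allclose` uses; the tolerances below are illustrative):

```python
def is_close(actual, expected, rtol, atol):
    # The usual acceptance rule: |a - e| <= atol + rtol * |e|
    return abs(actual - expected) <= atol + rtol * abs(expected)

rtol, atol = 1e-7, 1e-5
rel_err = 5e-7  # identical relative error in both cases below

# Small-magnitude result: atol dominates, the check passes.
assert is_close(1.0 * (1 + rel_err), 1.0, rtol, atol)

# Large-magnitude result (e.g. norm="forward" inverse transforms): atol is
# negligible relative to each element, so the same relative error now fails...
assert not is_close(1e3 * (1 + rel_err), 1e3, rtol, atol)

# ...unless rtol is raised, which is what the precisionOverride does.
assert is_close(1e3 * (1 + rel_err), 1e3, rtol=1e-6, atol=atol)
```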
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57576
Reviewed By: albanD
Differential Revision: D28249222
Pulled By: mruberry
fbshipit-source-id: 734c7c1ae8236b253d6e3cd2218c05d21901c567
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57748
To be used by PyTorchPredictor integration for deploy.
Original commit changeset: 4d41efc733b2
Test Plan: tested via new unit tests
Reviewed By: suo
Differential Revision: D28258525
fbshipit-source-id: 8b9436e47501d7c1c16e79909e668100f825711e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57664
Manually permuting weights is slower than using aten::contiguous. This will improve the model loading time at runtime especially on low-end devices. Some numbers from the Unet model. Average 6x faster.
- iPhone 12
- before - 26.252 ms
- after - 4.727 ms
- iPhone 11
- before - 29.638 ms
- after - 5.012 ms
- iPhone X
- before - 33.257 ms
- after - 5.481 ms
- iPhone 8
- before - 33.335 ms
- after - 5.83 ms
- iPhone 7
- before - 36.144 ms
- after - 6.232 ms
- iPhone 6s
- before - 47.977 ms
- after - 6.998 ms
ghstack-source-id: 128338534
Test Plan: - CI
Reviewed By: kimishpatel
Differential Revision: D28087911
fbshipit-source-id: ad0029436e59a0ecc02ce660ed1110dc0b82848c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57705
This will enable module level debug info for benchmarking binary.
Test Plan: Run on AIBench
Reviewed By: larryliu0820
Differential Revision: D28230948
fbshipit-source-id: 5d06c6853d049ff678995a2ed4a86f4e6c85bdc7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57764
As discussed offline this PR renames etcd-experimental backend to etcd-v2 and c10d-experimental backend to c10d.
ghstack-source-id: 128342523
Test Plan: Run the existing unit tests.
Reviewed By: kiukchung
Differential Revision: D28263739
fbshipit-source-id: c3409037ecea5a8ff6daadeeb1f2fb4205cc3852
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).
New submodule commit: 530356e16f
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57485
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: jiecaoyu
Differential Revision: D28158310
fbshipit-source-id: 2ea77956a6e1709569a587c671c0c08018b8a966
Summary:
As per discussion here https://github.com/pytorch/pytorch/pull/57127#discussion_r624948215
Note that we cannot remove the optional type from the `dim` parameter because the default is to flatten the input tensor which cannot be easily captured by a value other than `None`
### BC Breaking Note
This PR changes the `ord` parameter of `torch.linalg.vector_norm` so that it no longer accepts `None` arguments. The default behavior of `2` is equivalent to the previous default of `None`.
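A minimal pure-Python sketch of the new signature (illustrative only; the real implementation dispatches to optimized kernels):

```python
import math

def vector_norm(x, ord=2):
    """Sketch of the new default: `ord=2`, with `None` no longer accepted."""
    if ord is None:
        raise TypeError("ord=None is no longer accepted; the default is 2")
    if ord == float("inf"):
        return max(abs(v) for v in x)
    return sum(abs(v) ** ord for v in x) ** (1.0 / ord)

x = [3.0, 4.0]
assert vector_norm(x) == 5.0              # new default, ord=2 ...
assert vector_norm(x) == math.hypot(*x)   # ... matches the previous ord=None behavior
```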
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57662
Reviewed By: albanD, mruberry
Differential Revision: D28228870
Pulled By: heitorschueroff
fbshipit-source-id: 040fd8055bbe013f64d3c8409bbb4b2c87c99d13
Summary:
Why:
To keep the VS version always updated in the README
1. Update the VS version link in CI. It's more convenient for my PR robot to update the version in the README once the VS in CI is updated, and the permalink isn't stable.
2. Move `building on legacy code` to the development tips. The table is big and makes the README look outdated at first sight.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56686
Reviewed By: janeyx99
Differential Revision: D28272060
Pulled By: samestep
fbshipit-source-id: 4bb879ea2914cc8bcd68343a9ed230418e1f9268
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56546
A code move for CodeImpl and Frame to a subdirectory runtime/interpreter, so
that it's easier to reuse them and navigate the interpreter code.
Test Plan: Imported from OSS
Reviewed By: nikithamalgifb
Differential Revision: D28133580
fbshipit-source-id: 8de89a4e8e637836625e1ac1db95f0a3353da670
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57609
Throw c10::CudaError for CUDA Exceptions for better classification of errors
Test Plan: Test locally by running some workflows
Reviewed By: dzhulgakov
Differential Revision: D28209356
fbshipit-source-id: 19a5fc8548433238dc224ea81a5f63a945fc5cc3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57651
We've gone back and forth on whether to emulate the `sys.modules` lookup
behavior in our own `whichmodule`; the provided test is a concrete case
for doing so.
An additional minor cleanup is to make the type of `self.modules` in
importers `Dict[str, ModuleType]`. Modules could only be None in the
dictionary in older versions of the import system.
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D28226536
Pulled By: suo
fbshipit-source-id: c2e6da91651ddaa4fbf7171555df9e5cbe1060fd
Summary:
Add an ability to use new profiler API even if Kineto is not compiled
in, by falling back to the legacy profiler.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57612
Test Plan:
compiled
USE_KINETO=0 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python
setup.py develop install --cmake
and with USE_KINETO=1
and ran
python test/test_profiler.py -v
Reviewed By: gdankel
Differential Revision: D28217680
Pulled By: ilia-cher
fbshipit-source-id: ec81fb527eb69bb0a3e0bd6aad13592200d7fe70
Summary: Removed the deadline restriction since the first run can take more than the deadline, while subsequent runs are shorter.
Reviewed By: ngimel
Differential Revision: D28260077
fbshipit-source-id: 8ed2f5c16bc184bf4fae0a59b662fa1da2d4dd0a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57034
Resolves gh-38915
For the example given in the issue, BatchNorm1d on cuDNN is around 12x slower
than BatchNorm2d. Internally, cuDNN expects at least a 4d tensor (N, C, H, W)
so these two modules actually call the same cuDNN code. My assumption is that
cuDNN just isn't optimized for H=W=1.
Instead, this disables cudnn for 2d batch_norm inputs and improves the CUDA
implementation of `native_batch_norm` to be competitive with cuDNN. For the
example in the issue, `BatchNorm1d` now takes 335 us compared to 6.3 ms before,
an 18x speedup.
Before this change, nvprof shows:
```
Type Time(%) Time Calls Avg Min Max Name
GPU activities: 99.64% 630.95ms 100 6.3095ms 5.6427ms 8.8800ms void cudnn::bn_fw_tr_1C11_kernel_NCHW<float, float, int=512, bool=0, int=2>(cudnnTensorStruct, float const *, cudnn::bn_fw_tr_1C11_kernel_NCHW<float, float, int=512, bool=0, int=2>, cudnnTensorStruct*, float const *, float const , cudnnTensorStruct*, cudnnTensorStruct*, cudnnTensorStruct**, float const *, float const *, float const *, cudnnTensorStruct*, cudnnTensorStruct*)
```
But after, it shows:
```
Type Time(%) Time Calls Avg Min Max Name
GPU activities: 54.76% 14.352ms 100 143.52us 123.52us 756.28us _ZN2at6native27unrolled_elementwise_kernelIZZZNS0_72_GLOBAL__N__48_tmpxft_001e82d0_00000000_7_Normalization_cpp1_ii_db66e07022batch_norm_elementwiseERKNS_6TensorES5_RKN3c108optionalIS3_EESA_S5_S5_ENKUlvE_clEvENKUlvE2_clEvEUlfffffE_NS_6detail5ArrayIPcLi6EEE16OffsetCalculatorILi5EjESI_ILi1EjENS0_6memory15LoadWithoutCastENSL_16StoreWithoutCastEEEviT_T0_T1_T2_T3_T4_
35.09% 9.1951ms 100 91.950us 84.415us 362.17us void at::native::reduce_kernel<int=256, int=2, at::native::ReduceOp<float, at::native::WelfordOps<float, float, int, float, thrust::pair<float, float>>, unsigned int, float, int=2>>(float)
0.71% 186.14us 100 1.8610us 1.8240us 1.9840us _ZN2at6native72_GLOBAL__N__48_tmpxft_001e82d0_00000000_7_Normalization_cpp1_ii_db66e07045unrolled_elementwise_kernel_for_multi_outputsILi3EZZZNS1_34batch_norm_update_stats_and_invertERKNS_6TensorES5_S5_S5_ddlENKUlvE_clEvENKUlvE2_clEvEUlffffE_NS_6detail5ArrayIPcLi7EEE23TrivialOffsetCalculatorILi4EjESD_ILi3EjEEEviT0_T1_T2_T3_
0.59% 153.37us 100 1.5330us 1.4720us 2.6240us
void at::native::vectorized_elementwise_kernel<int=4,
at::native::BUnaryFunctor<at::native::AddFunctor<long>>,
at::detail::Array<char*, int=2>>(int, long,
at::native::AddFunctor<long>)
```
I think there is similar scope to improve the backward implementation.
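The math the fused kernels compute can be sketched per channel (the `reduce_kernel` in the trace uses exactly this single-pass Welford reduction; the code below is an illustrative model, not the kernel):

```python
import math

def batch_norm_channel(x, weight=1.0, bias=0.0, eps=1e-5):
    """Training-mode normalization for one channel: a single-pass Welford
    mean/variance reduction followed by the elementwise normalize-and-affine step."""
    mean = m2 = 0.0
    for count, v in enumerate(x, 1):
        delta = v - mean
        mean += delta / count
        m2 += delta * (v - mean)
    invstd = 1.0 / math.sqrt(m2 / len(x) + eps)  # biased variance, as in training
    return [(v - mean) * invstd * weight + bias for v in x]

out = batch_norm_channel([1.0, 2.0, 3.0])
assert abs(sum(out)) < 1e-6  # normalized values have (near-)zero mean
```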
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision: D28142447
Pulled By: ngimel
fbshipit-source-id: c70109780e206fa85e50a31e90a1cb4c533199da
Summary:
Adds documentation to TensorIterator and TensorIteratorConfig noting that outputs need to be added before inputs.
Fixes https://github.com/pytorch/pytorch/issues/57343
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57550
Reviewed By: VitalyFedyunin
Differential Revision: D28198135
Pulled By: mrshenli
fbshipit-source-id: 363603cac968bf786a4a6a64e353307c54d541b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56611
The goal of this refactoring is to make the `torch.linalg.solve`
to be a composition of calls to `lu_stub` and `lu_solve_stub`.
Once `lu_stub` and `lu_solve_stub` have cuSOLVER-based codepath,
`torch.linalg.solve` will have it as well.
Replaced lu_solve_helper with DECLARE_DISPATCH for lu_solve_stub.
Removed an unnecessary copy, improving performance (see https://github.com/pytorch/pytorch/pull/56611#issuecomment-824303673).
Split MAGMA-based `apply_lu_solve` into `apply_lu_solve_looped_magma`
and `apply_lu_solve_batched_magma`. This simplifies future dispatch to
cuSOLVER and cuBLAS.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D28142279
Pulled By: mruberry
fbshipit-source-id: 9d4baf650ca7a40b800616794408b34342d8d68f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57677
This PR adds a note about the existence of the improved vmap prototype
to raise awareness of its existence. Eventually the plan is to delete
the in-core vmap prototype and replace it with the improved vmap
prototype but that might take a while.
Test Plan: - view docs
Reviewed By: Chillee
Differential Revision: D28231346
Pulled By: zou3519
fbshipit-source-id: 0a3b274df87ffd50333330e413e1a89634865403
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57316
CUDA support is implemented using cuSOLVER.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D28242071
Pulled By: mruberry
fbshipit-source-id: 6f0a1c50c21c376d2ee2907bddb618c6a600db1f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57315
This PR ports `torch.ormqr` from TH to ATen.
CUDA path will be implemented in a follow-up PR.
With ATen port, support for complex and batched inputs is added.
The tests are rewritten and OpInfo entry is added.
We can implement the least squares solver with geqrf + ormqr +
triangular_solve, so it's useful to have this function modernized, at least
for internal code.
Resolves https://github.com/pytorch/pytorch/issues/24748
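The geqrf + ormqr + triangular_solve composition can be sketched in pure Python (classical Gram-Schmidt standing in for the Householder-based LAPACK routines; illustrative only):

```python
def qr(A):
    """Gram-Schmidt QR for a tall full-rank matrix, handled as column lists.
    (geqrf/ormqr use Householder reflections instead.)"""
    cols = list(zip(*A))
    Q, R = [], [[0.0] * len(cols) for _ in cols]
    for j, a in enumerate(cols):
        v = list(a)
        for i, q in enumerate(Q):
            R[i][j] = sum(qi * ai for qi, ai in zip(q, a))
            v = [vi - R[i][j] * qi for vi, qi in zip(v, q)]
        R[j][j] = sum(vi * vi for vi in v) ** 0.5
        Q.append([vi / R[j][j] for vi in v])
    return Q, R

def lstsq(A, b):
    Q, R = qr(A)                                             # geqrf
    qtb = [sum(qi * bi for qi, bi in zip(q, b)) for q in Q]  # ormqr: Q^T b
    n = len(qtb)
    x = [0.0] * n
    for i in reversed(range(n)):                             # triangular_solve: R x = Q^T b
        x[i] = (qtb[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))) / R[i][i]
    return x
```

For example, fitting a line through the points (0, 0), (1, 1), (2, 2) with `lstsq([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]], [0.0, 1.0, 2.0])` recovers intercept 0 and slope 1.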
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D28242070
Pulled By: mruberry
fbshipit-source-id: f070bb6ac2f5a3269b163b22f7354e9089ed3061
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56852
This is part of the changes to enable NNC AOT compilation for mobile.
It introduced a custom backend for NNC, which uses the components defined in the stacked PRs to load and execute a NNC-compiled model.
ghstack-source-id: 128285801
Test Plan:
- On X86 host:
```
buck build //xplat/caffe2/fb/lite_predictor:lite_predictor_nnc
buck-out/last/lite_predictor_nnc --model xplat/pytorch_models/build/pytorch_dev_linear/v1/nnc/compiled.pt --print_output true --input_dims '4,4' --input_type float
```
- On Android:
```
buck build fbsource//fbandroid/mode/gnustl //xplat/caffe2/fb/lite_predictor:lite_predictor_nncAndroid#android-armv7
adb push buck-out/last/lite_predictor_nncAndroid#android-armv7 /data/local/tmp
adb push xplat/pytorch_models/build/pytorch_dev_linear/v1/nnc/compiled.pt /data/local/tmp
adb shell 'cd /data/local/tmp; ./lite_predictor_nncAndroid\#android-armv7 --model compiled.pt --print_output true --input_dims "4,4" --input_type float'
```
Reviewed By: kimishpatel, raziel
Differential Revision: D27897153
fbshipit-source-id: 8e039089d1602782582747adfd75b31496b525ca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56851
This is part of the changes to enable NNC AOT compilation for mobile.
At the end of the ahead-of-time compilation the compiler produces two sets of artifacts:
1. "compiled assembly code" - kernel functions in assembly format optimized for target platforms;
2. "compiled model" - regular TorchScript model that contains serialized parameters (weights/bias/etc) and invokes kernel functions via "handles" (name/version id/input & output specs/etc of the kernel functions).
This PR introduces a set of classes to represent kernel functions (a.k.a "handles"), which can be serialized/deserialized into/from the "compiled model" as an IValue.
Also introduces APIs to register/look-up "compiled assembly code".
ghstack-source-id: 128285802
Test Plan:
- unit tests
- for FB build environment:
buck test //caffe2/test/mobile/nnc:mobile_nnc
Reviewed By: kimishpatel, raziel
Differential Revision: D27921866
fbshipit-source-id: 4c2a4d8a4d072fc259416ae674b3b494f0ca56f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57443
Based on the comments in https://github.com/pytorch/pytorch/pull/57355, I started looking at the callsites of `getOrCreateOwnerRRef` and `createOwnerRRef`, and noticed that many of them didn't specify the `devices` argument, which was optional and thus defaulted to `{}`, which created a CPU-only Future inside the OwnerRRef. (Such callsites were, for example, in `processPythonRemoteCall` and `processBaseScriptRemoteCall`, or `PyRRef::unpickle`, ...).
Some (or all?) of these callsites might still have worked thanks to the RRef's own handling of CUDA streams and events, however we intend to remove that in https://github.com/pytorch/pytorch/pull/57355. I think it would be a safer and more generic solution to always create OwnerRRefs with the full set of devices supported by the RPC agent, and this is in fact easy to do since the RRefContext has access to the RPC agent. This means that all OwnerRRefs, no matter how they're created, will support CUDA if the agent does. This also allows us to stop requiring to specify devices when creating a OwnerRRef by hand in Python.
ghstack-source-id: 128184665
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28144365
fbshipit-source-id: 1f2d446873f31ee297415c46b94126b6502b12d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57442
We did this for the RPC agents and for ivalue::Future, the last one (I think) is RRef.
ghstack-source-id: 128184664
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28144368
fbshipit-source-id: eeacab6006f72118cbec542a02322f2e391c67a3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56918
Re-importing a Python module each time is a bit expensive, and it's unnecessary because this is a private module which won't change and thus we can cache the value once we first extract it.
ghstack-source-id: 128184666
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D27985910
fbshipit-source-id: be40ae9b67ab8ea6c07bc2cb9a78d2c2c30b35d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57432
In a bunch of places we were creating a future and then "forwarding" the value of another future to it once that other future completed. (This was in order to convert the type of the value, or to "merge" multiple futures into one). However when doing so we often created a child future with an empty set of devices, which meant it didn't support CUDA, and thus would cause a silent synchronization/correctness bug if the parent future did actually contain CUDA tensors.
One way this could have been caught earlier would have been to have Future always extract the DataPtrs, even in CPU-only mode, in order to ensure they always reside on the expected set of devices. Unfortunately this might have some adverse perf effects thus should be done carefully.
ghstack-source-id: 128184667
Test Plan: eyes
Reviewed By: mrshenli
Differential Revision: D28143045
fbshipit-source-id: 9af1abf270366dc1df0d4857d6a8cc73668af9d1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57433
In a bunch of cases we need to "forward" between one future and another, typically because we need to convert the type of the data (e.g., from Message to PyObject). In most of these cases the DataPtrs of the value don't change, and yet the new future must re-extract them from scratch. By allowing the user to obtain the vector of extracted DataPtrs from the old future, we can allow them to "shortcut" this step.
Also, this change is a requirement for the next PR to work, since the next PR would otherwise cause us to attempt extracting DataPtrs from Message instances, which doesn't work (because Message is a custom class), but thanks to this PR we actually skip that.
ghstack-source-id: 128184663
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28118298
fbshipit-source-id: 70e333ea6a4f8d4d9a86514c350028d412469ee1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57689
* Older versions of libgnustd have issues with thread_local C++ qualifier on Android devices prior to r17+. Use c10::tls<> wrapper with smart pointer semantics in such cases.
* Convenient macro `C10_DEFINE_TLS_static` was added as well:
```
// Define static TLS variable str_tls_ of type std::string
C10_DEFINE_TLS_static(std::string, str_tls_);
//////// Exercise it ////////
{
*str_tls_ = "abc";
assert(str_tls_->length() == 3);
}
```
ghstack-source-id: 128233742
Test Plan: CI +
Reviewed By: ilia-cher
Differential Revision: D27875779
fbshipit-source-id: 7764f96ac1e121051c6ea66eabcedb9ef54d290e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57578
The original impl in SR assumes that eps is a constant, which is true most of the times. However it could be a graph input as well. This diff fixes this issue. Unit tests are added as well.
Reviewed By: edvgha
Differential Revision: D28207975
fbshipit-source-id: 9a10dec159f3804e43ef74aaa20c3ec6c79548c9
Summary:
Currently we require type equality for `torch.testing.assert_(equal|close)`:
3db45bcb91/torch/testing/_asserts.py (L509-L513)
That means `assert_equal(1, 1.0)` will correctly fail. Although the type of a scalar is similar to a dtype of a tensor, `assert_equal(1, 1.0, check_dtype=False)` will also fail while `assert_equal(torch.as_tensor(1), torch.as_tensor(1.0), check_dtype=False)` will pass.
To make the interface more consistent, this PR relaxes the type equality constraint, by disabling it in case both inputs are scalars.
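The relaxed rule can be sketched as follows (hypothetical helper name; the real check also honors `check_dtype` for tensors):

```python
from numbers import Number

def types_compatible(actual, expected):
    """Sketch of the relaxed rule: exact type equality is still required,
    except when both inputs are scalars."""
    if isinstance(actual, Number) and isinstance(expected, Number):
        return True  # scalar type mismatch (e.g. int vs float) is allowed
    return type(actual) is type(expected)

assert types_compatible(1, 1.0)      # scalars: int vs float now allowed
assert not types_compatible(1, "1")  # non-scalar type mismatch still rejected
```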
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57532
Reviewed By: ngimel
Differential Revision: D28242428
Pulled By: mruberry
fbshipit-source-id: b643c77f48b64fc2c8a43925120d2b634ec336b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57688
P412982836 says that `torch::jit::toIValue()` will also touch GIL through `torch::jit::createGenericDict()` (P412848640)
So we have to move `torch::jit::toIValue()` out of multithreading execution
Reviewed By: hyuen
Differential Revision: D28236527
fbshipit-source-id: 43a33dbcfc828cc42c5e1230c8f5cb415bf7bde4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57526
This test would create an RRef, delete that rref and then create two
more RRefs and validate total rrefs were 2 in the end.
Due to the async nature of delete, sometimes the RRef would not be deleted
by the time the assertion was made. As a result, I've fixed this by waiting for the
RRef to be deleted at the appropriate time.
#Closes: https://github.com/pytorch/pytorch/issues/55382
ghstack-source-id: 128037566
Test Plan: waitforbuildbot
Reviewed By: H-Huang
Differential Revision: D28173151
fbshipit-source-id: e4f34ff4e49b72cfc9e67a72c482f5e05159eda5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57484
To be used by PyTorchPredictor integration for deploy.
Test Plan: tested via new unit tests
Reviewed By: suo
Differential Revision: D28154522
fbshipit-source-id: 5ba57a8d7f01686180e6fd47663635ec3ab2120d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57340
This API was only used within our own implementation. I couldn't find
any uses anywhere else. Removing it to reduce our overall surface area,
and also because the semantics are unclear in a world where
serialization is deferred to close() time.
Differential Revision: D28114188
Test Plan: Imported from OSS
Reviewed By: anjali411
Pulled By: suo
fbshipit-source-id: 6da53f20518885c7f4359e00e174f5e911906389
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57339
After the `intern` changes, we will no longer eager write to the package
archive so `file_structure` as written doesn't make much sense.
Differential Revision: D28114187
Test Plan: Imported from OSS
Reviewed By: anjali411
Pulled By: suo
fbshipit-source-id: 875595db933e9d1b2fdde907b086889cc977e92f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57337
Add a really simple graph data structure for tracking dependencies. The API is
based on networkx, but I didn't want to require the dependency.
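A minimal sketch of such a networkx-flavored structure (hypothetical names; not the actual class added by this PR):

```python
class DiGraph:
    """Minimal networkx-style directed graph for dependency tracking."""

    def __init__(self):
        self._succ = {}  # node -> set of successor nodes

    def add_node(self, n):
        self._succ.setdefault(n, set())

    def add_edge(self, u, v):
        self.add_node(u)
        self.add_node(v)
        self._succ[u].add(v)

    def successors(self, n):
        return iter(self._succ[n])

    @property
    def nodes(self):
        return list(self._succ)

g = DiGraph()
g.add_edge("pkg.a", "pkg.b")  # "pkg.a depends on pkg.b"
assert "pkg.b" in set(g.successors("pkg.a"))
```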
Differential Revision: D28114186
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Pulled By: suo
fbshipit-source-id: 802fd067017e493a48d6672538080e61d249accd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57336
Avoid a small n^2
Differential Revision: D28114189
Test Plan: Imported from OSS
Reviewed By: astaff
Pulled By: suo
fbshipit-source-id: 2672669ad0e23169d70c92f9d5ed61f66081f248
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57335
Mostly refactoring. The only behavioral change is that I have eliminated
the `orig_source_file` argument to `save_source_string`. I think it
doesn't provide enough marginal value (since if you have the module name
you can get the source file anyway).
Differential Revision: D28114184
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Pulled By: suo
fbshipit-source-id: b5e9eb4250dc84552befeef2dcf9e591b32899ae
Summary:
Fixes a bug introduced by https://github.com/pytorch/pytorch/issues/57057
cc ailzhang while writing the tests, I realized that for these functions, we don't properly set the CreationMeta in no grad mode and Inference mode. Added a todo there.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57669
Reviewed By: soulitzer
Differential Revision: D28231005
Pulled By: albanD
fbshipit-source-id: 08a68d23ded87027476914bc87f3a0537f01fc33
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57645
Second round of autoformatting changes since the first pass became too large.
ghstack-source-id: 128199695
Test Plan: CI
Reviewed By: zertosh
Differential Revision: D28131430
fbshipit-source-id: 24b03e38b087f31e8cac2404bebcd401c55b6cab
Summary:
This is a second attempt at https://github.com/pytorch/pytorch/issues/51214. It should achieve the same goals with (as far as I can tell) no disadvantages, but the advantages are a bit less pronounced than in the more dictatorial approach that https://github.com/pytorch/pytorch/issues/51214 took:
- Unfortunately, I was unable to figure out how to include [the `mypy` configuration given in the docstring of `tools.mypy_wrapper.main`](7115a4b870/tools/mypy_wrapper.py (L81-L89)), because as walterddr pointed out, `"${env:HOME}/miniconda3/envs/pytorch/bin/python"` is not guaranteed to be correct on everyone's machine:
```json
{
"python.linting.enabled": true,
"python.linting.mypyEnabled": true,
"python.linting.mypyPath": "${env:HOME}/miniconda3/envs/pytorch/bin/python",
"python.linting.mypyArgs": [
"${workspaceFolder}/tools/mypy_wrapper.py"
]
}
```
Importantly, this does not work:
```json
"python.linting.mypyPath": "${workspaceFolder}/tools/mypy_wrapper.py"
```
This is because VS Code does not run the given `mypy` command inside of the user's specified virtual environment, so for instance, on my system, setting the `mypy` command to directly call `tools/mypy_wrapper.py` results in using `mypy 0.782` instead of the correct `mypy 0.812`.
Sadly, [this](https://code.visualstudio.com/docs/editor/variables-reference#_configuration-variables) does not work either, although I'm not sure why:
```json
{
"python.linting.mypyPath": "${config:python.pythonPath}",
"python.linting.mypyArgs": [
"${workspaceFolder}/tools/mypy_wrapper.py"
]
}
```
- As a result, `git clean -fdx; tools/vscode_settings.py` still results in some loss of useful configuration.
One other thing to note: as `.vscode/settings_recommended.json` shows, there are some configuration sections that only take effect within the context of a `"[language]"`, so currently, if a dev already has one of those settings, it would be entirely overwritten by `tools/vscode_settings.py` rather than gracefully merged. This could probably be fixed by using a deep merge instead of the current shallow merge strategy.
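Such a deep merge could be sketched as (hypothetical helper; not part of this PR):

```python
def deep_merge(base, overrides):
    """Recursively merge `overrides` into a copy of `base`: nested dicts (such
    as per-"[language]" sections) are merged key by key instead of replaced."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

user = {"[python]": {"editor.tabSize": 2}, "editor.rulers": [88]}
recommended = {"[python]": {"editor.formatOnSave": True}}
settings = deep_merge(user, recommended)
assert settings["[python]"] == {"editor.tabSize": 2, "editor.formatOnSave": True}
assert settings["editor.rulers"] == [88]  # unrelated user keys survive the merge
```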
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57671
Test Plan:
If you want, you can typecheck the small script added by this PR (no output is expected):
```sh
tools/mypy_wrapper.py $PWD/tools/vscode_settings.py
```
You can also try running it to update your own VS Code workspace settings:
```sh
tools/vscode_settings.py
```
This should have minimal impact on your existing `tools/settings.json` file other than enabling the few explicitly recommended settings (e.g. it should not reorder or remove any of your existing settings).
Reviewed By: malfet
Differential Revision: D28230390
Pulled By: samestep
fbshipit-source-id: 53a7907229e5807c77531cae4f9ab9d469fd7684
Summary:
Expanding support to all builds
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56323
Test Plan: CI
Reviewed By: malfet
Differential Revision: D28171478
Pulled By: ilia-cher
fbshipit-source-id: 16bc752d1be3cbaeda5316f5d8a687ae05a83d22
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57517
Fixes the flaky tests https://github.com/pytorch/pytorch/issues/45145
and https://github.com/pytorch/pytorch/issues/45067.
The root cause is that it is not the case that all remote events will be
children of the record function remote event, as other events can sometimes be
profiled under the hood such as the issue described in
https://github.com/pytorch/pytorch/issues/43868.
We fix this issue by verifying that the set of events that are children on the
remote end and children on the local end are the same, without necessarily
enforcing specific events to be logged.
Tested by running the test 1000+ times and verifying it passed. Will also test on CI box before landing
ghstack-source-id: 128200041
Test Plan: CI
Reviewed By: pritamdamania87
Differential Revision: D28166602
fbshipit-source-id: 8145857da4642aef31f360b20db00f4328abe2ca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57563
Add flexible size support for upsample_nearest2d op in nnapi model conversion
Test Plan:
pytest test/test_nnapi.py
Imported from OSS
Reviewed By: dreiss
Differential Revision: D28200847
fbshipit-source-id: 901fe3f6e68e4c16ece730f3ffa68dc88c6ed6c3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57562
Add flexible size support for qadd op in nnapi model conversion
Test Plan:
pytest test/test_nnapi.py
Imported from OSS
Reviewed By: dreiss
Differential Revision: D28200849
fbshipit-source-id: d5b2ea8e9eb8ae405ff2c960f7549cef60bc0991
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57561
Add flexible size support for conv2d op in nnapi model conversion
Test Plan:
pytest test/test_nnapi.py
Imported from OSS
Reviewed By: dreiss
Differential Revision: D28200848
fbshipit-source-id: d94ccf48a3d8453aa8e96c7cac02948c4cd870cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57189
`torch.linalg.eigvalsh` now supports autograd. This is achieved by
computing the eigenvectors internally if input requires grad,
otherwise the eigenvectors are not computed and the operation is faster.
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D28199708
Pulled By: albanD
fbshipit-source-id: 12ac56f50137398613e186abd49f82c8ab83532e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57188
`torch.linalg.svdvals` now supports autograd. This is achieved by
computing the singular vectors internally if input requires grad,
otherwise the vectors are not computed and the operation is faster.
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D28199709
Pulled By: albanD
fbshipit-source-id: cf39cf40965c606927db5331ce16743178fa711f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57591
According to dhruvbird we should be able to read a file from a PyTorch model (which is a zip file) using miniz. This diff adds a standalone loader so users can load a JSON (or other type of) file from the extra folder of the model. The whole point is to avoid loading the PyTorch library first, which can be complex (voltron, dynamic loading, etc.).
With this, the hand tracking inference config (D27937516) no longer depends on PyTorch or dynamic_pytorch. Previously it used torch::jit::_load_extra_only_for_mobile, which requires PyTorch to be loaded first; we want to avoid that.
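Since a PyTorch model file is an ordinary zip archive, the idea can be illustrated in Python with the standard `zipfile` module (the loader described above is C++ on top of miniz; the archive layout and helper name here are assumptions for illustration):

```python
import io
import json
import zipfile

def read_extra_file(model_bytes: bytes, name: str) -> bytes:
    """Read a single file from the model archive's extra/ folder
    without loading any PyTorch code."""
    with zipfile.ZipFile(io.BytesIO(model_bytes)) as zf:
        # Archives are rooted at "<model_name>/", so search by suffix.
        for entry in zf.namelist():
            if entry.endswith(f"extra/{name}"):
                return zf.read(entry)
    raise FileNotFoundError(f"extra/{name} not found in model archive")
```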
Test Plan: buck test caffe2/fb/dynamic_pytorch:extract_file_test
Reviewed By: dhruvbird
Differential Revision: D28140492
fbshipit-source-id: 2fd1570523841f4c35dc2ad8dfde5f1d396a74fa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56427
This PR enables support for nccl send/recv profiling similar to how we have it for MPI and Gloo.
The process to do so is similar to the NCCL collectives where we create the `recordingFunction` in `initWork` and then add a callback that runs the profiler end callbacks. Tests are added similar to send/recv tests with gloo/MPI.
We also test with both autograd profiler and torch.profiler.
ghstack-source-id: 128142666
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D27866600
fbshipit-source-id: f29d9103e22b22f658632fece0df9ba36911fc62
Summary:
JUnitXml.__iadd__() is very slow.
But since testsuites are flattened anyway in
`convert_junit_to_testcases`, concatenate the flattened tests right away.
As a result, parsing a test-reports folder with 393 files and 25+ test cases
takes 0.5 sec instead of 193 sec.
Fix typing errors and add the script to mypy-strict.
Print a warning, rather than aborting, if an XML file cannot be parsed.
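The two behavioral changes (flatten test cases directly; warn instead of abort on bad XML) can be sketched with the stdlib parser (illustrative only, not the actual script):

```python
import warnings
import xml.etree.ElementTree as ET

def iter_testcases(xml_text: str):
    """Yield every <testcase> element, flattening any nesting of
    <testsuites>/<testsuite>; warn and yield nothing on parse errors."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        warnings.warn(f"skipping unparsable report: {err}")
        return
    # iter() walks the whole tree, so nested suites are flattened for free.
    yield from root.iter("testcase")
```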
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57641
Reviewed By: samestep
Differential Revision: D28224401
Pulled By: malfet
fbshipit-source-id: 3efc079c1c0deef8fff5ddf083268885b28418f9
Summary:
Add an API `_get_bytecode_version` to get the version number of a bytecode model, in both C++ and Python; the input can be either a file path or a buffer.
## Test
CI (new added unit test will run as part of `pytorch_core-buck`)
1. run test_lite_interpreter.cpp
2. `python test/mobile/test_bytecode.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56801
ghstack-source-id: 128169647
Test Plan:
CI (new added unit test will run as part of `pytorch_core-buck`)
1. run test_lite_interpreter.cpp
2. `python test/mobile/test_bytecode.py`
Reviewed By: iseeyuan
Differential Revision: D27961417
fbshipit-source-id: f786cc9573d855feecff0b4fe8e5363e25f5728c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57560
The new methods allow peeking into bufferArgs, which describe the parameters
that codegen expects. This description includes whether a given
parameter is a scalar var or a buffer, and in the buffer case allows
getting the corresponding `Buf*` pointer from which we can obtain the
expected sizes.
Relanding #57074 which was reverted because I forgot to guard a new
test with `ifdef LLVM`.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D28199048
Pulled By: ZolotukhinM
fbshipit-source-id: 636e838e7e242a3c63e97ec453b8fae9b6380231
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57553
Relanding #57329 (the entire stack) which was reverted because I forgot
to guard a new test with `ifdef LLVM`.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D28195048
Pulled By: ZolotukhinM
fbshipit-source-id: 50052a2f20f84940b83d1dd1241c8659ff06e014
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57552
This method uses `CodeGen::call_raw` instead of `CodeGen::call`.
Relanding #57328 (the entire stack) which was reverted because I forgot
to guard a new test with `ifdef LLVM`.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D28195047
Pulled By: ZolotukhinM
fbshipit-source-id: bcfd3cb5b4f33a149b7549515ffd705e2c4f208f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57551
The new method allows passing input and output arguments as `void*`
pointers instead of CallArgs, which helps reduce the invocation
overhead. Currently this is only supported in the LLVM codegen.
Relanding #55113 (the entire stack) which was reverted because I forgot
to guard a new test with `ifdef LLVM`.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D28195049
Pulled By: ZolotukhinM
fbshipit-source-id: 035b77ae996dbbcd542b4b0e4c011b41e8d7828b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57544
Instead of removing tp_new from the superclass (which causes
super().__new__ to not work), I now still install tp_new on the
superclass, but verify that you are not trying to directly
construct _TensorBase.
Fixes https://github.com/pytorch/pytorch/issues/57421
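The pattern (keep `__new__` installed on the base class so `super().__new__` works, while rejecting direct construction of the base) looks like this in pure Python; class names are stand-ins, not the actual C-level `tp_new` code:

```python
class TensorBase:
    def __new__(cls, *args, **kwargs):
        # Installed on the base class so subclasses can still call
        # super().__new__, but constructing the base directly is rejected.
        if cls is TensorBase:
            raise TypeError("TensorBase cannot be constructed directly; subclass it instead")
        return super().__new__(cls)

class Tensor(TensorBase):
    pass
```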
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D28189475
Pulled By: ezyang
fbshipit-source-id: 9397a3842a77f5428d182dd62244b42425bca827
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57558
Fixes #53359
If someone directly saves an nn.LSTM in PyTorch 1.7 and then loads it in PyTorch
1.8, it errors out with the following:
```
(In PyTorch 1.7)
import torch
model = torch.nn.LSTM(2, 3)
torch.save(model, 'lstm17.pt')
(In PyTorch 1.8)
model = torch.load('lstm17.pt')
AttributeError: 'LSTM' object has no attribute 'proj_size'
```
Although we do not officially support this (directly saving modules via
torch.save), it used to work and the fix is very simple. This PR adds an
extra line to `__setstate__`: if the state we are passed does not have
a `proj_size` attribute, we assume it was saved from PyTorch 1.7 and
older and set `proj_size` equal to 0.
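The fix amounts to a one-line default in `__setstate__`; a minimal stand-in (not the actual `nn.LSTM` code) looks like:

```python
class LSTMStandIn:
    def __setstate__(self, state):
        self.__dict__.update(state)
        # States pickled by PyTorch 1.7 and older predate proj_size;
        # default it to 0 (i.e. no projection).
        if "proj_size" not in self.__dict__:
            self.proj_size = 0
```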
Test Plan:
I wrote a test that tests `__setstate__`. But also,
Run the following:
```
(In PyTorch 1.7)
import torch
x = torch.ones(32, 5, 2)
model = torch.nn.LSTM(2, 3)
torch.save(model, 'lstm17.pt')
y17 = model(x)
(Using this PR)
model = torch.load('lstm17.pt')
x = torch.ones(32, 5, 2)
y18 = model(x)
```
and finally compare y17 and y18.
Reviewed By: mrshenli
Differential Revision: D28198477
Pulled By: zou3519
fbshipit-source-id: e107d1ebdda23a195a1c3574de32a444eeb16191
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57210
Removes the manually specified string name for sets of
related ops, and replaces it with an automatically generated
index. The manual name was arbitrary and ok for an MVP, but
is not safe for wide usage.
Also, adds APIs for users to add custom functions to the
relatedness map by either pairing it to a known function
or creating a new relatedness set.
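The auto-generated index plus the two registration APIs could be sketched as follows (class and method names are illustrative, not the Numeric Suite API):

```python
class RelatednessMap:
    def __init__(self):
        self._index_of = {}   # op -> set index
        self._sets = []       # index -> set of related ops

    def add_set(self, ops):
        """Create a new relatedness set; its index is generated, not hand-named."""
        idx = len(self._sets)
        self._sets.append(set(ops))
        for op in ops:
            self._index_of[op] = idx
        return idx

    def add_related_to(self, new_op, known_op):
        """Pair a custom function with an already-known one."""
        idx = self._index_of[known_op]
        self._sets[idx].add(new_op)
        self._index_of[new_op] = idx

    def related(self, a, b):
        # Ops are related iff they landed in the same auto-indexed set.
        return self._index_of.get(a) == self._index_of.get(b, object())
```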
Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D28077977
fbshipit-source-id: e64a1ad6cd063014d74cdad189b0a612b1143435
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57186
Before this PR, we matched any pair of nodes with equal or related
types.
This PR changes the behavior to only match nodes whose type is in
the allowlist (the relatedness mappings). This will prevent matching
user defined modules, unless users add them to the mappings.
This is motivated by a couple of things:
1. if user defined types are matched, it can break scriptability of the
model with loggers attached. This happens whenever the user module
has a return type of anything other than a Tensor or a tuple of
Tensors.
2. we tried the past behavior on a couple of models, and it hasn't been
useful.
Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher
python test/test_quantization.py TestFXGraphMatcherModels
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D28077981
fbshipit-source-id: 0a698e52b807cda47e6923310448a985b26eb362
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57184
Add remaining types to the relationship mapping to have full coverage
of ops quantization knows about, except binary ops and RNNs.
Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher.test_op_relationship_mapping
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D28077979
fbshipit-source-id: 0f6070c8a995032978702d088803f89ff25f2a7f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57171
No logic change, just moving the mapping to a file where
the other mappings are.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D28077978
fbshipit-source-id: 4049d6a498156a5dffe3a03d2f4abc79da7bf907
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57383
Notes: I picked up an activation from https://github.com/pytorch/pytorch/issues/56969. You can look at the [activations.cpp](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cpu/Activation.cpp#L429) file which has both forward and backward kernel code to help you write the NNC lowering and the symbolic gradient.
I added a test in test_jit_fuser_te for the fusion, and I added an OpInfo and asserted that we expect to see autodiffable nodes to test the symbolic gradient.
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D28197820
Pulled By: eellison
fbshipit-source-id: 05305d85c5bb0847c8f911b95ba47b137dca7e90
Summary: The goal of this diff is to enforce proper use of "emplacy" functions. In each case, this saves at worst a move constructor call, and at best a full copy of the object (in the case of a constructor call where the object does not have a move constructor).
Test Plan: CI.
Reviewed By: marksantaniello
Differential Revision: D27888714
fbshipit-source-id: 235d0b31066463588c7e4ab86e132c430a352500
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55711
Currently, there is some complex logic that tries to handle all exceptions but re-throws them as a `c10::Error` so that it can log the error message. I'm looking for context on why this was added. The current logic (after talking with swolchok) seems equivalent, simpler, and also preserves the original stack trace from where the exception was originally thrown. This is useful when viewing the backtrace in logview. Re-throwing an exception using `TORCH_CHECK(false, message)` results in the original exception stack trace getting lost, so we want to avoid that.
ghstack-source-id: 128043281
Test Plan: Build.
Reviewed By: iseeyuan
Differential Revision: D27688352
fbshipit-source-id: b7b1a29b652b31da80d72f16d284e48b8623377b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57253
This PR:
1. Adds is_async getter/setter to RecordFunction
2. Adds is_async field to LegacyEvent and KinetoEvent, read from RecordFunction
3. Modifies python profiler code to check is_async via this flag (and keeps the old thread check as well)
4. Sets profiling of c10d collectives as async in ProcessGroup.cpp
5. Modifies tests to ensure is_async is set
This also fixes tests such as #50840 and #56690, which have been flaky due to the profiling part (https://github.com/pytorch/pytorch/pull/56963 tried to do so as well, but this is a better approach).
ghstack-source-id: 128021158
Test Plan: CI
Reviewed By: walterddr, ilia-cher
Differential Revision: D28086719
fbshipit-source-id: 4473db4aed939a71fbe9db5d6655f3008347cb29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57470
Removes the earlier hack of matching patterns originally matched
to BinaryOpQuantizeHandler to switch to CopyHandler. After this PR,
each pattern can only be matched to one type of QuantizeHandler or
to nothing.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D28152909
fbshipit-source-id: afc285e770bd7eb0518c90e3ee4874c421e78bbc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57472
This render should bring us to rough feature parity with the CircleCI web UI renders for j(x)unit test reports; it should make it so you don't have to look through a long list of logs to see which tests failed for which job.
The render should look somewhat similar to

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: samestep
Differential Revision: D28154513
Pulled By: seemethere
fbshipit-source-id: 02d918b5c4cb6e236b806db48c3debe44de69660
Summary:
1. Clean up unused APIs on MPSImageWrapper
2. Rename textures to images to avoid confusions.
Test Plan: CI
Reviewed By: husthyc
Differential Revision: D28176917
fbshipit-source-id: 3afb261d9e5a9a6145ca3067cf0d245f1bf04683
Summary:
Testing 11.3 with current CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57223
Test Plan:
Relevant CI (11.3) pass!
Disclaimer: Skipped test_inverse_errors_large for CUDA 11.3 as it failed. Issue documented at https://github.com/pytorch/pytorch/issues/57482.
Reviewed By: malfet
Differential Revision: D28169393
Pulled By: janeyx99
fbshipit-source-id: 9f5cf7b6737ee6196de92bd80918a5bfbe5510ea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57298
Some of the code is borrowed from NVIDIA-AI-IOT/torch2trt https://github.com/NVIDIA-AI-IOT/torch2trt/tree/master/torch2trt.
Move fx2trt stuff to fx/experimental/fx2trt.
Add an example in fx/experimental/fx2trt/example/fx2trt_example.py that shows how we lower resnet18 to TensorRT using FX.
TODO: Include license from NVIDIA-AI-IOT/torch2trt
Test Plan: CI
Reviewed By: jackm321
Differential Revision: D28102144
fbshipit-source-id: 1a7b03e45b8ab3fcc355d097d73afeec2efc3328
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56562
Earlier, inlined callstack annotation covered only individual nodes. This
left out nodes such as If, which have blocks of nodes; those nodes should
also be updated similarly.
Test Plan:
Added test in test_misc
Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D27902516
fbshipit-source-id: 4e65c686fa6b4977e8719db45f71f7d2599d4d8e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55252
Earlier, bytecode serialization saved debug handles only for OPs and not for all
instructions. This PR adds debug handles for all instructions.
Test Plan:
python test/mobile/test_lite_script_module.py TestLiteScriptModule
Imported from OSS
Reviewed By: dreiss
Differential Revision: D27542502
fbshipit-source-id: cff75118c721ce9f0c2f60d2c9471481f05264ca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55062
This diff introduces the following changes:
1. InlinedCallStack pickler/serializer is introduced. It is serialized
as a tuple of {module_instance_info, source range tag, callee:InlinedCallStack}
Module instance info is serialized as tuple of {class_type_name,
instance_name}.
Note that callee of the serialized inlined callstack points to the tuple
of already serialized callstack. This means the first callstack ptr to
serialize, will serialize entire path of the tree, where some callee
nodes might be shared with callstack pointers that will be serialized
subsequently. The pickler supports memoization of pickled objects: if
a tuple has already been serialized, its object id is emitted instead of
serializing the object again. Thus we still serialize the tree, not every
path from the root separately. Furthermore, InlinedCallStackSerializer
also uses a cache to look up the pointer and return the serialized IValue.
Note that we must also serialize the source range of the
InlinedCallStack. To do this, the serializer requires a
source-range-tags-to-source-range map. This was done in the previous
diff, where as part of source range serialization we also generate
unique tags. These are the tags that are serialized in InlinedCallStack.
Thus during deserialization we would have to deserialize source range
before deserializing InlinedCallStacks.
2. Furthermore, each serialized InlinedCallStack is serialized with a
unique debug_handle and source range tag.
BackendDebugHandleManager manages generation of
unique debug handles and saves the map of
debug-handles-to-{source_range_tag, inlined-callstack-ptr}.
This map is then serialized as callstack_debug_map.pkl. Note that
inlined callstack is not sufficient to get all the source information
since it contains source information about the nodes which are inlined.
The top-of-the-stack (or bottom) node, which is the actual op node, is
not part of the inlined callstack pointer and thus the source range of
this node is serialized separately using source_range_tag. This is
similar to how JIT creates callstack in
torch/csrc/jit/runtime/interpreter.cpp
Unique debug handles facilitate exception throwing or profiling using
just the debug handle, without any further qualification such as which
function or module the inlined callstack belongs to.
Furthermore, this diff refactors the old mobile code for tracking
module hierarchy information per op. Now bytecode serialization
serializes debug handles corresponding to ops/nodes in the graph, and
callstack_debug_map.pkl helps generate:
1. The entire callstack, and
2. Module hierarchy information.
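The pickler memoization this serialization scheme relies on can be demonstrated with Python's own pickler: shared sub-objects are emitted once and their identity is preserved on load, so serializing every callstack from the root still yields a shared tree, not duplicated copies:

```python
import pickle

# Two inlined callstacks that share the same callee node (illustrative
# tuples standing in for serialized InlinedCallStack entries).
callee = ("forward", "mod.submod")
stack_a = ("op_a", callee)
stack_b = ("op_b", callee)

blob = pickle.dumps([stack_a, stack_b])
loaded_a, loaded_b = pickle.loads(blob)
# The shared callee is pickled once and comes back as one shared object.
```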
Test Plan:
python test/mobile/test_lite_script_module.py TestLiteScriptModule
./build/bin/test_jit --gtest_filter=*ModuleInfo
Imported from OSS
Reviewed By: raziel
Differential Revision: D27468709
fbshipit-source-id: 53e2413e7703ead01c77718b7c333c7c6ff50a23
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54284
In order to bring mobile deployment, via lite interpreter, to feature
parity with JIT with respect to model-level debug information, we must make
that debug information available to the mobile runtime.
At the moment, model-level debug information is stored in SourceRange,
which associates nodes of a graph to where they come from in the original
Python source code.
This information is serialized as part of debug_pkl and deserialized
when JIT loads the model and reads the model code.
On lite interpreter, we do not have access to all the functionality of
JIT, and hence we cannot load the model the same way JIT does, by reading
code, constructing the module hierarchy and the graphs corresponding to
module methods, etc. Instead, in lite interpreter, only the bytecode
corresponding to the compiled graph, Code, is saved.
Thus in order to annotate OPs in the bytecode with equivalent
SourceRange information we do the following:
1. During model serialization, we create a unique tag for each source
range of the model.
2. Create a map of <SourceRange, tag>
3. During debug_pkl serialization we save tag along with SourceRange, on
top of byte offset.
4. During bytecode generation, the methods of the top module are
lowered. During this process methods are inlined. In the inlined graph,
when the node of a graph is lowered to bytecode, we query node's source
range and look it up against the map.
5. Resulting source range tag is serialized in module_debug_info.
6. During model deserialization, we read all the debug_pkl records in
the archive and create a map of <tag, SourceRange>.
7. This map can be used to find source code information.
During mobile runtime:
1. We read all the debug_pkl records and create <tag=debug_handle,
SourceRange> map.
1.1 This map, MobileDebugInfo, is a member of mobile Module.
2. Interpreter catches appropriate exceptions and sets the thread local
debug handle and rethrows the exception.
3. In Function's run method we catch exception and query current debug
handle where the exception happened.
4. Query MobileDebugInfo with debug handle to retrieve source range and
augment error with source range info.
This information is still incomplete as it does not contain entire
callstack.
In the following diffs we will serialize InlinedCallStack directly.
Note that compilation is gated by SYMBOLICATE_MOBILE_DEBUG_HANDLE macro,
so that mobile builds can avoid building MobileDebugInfo, source range
and the source range pickler/unpickler. Later we will add a path where, if
building without debug support, the stack trace will contain only debug
handles; these can be symbolicated later.
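The tag scheme in steps 1-2 and 6-7 amounts to interning source ranges into unique integer tags at save time and inverting the map at load time; a minimal sketch (names and the `(file, start, end)` representation are illustrative assumptions):

```python
class SourceRangeTagger:
    """Assign a unique tag to each distinct source range at save time."""
    def __init__(self):
        self._tag_of = {}

    def tag(self, source_range):
        # source_range must be hashable, e.g. a (file, start, end) tuple.
        if source_range not in self._tag_of:
            self._tag_of[source_range] = len(self._tag_of)
        return self._tag_of[source_range]

    def inverse(self):
        """The <tag, SourceRange> map reconstructed at load time."""
        return {t: sr for sr, t in self._tag_of.items()}
```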
Test Plan:
Ported bunch of source range tests from test_jit.py. Added on more test
in test_lite_interpreter.py
Imported from OSS
Reviewed By: raziel
Differential Revision: D27174722
fbshipit-source-id: a7b7c6088ce16dec37e823c7fefa4f0b61047e12
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57521
When an op is added to static runtime, we manually check the schema (not with the JIT schema check, but with IValue.isTensor()/isInt() etc.) and make sure it's one we support. If the schema doesn't match, SR would throw an exception with TORCH_CHECK, which makes the entire graph invalid for SR.
This diff makes ops with unsupported schemas use the fallback path and go through the dispatcher instead:
```
if (node->kind() != prim::ListConstruct &&
node->kind() != prim::TupleConstruct &&
node->kind() != prim::DictConstruct && node->kind() != prim::ListUnpack) {
const Operator& op = node->getOperator();
TORCH_CHECK(op.hasOperation());
op_ = op.getOperation(node);
VLOG(1) << "Fallback interpreter for node: " << PrintNode(node);
}
```
The 2-arg `torch.norm`, which the SR `torch.norm` impl doesn't support (only the 3-, 4-, and 5-arg forms are supported), can now run in static runtime in fallback mode.
(Note: this ignores all push blocking failures!)
Reviewed By: ajyu
Differential Revision: D27531447
fbshipit-source-id: 0a9c2662ac73ed0393a23cc3a2c7df45fdb00fdd
Summary:
Pull Request resolved: https://github.com/pytorch/elastic/pull/148
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56811
Moves the docs sphinx `*.rst` files from the torchelastic repository to torch. Note: this only moves the rst files; the next steps are to link them into the main pytorch `index.rst` and write a new `examples.rst`.
Reviewed By: H-Huang
Differential Revision: D27974751
fbshipit-source-id: 8ff9f242aa32e0326c37da3916ea0633aa068fc5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57074
The new methods allow peeking into bufferArgs, which describe the parameters
that codegen expects. This description includes whether a given
parameter is a scalar var or a buffer, and in the buffer case allows
getting the corresponding `Buf*` pointer from which we can obtain the
expected sizes.
Differential Revision: D28048289
Test Plan: Imported from OSS
Reviewed By: bertmaher
Pulled By: ZolotukhinM
fbshipit-source-id: 3867e862a0ec3593906820826c2344bd8a8f5c0a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57382
`BufferArg` is used to describe parameters passed to the codegen: it
indicates whether the parameter is a var or a buf and holds a pointer to
the corresponding var/buf. Both var and buf contain dtype, and thus
duplicating it in BufferArg is unnecessary - we can always get it from
the var/buf. Hence we're removing dtype_ field from BufferArg in this
PR. We're also adding a `buf_` field here: this is done so that
BufferArg truly has all the info about the parameter.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D28128329
Pulled By: ZolotukhinM
fbshipit-source-id: c03bff54bc6860f7ac6edfcb42ce6a82d8309589
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57169
The pass is planned to be used in AOT pipeline, where we expect input
graphs to be functional. As such, these graphs should not use the 'self'
argument even if it is present, and thus it can be removed safely.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D28128328
Pulled By: ZolotukhinM
fbshipit-source-id: a7dfbf7776682826100c8eb0fef982a2e81c2554
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57076
This pass is intended to be used in conjunction with shape propagation
pass: first we use sample inputs to specify shape info for graph inputs
and then we run shape-prop to infer shapes of intermediate values in the
graph.
Differential Revision: D28048290
Test Plan: Imported from OSS
Reviewed By: astaff
Pulled By: ZolotukhinM
fbshipit-source-id: 778d772e873d59d77af9f669f45dc44b9ee5e443
Summary:
Such a deadlock was found for PyFunctionPreHook after adding https://github.com/pytorch/pytorch/pull/57057
This is fixing all occurrences in torch/csrc/autograd
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57488
Reviewed By: malfet
Differential Revision: D28163321
Pulled By: albanD
fbshipit-source-id: 4daf1db69674e73967fc7c5ca2a240c61340e7ca
Summary:
This change temporarily disables CUDA testing on PRs, but keeps it on master.
This is likely to increase the number of reverts, but this is necessary as a stop-gap measure to cap the CI costs growth.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57493
Reviewed By: seemethere
Differential Revision: D28162697
Pulled By: janeyx99
fbshipit-source-id: 1bc529a405f7d63c07f4bd9f8ceca8da450743fc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57413
An internal test fails because `Tuple[()]` is somehow not considered compatible with `Tuple[Any]` in TorchScript, even if the code that involves this type of variable is not executed at all.
Therefore, create separate templates for instantiation to avoid the type-check failure. This can address the FIXME left in https://github.com/pytorch/pytorch/pull/57288
#Closes: https://github.com/pytorch/pytorch/issues/51670
Test Plan:
buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- RemoteModule -j 1
buck test mode/dev-nosan caffe2/torch/fb/training_toolkit/applications/sparse_nn/batch_distributed_inference/tests:batch_distributed_inference_test -- test_load_di_parts
Reviewed By: wanchaol
Differential Revision: D28138864
fbshipit-source-id: 39e3e67b0c3979b607ff104d84b4fb1070ffefd6
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54555
It has been discussed in issue https://github.com/pytorch/pytorch/issues/54555 that the {h,v,d}split methods unexpectedly match an argument of a single int[] when they are expected to match a single argument of int. The same unexpected behavior can happen in other functions/methods that have both int[] and int? single-argument signatures.
In this PR we solve this problem by giving higher priority to int/int? arguments over int[] when sorting signatures.
We also add the {h,v,d}split methods here, which helped us discover this unexpected behavior.
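The priority change can be pictured as a sort key over overload signatures so that a scalar `int`/`int?` parameter is tried before `int[]` (illustrative Python only; the real change lives in the signature-sorting codegen):

```python
def signature_priority(param_types):
    """Lower sort key = matched earlier. Scalar int/int? outranks int[]."""
    rank = {"int": 0, "int?": 0, "int[]": 1}
    return [rank.get(t, 2) for t in param_types]

# Hypothetical overloads for hsplit: the int overload must win for hsplit(2).
overloads = [
    ("hsplit.array", ["int[]"]),
    ("hsplit.int", ["int"]),
]
ordered = sorted(overloads, key=lambda ov: signature_priority(ov[1]))
```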
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57346
Reviewed By: ezyang
Differential Revision: D28121234
Pulled By: iramazanli
fbshipit-source-id: 851cf40b370707be89298177b51ceb4527f4b2d6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57151
This PR introduces the implementation of `DynamicRendezvousHandler` that mostly facilitates the types introduced in previous PRs.
ghstack-source-id: 127685212
Test Plan: Run the existing and new unit tests.
Reviewed By: tierex
Differential Revision: D28060531
fbshipit-source-id: 844ff0e9c869f2bbb85fba05a16002d00eae130f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57150
This PR refactors the `__init__` method of `DynamicRendezvousHandler` to a `from_backend` static constructor for easier testing and future extensibility.
ghstack-source-id: 127685183
Test Plan: Run the updated unit tests.
Reviewed By: tierex
Differential Revision: D28060336
fbshipit-source-id: b07dcbb61e8ff5a536b7b021cd50438010c648dd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57149
This PR introduces the `_RendezvousJoinOp` type that represents a rendezvous join operation to be executed via a `_RendezvousOpExecutor`.
ghstack-source-id: 127685142
Test Plan: Run the existing and new unit tests.
Reviewed By: tierex
Differential Revision: D28059785
fbshipit-source-id: 6e67a54289eef1a2349fcc52f8841e49c139459a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57148
This PR introduces the `_RendezvousExitOp` type that represents a rendezvous exit operation to be executed via a `_RendezvousOpExecutor`.
ghstack-source-id: 127685094
Test Plan: Run the existing and new unit tests.
Reviewed By: tierex
Differential Revision: D28059764
fbshipit-source-id: 2da428885f1390957242fdd82d68cee2ac273c71
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57147
This PR introduces the `_RendezvousKeepAliveOp` type that represents a rendezvous keep-alive heartbeat operation to be executed via a `_RendezvousOpExecutor`.
ghstack-source-id: 127685037
Test Plan: Run the existing and new unit tests.
Reviewed By: tierex
Differential Revision: D28059733
fbshipit-source-id: 31fd8fc06f03d8f9cd21558b15a06dea7ad85bc6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57146
This PR introduces the `_RendezvousCloseOp` type that represents a rendezvous close operation to be executed via a `_RendezvousOpExecutor`.
ghstack-source-id: 127684991
Test Plan: Run the existing and new unit tests.
Reviewed By: tierex
Differential Revision: D28059693
fbshipit-source-id: 6c944d3b4f6a6ed2057ea2921ae8a42609998dd2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57145
This PR introduces the `_DistributedRendezvousOpExecutor` type that implements the `_RendezvousOpExecutor` interface for rendezvous shared via a `_RendezvousStateHolder`.
ghstack-source-id: 127684945
Test Plan: Run the existing and new unit tests.
Reviewed By: tierex
Differential Revision: D28059417
fbshipit-source-id: 7ef72ea16b54eaaa11a6ece7459d385d49692a84
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57378
The previous version was reverted because some tests didn't run, since I wasn't in the pytorch GitHub org
Differential Revision: D28125562
fbshipit-source-id: 758c1c9a009e79febf6cbd062a47d2a3d94e3a78
Summary:
This diff enables mapping a selected set of Ads embeddings to the T17 host on hierarchical memory (nvmify). To achieve that, the following is implemented:
- Allow for the OTHER net to be both onnxified and nvmified
- For that, an allowlist placement policy is added to the nvmify stack
- onnxifi_transform is lightly updated to accept a blocklist of operators based on name
- The nvm transform is broken into two parts: op replacement and blob update.
- A derived class `SeqBlobNVMReader` is defined which adds the functionality to load blobs to the card or NVM.
Test Plan:
* Unit test
* Run predictor replayer: selectively load the following ads embeddings to NVM as in `--caffe2_nvm_dram_placement_file=/home/hanli/nvm_allowlist`:
```
SPARSE_AD_ACCOUNT_ID
SPARSE_NEW_AD_ID_COARSE
SPARSE_NEW_AD_ID_REFINED
SPARSE_NEW_CAMPAIGN_ID
SPARSE_NEW_TARGET_ID
SPARSE_NEW_AD_CLUSTER_ID
SPARSE_NEW_PAGE_ID
SPARSE_NEW_STORY_ID
SPARSE_NEW_VIDEO_ID
SPARSE_ENTITY_EQUIVALENCE_KEY
SPARSE_ENTITY_EQUIVALENCE_KEY_NO_CREATIVE
```
major parameter change in sigrid_remote_predictor_glow_nnpi:
```
--caffe2_nets_to_nvmify=DISAGG_ACC_REMOTE_OTHER \
--caffe2_nvm_sls_ops=SparseLengthsSumFused8BitRowwise,SparseLengthsWeightedSumFused8BitRowwise,SparseLengthsSumFused4BitRowwise,SparseLengthsWeightedSumFused4BitRowwise,SparseLengthsSum4BitRowwiseSparse \
--caffe2_nvm_table_path=/home/hanli/tables/225412100_2870/ \
--caffe2_nvm_dram_placement_file=/home/hanli/nvm_allowlist \
--caffe2_nvm_dram_placement_policy=by_file_allowlist \
--caffe2_predictor_nets_to_load=DISAGG_ACC_REMOTE_OTHER
```
In the predictor log, observe that the blobs to be NVMified have their op types transformed, are skipped by the Onnxifi transform, and are deferred-loaded via the NVM net transform:
```
I0416 09:59:29.550690 662344 Nvmifier.cpp:142] ^[[92mReplacing SparseLengthsSumFused4BitRowwise with NVM variant.^[[0m
I0416 09:59:29.550701 662344 Nvmifier.cpp:142] ^[[92mReplacing SparseLengthsSumFused4BitRowwise with NVM variant.^[[0m
I0416 09:59:29.550705 662344 Nvmifier.cpp:142] ^[[92mReplacing SparseLengthsSumFused4BitRowwise with NVM variant.^[[0m
I0416 09:59:29.550712 662344 Nvmifier.cpp:142] ^[[92mReplacing SparseLengthsSumFused4BitRowwise with NVM variant.^[[0m
I0416 09:59:29.550715 662344 Nvmifier.cpp:142] ^[[92mReplacing SparseLengthsSumFused4BitRowwise with NVM variant.^[[0m
I0416 09:59:29.550721 662344 Nvmifier.cpp:142] ^[[92mReplacing SparseLengthsSumFused4BitRowwise with NVM variant.^[[0m
...
I0416 09:59:31.665369 662344 onnxifi_transformer.cc:1097] Skipping blocklisted op SparseLengthsSumFused4BitRowwiseNVM at pos 770
I0416 09:59:31.667042 662344 onnxifi_transformer.cc:1097] Skipping blocklisted op SparseLengthsSumFused4BitRowwiseNVM at pos 777
I0416 09:59:31.667294 662344 onnxifi_transformer.cc:1097] Skipping blocklisted op SparseLengthsSumFused4BitRowwiseNVM at pos 779
I0416 09:59:31.668828 662344 onnxifi_transformer.cc:1097] Skipping blocklisted op SparseLengthsSumFused4BitRowwiseNVM at pos 786
I0416 09:59:31.668843 662344 onnxifi_transformer.cc:1097] Skipping blocklisted op SparseLengthsSumFused4BitRowwiseNVM at pos 787
I0416 09:59:31.669909 662344 onnxifi_transformer.cc:1097] Skipping blocklisted op SparseLengthsSumFused4BitRowwiseNVM at pos 792
...
I0416 10:01:09.087282 662344 Nvmifier.cpp:346] found the name: table0
I0416 10:01:09.373975 662344 Nvmifier.cpp:374] ^[[96mSaved /home/hanli/tables/225412100_2870/table0^[[0m
I0416 10:01:09.376008 662344 Nvmifier.cpp:343] filename: sparse_nn_sparse_arch_SPARSE_NEW_AD_ID_COARSE_dedicated_13_w_EmbeddingFusedUint4Quantization
..
I0416 10:11:05.310854 662344 Nvmifier.cpp:161] ^[[95mNVMifying the model.^[[0m
I0416 10:11:05.310887 662344 Nvmifier.cpp:185] found the name: table0 for sparse_nn_sparse_arch_SPARSE_NEW_AD_ID_COARSE_dedicated_13_w_EmbeddingFusedUint4Quantization
I0416 10:11:07.580587 662344 Nvmifier.cpp:185] found the name: table4 for sparse_nn_sparse_arch_SPARSE_AD_ACCOUNT_ID_dedicated_20_w_EmbeddingFusedUint4Quantization
I0416 10:11:07.580648 662344 Nvmifier.cpp:185] found the name: table3 for sparse_nn_sparse_arch_SPARSE_ENTITY_EQUIVALENCE_KEY_dedicated_22_w_EmbeddingFusedUint4Quantization
I0416 10:11:07.580667 662344 Nvmifier.cpp:185] found the name: table5 for sparse_nn_sparse_arch_SPARSE_NEW_TARGET_ID_dedicated_29_w_EmbeddingFusedUint4Quantization
I0416 10:11:07.580682 662344 Nvmifier.cpp:185] found the name: table2 for sparse_nn_sparse_arch_SPARSE_NEW_AD_ID_REFINED_dedicated_30_w_EmbeddingFusedUint4Quantization
I0416 10:11:07.580695 662344 Nvmifier.cpp:185] found the name: table1 for sparse_nn_sparse_arch_SPARSE_NEW_STORY_ID_dedicated_35_w_EmbeddingFusedUint4Quantization
```
Make sure model is properly loaded:
```
I0415 21:42:48.400249 873685 ModelManagerBase.cpp:806] Loaded 225412100_2870 in 730944 ms (63800 ms of IO) memory used 8744167456 byte(s)
```
* Only load the user embeddings to NVM to make sure the baseline use case is not broken by this diff:
```
--caffe2_nets_to_nvmify=DISAGG_ACC_REMOTE_REQUEST_ONLY \
--caffe2_nvm_sls_ops=SparseLengthsSumFused8BitRowwise,SparseLengthsWeightedSumFused8BitRowwise,SparseLengthsSumFused4BitRowwise,SparseLengthsWeightedSumFused4BitRowwise,SparseLengthsSum4BitRowwiseSparse \
--caffe2_nvm_table_path=/home/hanli/tables/225412100_2870/
```
Make sure model is loaded:
```
Loaded 225412100_2870 in 381139 ms (56313 ms of IO) memory used 7043933560 byte(s)
```
* Run feed replayer: `buck-out/gen/sigrid/feed/prediction_replayer/fully_remote_replayer_main --use_new_encoding_for_ads_services --use_new_encoding_from_model_id_to_shard_id --request_file_path /data/users/hanli/f266405843.requests --model_id=265540157_0 --replayer_thread_count=30 --sigrid_predictor_single_host=2401:db00:272c:602e:face:0:10:0 --sigrid_predictor_single_port=7444 --num_iterations=5 --qps=100 --client_name=predictor_v1` (load predictor as in P411172400)
Output:
```
I0428 21:20:25.106635 1396182 FullyRemoteReplayer.cpp:107] Loading requests from /data/users/hanli/f266405843.requests
I0428 21:20:25.547982 1396182 FullyRemoteReplayer.cpp:109] Requests size : 6699
I0428 21:20:25.548146 1396182 Client.cpp:274] V1 tier name: V2 tier name: sigrid.predictor.fully_remote_test V2 fully remote tier name:
I0428 21:20:25.548153 1396182 Client.cpp:282] [MF] Migration Framework (traffic routing) enabled: false
I0428 21:20:25.548172 1396182 ModelRemoteStatus.cpp:206] Selection probabilities znode path: /configerator-gz/.prn
I0428 21:20:25.674162 1396265 ModelRemoteStatus.cpp:612] Found 0 host, 0 shards in predictor tier
I0428 21:20:25.674181 1396182 ModelRemoteStatus.cpp:557] Refresh sigrid model succeeded: 1
I0428 21:21:26.252820 1396265 ModelRemoteStatus.cpp:612] Found 0 host, 0 shards in predictor tier
I0428 21:21:26.252851 1396265 ModelRemoteStatus.cpp:557] Refresh sigrid model succeeded: 1
I0428 21:22:22.225976 1396182 PredictionReplayer.cpp:67] Previous request took too long, not reaching target QPS
I0428 21:22:26.252643 1396265 ModelRemoteStatus.cpp:612] Found 0 host, 0 shards in predictor tier
I0428 21:22:26.252678 1396265 ModelRemoteStatus.cpp:557] Refresh sigrid model succeeded: 1
I0428 21:23:26.252959 1396265 ModelRemoteStatus.cpp:612] Found 0 host, 0 shards in predictor tier
I0428 21:23:26.252987 1396265 ModelRemoteStatus.cpp:557] Refresh sigrid model succeeded: 1
I0428 21:24:26.253135 1396265 ModelRemoteStatus.cpp:612] Found 0 host, 0 shards in predictor tier
I0428 21:24:26.253166 1396265 ModelRemoteStatus.cpp:557] Refresh sigrid model succeeded: 1
I0428 21:25:27.252734 1396265 ModelRemoteStatus.cpp:612] Found 0 host, 0 shards in predictor tier
I0428 21:25:27.252763 1396265 ModelRemoteStatus.cpp:557] Refresh sigrid model succeeded: 1
I0428 21:26:03.172894 1396182 FullyRemoteReplayer.cpp:59] cpu time p25, p50, p75, p95, p99 9570 13011 16218 20788 24840
I0428 21:26:03.172927 1396182 FullyRemoteReplayer.cpp:61] wait time p25, p50, p75, p95, p99 11845 15958 19946 26579 31842
I0428 21:26:03.172940 1396182 FullyRemoteReplayer.cpp:63] wall time p25, p50, p75, p95, p99 16194 20888 25303 31692 37387
```
Reviewed By: ehsanardestani
Differential Revision: D27701121
fbshipit-source-id: e898abc6957c839e402a9763172cf85d9bb84cbd
Summary:
This PR adds a shortcut of specifying all models in TorchBench CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57311
Test Plan:
CI
RUN_TORCHBENCH: ALL
Reviewed By: bitfort
Differential Revision: D28160198
Pulled By: xuzhao9
fbshipit-source-id: 67c292bc98868979d868d4cf1e599c38e0da94b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57144
This PR introduces the `_RendezvousOpExecutor` interface. Implementers of this interface are responsible for executing rendezvous operations in a state machine that outputs actions based on the current state of the rendezvous.
ghstack-source-id: 127684898
Test Plan: None beyond `flake8` and `mypy` as this is solely an interface definition.
Reviewed By: tierex
Differential Revision: D28059159
fbshipit-source-id: 8e7da33e02336206cddbe76d773681e98c28a98f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56538
This PR introduces the `_RendezvousStateHolder` interface and its accompanying `_BackendRendezvousStateHolder` type that is responsible for synchronizing the local rendezvous state with the other nodes.
ghstack-source-id: 127684796
Test Plan: Run the existing and new unit tests.
Reviewed By: tierex
Differential Revision: D27892600
fbshipit-source-id: a55d884a1f9b0d742787be4dff4271e076c08962
Summary:
Adding a function in ATenNVRTC.h also requires changing LazyNVRTC.cpp, but this was missing from the comments.
Also fix a typo.
CC jjsjann123
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57318
Reviewed By: anjali411
Differential Revision: D28146223
Pulled By: ezyang
fbshipit-source-id: be69241a4b41ac7361a8c9f978fa4c837f41fbd1
Summary:
https://github.com/pytorch/pytorch/pull/56433 was reverted because the test perceived internal dropout state creation as a memory leak. This PR resubmits with the leak check skipped.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57373
Reviewed By: anjali411
Differential Revision: D28152186
Pulled By: ezyang
fbshipit-source-id: 9a593fcdbbabbb09dc4e4221191663e94b697503
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57143
This PR introduces a `name` attribute in `_PeriodicTimer` for testing and debugging purposes.
ghstack-source-id: 127684751
Test Plan: Run the new and updated unit tests.
Reviewed By: tierex
Differential Revision: D28059045
fbshipit-source-id: 9eb067300aea21a99577e6cd8a354f7eb749f4a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57142
This PR extends the return type of `RendezvousBackend`'s `set_state` method with an additional boolean flag that specifies whether the write attempt has succeeded.
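The new contract can be sketched with a toy token-guarded in-memory backend (illustrative only; not the actual `RendezvousBackend` implementation): `set_state` reports via the extra flag whether the write won the race or was rejected as stale.

```python
# Toy compare-and-set backend illustrating the (state, token, succeeded)
# return contract; a sketch, not the torch.distributed implementation.
class InMemoryBackend:
    def __init__(self):
        self._state = b""
        self._token = 0

    def get_state(self):
        return self._state, self._token

    def set_state(self, state, token=None):
        if token is not None and token != self._token:
            # Stale token: the write attempt failed; hand back the newer state.
            return self._state, self._token, False
        self._state = state
        self._token += 1
        return self._state, self._token, True
```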
ghstack-source-id: 127629538
Test Plan: Run the updated unit tests.
Reviewed By: tierex
Differential Revision: D28058980
fbshipit-source-id: 26333790c39386891beb155b20ba1291d2cbdd03
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57141
Per feedback this PR renames `last_keep_alives` to `last_heartbeats` in `_RendezvousState`.
ghstack-source-id: 127629442
Test Plan: Run the updated unit tests.
Reviewed By: tierex
Differential Revision: D28058948
fbshipit-source-id: 0db12eac56a47a426a7a48fb5c93ac6a08b0d22e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57140
This PR introduces a new `heartbeat` attribute in `RendezvousTimeout`.
ghstack-source-id: 127626815
Test Plan: Run the updated unit tests.
Reviewed By: tierex
Differential Revision: D28058908
fbshipit-source-id: c6f8b3a06210cc59714fa841d9387eeb028dc02f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57139
This PR sets the `order` attribute of the `dataclass` annotation to `True` in order to introduce comparison operators for `_NodeDesc`.
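The effect can be shown with an illustrative stand-in for `_NodeDesc` (field names here are assumed for the example): with `order=True`, `@dataclass` generates `<`, `<=`, `>`, and `>=` from the field order, so descriptors become sortable without hand-written comparators.

```python
from dataclasses import dataclass

# Illustrative stand-in for _NodeDesc; order=True derives comparison
# operators from the declared field order (fqdn, then pid, then local_id).
@dataclass(order=True, frozen=True)
class NodeDesc:
    fqdn: str
    pid: int
    local_id: int
```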
ghstack-source-id: 127626783
Test Plan: Run the existing unit tests.
Reviewed By: tierex
Differential Revision: D28058851
fbshipit-source-id: 66313f84f507100e20acb687a3427b3dd51a6310
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56537
This PR introduces the `RendezvousSettings` type to consolidate the arguments passed to `DynamicRendezvousHandler`.
ghstack-source-id: 127626738
Test Plan: Run the existing unit tests.
Reviewed By: tierex
Differential Revision: D27890155
fbshipit-source-id: 22060c25b6927cc832f18ae6c5f7ba0f7a9ef3cf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56948
Add api to get runtime bytecode version
## Test
Both `caffe2/test/cpp/jit/test_lite_interpreter.cpp` and `caffe2/test/mobile/test_bytecode.py` pass
ghstack-source-id: 127939889
Test Plan: Both `caffe2/test/cpp/jit/test_lite_interpreter.cpp` and `caffe2/test/mobile/test_bytecode.py` pass
Reviewed By: raziel, iseeyuan
Differential Revision: D27987811
fbshipit-source-id: 35ed9bd626aecffc226f6dacfa046e6cdabfed51
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57375
Skip observing the input for masked_fill. Currently we don't have a way to query the type of a Proxy in GraphModule; once we have the functionality to annotate types, we'll annotate the Proxy as a boolean Tensor and remove this hack.
Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_boolean_tensor
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28126003
fbshipit-source-id: 2989766370a607579b3ea07ca36cdc2ce35893cc
Summary:
Let's see how 11.3 holds up!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57222
Test Plan: CUDA 11.3 has passed build and test below.
Reviewed By: malfet
Differential Revision: D28152554
Pulled By: janeyx99
fbshipit-source-id: 84b687660b9a5b6337b65d6aaaaf003ea94b2864
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57376
Having this in profiler/trace outputs will be useful when
investigating performance overhead of find_unused_parameters for certain
workloads, to determine whether it is a bottleneck or not.
ghstack-source-id: 127942159
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D28126233
fbshipit-source-id: 93082ae5b84e64351d59447a29f97eaf9b0bbd64
Summary:
- Use `functools.lru_cache` to avoid calling this function multiple times
- Check that we are running on a Linux platform before trying to open `/proc/cpuinfo`
- Do not spawn a new process; simply `open("/proc/cpuinfo").read()` and search the output for the keywords
Fixes https://github.com/pytorch/pytorch/issues/57360
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57408
Reviewed By: driazati
Differential Revision: D28136769
Pulled By: malfet
fbshipit-source-id: ab476774c3be2913cb576d98d47a2f7ec03c19aa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57168
Implement result() for MPI which wasn't previously supported.
Some users rely on output args; however, in future use cases (e.g. the DDP comm hook) we need to return the result explicitly.
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D28129125
Pulled By: agolynski
fbshipit-source-id: d6abcd2114163471c045043534a0a3377f2579b4
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).
New submodule commit: 12699ad388
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56916
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: lw
Differential Revision: D27997920
Pulled By: beauby
fbshipit-source-id: 057dff1f28bf3a9d1d05522d3b60ee3530aecf22
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57030
PR #57029 is not perfect; there are still obscure situations in which
we might allocate a shared_ptr to an RpcAgent that doesn't have a
no-GIL constructor, so this PR adds the other half of the equation:
assert that we don't hold the GIL when running a blocking destructor.
This makes it possible to detect potential deadlocks even if the
code doesn't deadlock in practice (because you got lucky and none
of the threads you blocked on tried to also take out the GIL).
I considered whether or not to make this DEBUG_ONLY. For now it's
not, so I can get better CI coverage, and because this test only
happens in destructors of objects that die rarely.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D28030582
Pulled By: ezyang
fbshipit-source-id: a7d7f6545223c4823c7f6036dfe29bd2edaf60a5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57402
This is a cleanup; the value is not used by anything. It was
probably left behind after previous refactors.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D28133622
fbshipit-source-id: 44a3f955d4af8d6dd15b4fb3038188568e4ee549
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57399
There were a couple of functions which took `quants` as arguments
without using them, probably left over from after past refactors.
Cleaning this up to improve code readability.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D28132413
fbshipit-source-id: 636b146c0b5ef0caea9c4b539e245de245d48c49
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57393
Moves the information on whether we should pass the information
whether the output is quantized based on the inputs to live
on the qhandler object. This allows us to remove
FixedQParamsOpQuantizeHandler from quantize.py, further reducing
the coupling between handler objects and the quantization pass.
Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps
```
Imported from OSS
Reviewed By: astaff
Differential Revision: D28132414
fbshipit-source-id: 5c28524b47c00f618d3a38657376abae9e6ffe7c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57388
It's a bit confusing to have this be a decorator. It's simpler to
just expose it as a function on qhandler.
Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D28129411
fbshipit-source-id: f7316f285e8546c67e8d8cf753462b2c2abb2636
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57377
Moves the logic which determines
1. whether a pattern instance's output should be observed
2. whether a pattern instance's output should be marked as observed based on its inputs
3. whether to override the activation specified in the qconfig
from `quantize.py` to `quantization_patterns.py`. This makes
the code easier to read and reduces the coupling between `Quantizer`
and `QuantizeHandler` instances.
Note: there are some further cleanups which would be good after this one
- leaving those for future PRs.
Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D28126896
fbshipit-source-id: 94c80a9c7307452783348d65b402acc84983e3f6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57368
1. renames functions which only sometimes insert observers to start with `maybe_`,
to clarify the difference from functions which always insert observers
2. saves a level of indent in `maybe_insert_observer_for_output_of_the_node`
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D28126897
fbshipit-source-id: 4cbc184dbf5e85954314cfbbcdd1551474175bf0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57367
This code is never hit (see insert_observer_for_output_of_the_node
which gates it out), so changing to an assert in order to
have `insert_observer` actually always insert an observer.
This helps code readability.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D28126898
fbshipit-source-id: 411bc37769a6eacbebc463ed6c84cac85871bd5e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57033
CPU part of gh-38915
BatchNorm1d is implemented by looping over the channels, selecting one channel
at a time and performing cpu_serial_kernel loops per-channel. For (N, C)
contiguous layout this results in a sub-optimal strided memory access pattern;
guarunteeing no elements will ever be in the same cache line.
I fix this by passing the entire input into one `TensorIterator` and letting
it decide which dimensions to iterate over and how to divide work among threads.
For statistic updates and the backward function, I use `at::mean` and `at::sum`
instead of the ad-hoc reductions there. Not only does this allow better memory
access patterns, it also enables vectorization and so performance improves for
BatchNorm2d as well. Unfortunately, `at::var` and `at::var_mean` don't perform
as well so I've left the other reductions as they were.
Overall, on my machine this takes the 1d example from 24 ms down to 4 ms and
the 2d example from 2.5 ms down to 2 ms.
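The access-pattern problem can be seen with a toy index calculation (pure Python, illustrative only): in a row-major (N, C) buffer, a per-channel loop touches elements stride C apart, which is why handing the whole input to one iterator helps.

```python
# For a contiguous (N, C) tensor stored row-major, reducing one channel at a
# time visits indices n*C + c, a stride-C walk through memory; once
# C * itemsize exceeds the cache-line size, no two consecutive reads share
# a line. A single TensorIterator pass over the full input avoids this.
def per_channel_indices(n_rows: int, n_channels: int, c: int):
    return [n * n_channels + c for n in range(n_rows)]
```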
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D28142333
Pulled By: ngimel
fbshipit-source-id: 066fe4f37f29b6458005e513e85faa398eeb9e2d
Summary:
This PR also removes qr and eig tests from test/test_torch.py. They were not skipped if compiled without LAPACK and they are now replaced with OpInfos.
Fixes https://github.com/pytorch/pytorch/issues/55929
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56284
Reviewed By: ejguan
Differential Revision: D27827077
Pulled By: mruberry
fbshipit-source-id: 1dceb955810a9fa34bb6baaccbaf0c8229444d3a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56755
Rehash of https://github.com/pytorch/pytorch/pull/47488
Adds a flag to ddp join() context manager that enables throwing a
StopIteration across all ranks when this flag is specified.
To do this, we implement the design in #47250. When running with this flag, we schedule an additional allreduce in the case that a joined rank needs to throw a StopIteration. In non-joined ranks forward pass, we match this allreduce and if at least one rank tells us to throw, we raise a StopIteration.
Tested by modifying existing tests, as well as adding additional tests validating that this works with SyncBatchNorm models and a model with custom collectives in the forward pass.
Currently running perf benchmarks, will post when those are available, but we expect a small (~2%) perf reduction when enabling this feature due to the blocking allreduce. Hence we will only recommend it for models with collective comm.
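The protocol can be modeled with a simulated allreduce (pure Python sketch, not the DDP implementation): each rank contributes 1 once it has exhausted its data, and every rank raises StopIteration whenever the reduced sum is nonzero, so all ranks stop together.

```python
# Sketch of the uneven-inputs stop protocol: ranks allreduce a "done" flag
# each iteration; a nonzero sum means at least one rank is out of data.
def simulated_allreduce(flags):
    return sum(flags)  # stand-in for dist.all_reduce with SUM

def joint_step(done_flags):
    if simulated_allreduce(done_flags) > 0:
        raise StopIteration("some rank exhausted its data")
```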
ghstack-source-id: 127883115
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D27958369
fbshipit-source-id: c26f7d315d95f17bbdc28b4a0561916fcbafb7ca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56927
Adds the connection of `torch.add` to `toq.add_relu` and of `torch.mul`
to `toq.mul_relu`.
Test Plan:
CI
Imported from OSS
Reviewed By: supriyar
Differential Revision: D28003475
fbshipit-source-id: a12871feacf84c5afb0e1cc47e708e285695ffeb
Summary:
The new function has the following signature: `cholesky_ex(Tensor input, *, bool check_errors=False) -> (Tensor L, Tensor infos)`. When `check_errors=True`, an error is thrown if the decomposition fails; with `check_errors=False`, responsibility for checking the decomposition falls on the user.
When `check_errors=False`, we don't have host-device memory transfers for checking the values of the `info` tensor.
Rewrote the internal code for `torch.linalg.cholesky`. Added `cholesky_stub` dispatch. `linalg_cholesky` is implemented using calls to `linalg_cholesky_ex` now.
Resolves https://github.com/pytorch/pytorch/issues/57032.
Ref. https://github.com/pytorch/pytorch/issues/34272, https://github.com/pytorch/pytorch/issues/47608, https://github.com/pytorch/pytorch/issues/47953
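The error-reporting contract can be illustrated with a toy 1x1 "decomposition" (a sketch of the `_ex` pattern only, not the LAPACK-backed code): failure is reported through an `info` code rather than an unconditional exception, so the check can be deferred or skipped.

```python
import math

# The _ex variant returns an `info` code instead of always raising, so no
# host-device sync is needed unless the caller opts into check_errors=True.
def cholesky_ex_sketch(a: float, *, check_errors: bool = False):
    try:
        return math.sqrt(a), 0           # L = sqrt(a); info = 0: success
    except ValueError:
        if check_errors:
            raise RuntimeError("input is not positive-definite")
        return float("nan"), 1           # info > 0: decomposition failed
```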
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56724
Reviewed By: ngimel
Differential Revision: D27960176
Pulled By: mruberry
fbshipit-source-id: f05f3d5d9b4aa444e41c4eec48ad9a9b6fd5dfa5
Summary:
This test was disabled for ROCM 3.9. With latest updates, the test is passing in ROCM 4.1. Hence enabling this test in test/test_linalg.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57170
Reviewed By: astaff
Differential Revision: D28118217
Pulled By: mruberry
fbshipit-source-id: 1b830eed944a664c3b1b3e936b87096fef0c0ca2
Summary:
After stealing the ownership of the tensor passed via DLPack capsule, PyTorch should immediately mark it as used (by changing its name to `used_dltensor`). This fix is needed because the following line may raise an exception:
```cpp
py::module::import("torch.cuda").attr("init")();
```
When an exception is raised, the Tensor created by `at::fromDLPack` calls the `deleter`. However, as the capsule is not consumed, the producer (the library that created the capsule) also calls the `deleter`, causing a double free.
Reproducer (I'm running this code on an A100 GPU + a PyTorch wheel which does not include `sm_80` support; in this configuration `torch.cuda.init` will raise a warning):
```py
$ python -Werror
>>> import torch.utils.dlpack
>>> import cupy
>>> tensor = torch.utils.dlpack.from_dlpack(cupy.arange(10).toDlpack())
free(): double free detected in tcache 2
zsh: abort (core dumped) python -Werror
```
Once this fix is merged, users will see the exception correctly:
```
A100-PCIE-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the A100-PCIE-40GB GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
```
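The ownership rule can be modeled in pure Python (illustrative only; the real fix is in the C++ `fromDLPack` path): the capsule must be renamed to `used_dltensor` *before* any call that can raise, because the producer frees the storage whenever the capsule still carries its original name.

```python
# Toy model of DLPack capsule ownership: the producer frees the storage only
# if the capsule is still named "dltensor" when the producer tears it down.
class FakeCapsule:
    def __init__(self):
        self.name = "dltensor"
        self.free_count = 0

def producer_destroy(capsule):
    if capsule.name == "dltensor":   # never consumed: producer frees it
        capsule.free_count += 1

def consume(capsule, init, mark_first):
    # mark_first=True models the fixed ordering; False models the old bug.
    if mark_first:
        capsule.name = "used_dltensor"
    try:
        init()                       # e.g. torch.cuda init; may raise
    except RuntimeError:
        capsule.free_count += 1      # the stolen tensor's deleter runs
        raise
    capsule.name = "used_dltensor"
```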
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56789
Reviewed By: astaff
Differential Revision: D28118512
Pulled By: mruberry
fbshipit-source-id: 56992f7a3fc78d94c69513e864a473ae9587a9c8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57354
The ivalue::Future class used to have some hooks, defined as separate protected virtual methods, so that they could be overridden by the CUDAFuture subclass. Now that CUDAFuture has been merged into ivalue::Future those hooks can be "inlined" to where they're used, hopefully making the code more readable as it puts related things closer together.
ghstack-source-id: 127920096
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28117199
fbshipit-source-id: f749cd842c3bdc44a08f0a33bef972dfbf08afdd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57353
Even though we merged CUDAFuture into ivalue::Future, the resulting methods still had basically two distinct codepaths (i.e., an "early exit" if `impl_ == nullptr` for CPU, and then some code for CUDA). This works but it risks creating divergence and inconsistencies when the same class is used in those two modes. Ideally we should have the same codepath, and have the stream operations be no-ops for CPU. Luckily, this is exactly what happens when using a CPU DeviceGuardImplInterface!
Hence here I do that, and for convenience I also use c10::Devices instead of c10::DeviceIndexes (like we did in https://github.com/pytorch/pytorch/pull/57294 for RPC).
ghstack-source-id: 127920097
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28100525
fbshipit-source-id: cfac73894220ef5fa8a0389b5533c5d69ba1cf04
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57294
With the advent of CPUs in the device maps, and to be more generic (e.g., to support AMD GPUs), and to avoid conversions when passing to Future and RRef and such, it's easier to use Devices instead of DeviceIndices. This started by just migrating the TensorPipe agent but the RPC layer is quite intertwined so I had to migrate a lot of stuff.
ghstack-source-id: 127916562
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28092733
fbshipit-source-id: 024dcb3648c5898ab13e770413c43958f04f1a8a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57345
Already back in https://github.com/pytorch/pytorch/pull/57046 we realized that calling this method `getStreamFromPool` could cause issues because that name gets HIPified and thus in some callsites we'd end up calling a method that doesn't exist. In the end we got away with it because the places where we were calling that method weren't HIPified. However in the next PR we'll use this method inside RPC, and that will start causing problems, hence here I rename it to something that should not cause conflicts. This is a private API (since it's inside `impl`) thus there's no backwards compatibility concerns.
ghstack-source-id: 127916484
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28114923
fbshipit-source-id: e027ad08a8e02090c08c6407c2db5a7fde104812
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57292
In Future (and soon in other places too) we need to receive a list of devices from Python-land. We don't want to just take their indices because we need full devices in order to infer the type from them. torch.device is not defined through pybind; it's defined through a plain `PyModule_AddObject` call with CPython, so pybind isn't naturally able to understand and convert it. However, we can provide a custom type caster which fixes that. We already have this for at::Tensor, at::Generator, ...
ghstack-source-id: 127916268
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28092732
fbshipit-source-id: 1c31d0b85a4d5c9e7bde8161efbb7574d505157c
Summary:
Apparently normal reST doctests aren't run in CI, because of this line in the `conf.py`:
ac86e0a0e5/docs/source/conf.py (L366)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57290
Reviewed By: astaff
Differential Revision: D28118198
Pulled By: mruberry
fbshipit-source-id: 7af621c4fef4e5d37e0fc62b9fd4382cc1698d89
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56871
foreach kernels fall back to the slow path when tensors are on different devices
Generated by codemod:
```
fastmod '(- func: _foreach.*)' '${1}
device_check: NoCheck # foreach kernels fall back to slow path when tensor are on different devices' aten/src/ATen/native/native_functions.yaml
```
ghstack-source-id: 127914017
Test Plan: autotest
Reviewed By: ezyang
Differential Revision: D27986560
fbshipit-source-id: b0cd963cdba04b4e1589bbf369eb26b48d523968
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56830
Opt into formatting on GitHub and format everything. This is a trial run before turning on formatting for more and eventually all of the codebase.
Test Plan: CI
Reviewed By: zertosh
Differential Revision: D27979080
fbshipit-source-id: a80f0c48691c08ae8ca0af06377b87e6a2351151
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56850
This is part of the changes to enable NNC AOT compilation for mobile.
The generated kernels need to call these external functions, so the declarations are changed to use C linkage when building the mobile runtime.
Added nnc_aten_addmm external function.
ghstack-source-id: 127877411
Test Plan:
- build & CI;
- tested mobile build with stacked PRs;
Reviewed By: ZolotukhinM
Differential Revision: D27897154
fbshipit-source-id: 61d5499d7781a83bd2657859659fd1b5043d6b04
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54032
Add a `const char*` override to c10::Warning::warn so that we can avoid wrapping plain C string literals in std::string.
ghstack-source-id: 125544720
Test Plan: Buildsizebot some iOS apps?
Reviewed By: ezyang
Differential Revision: D27061983
fbshipit-source-id: dc11150c911a4317a8edac75e50c5ba43511ff24
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57288
If the device map provided by RemoteModue is not empty, then TensorPipe RPC backend can support directly sending GPU tensors over the wire.
Also add pybind of `_get_device_map`.
The changes in unit test setup are separated out as a follow-up PR, as currently they break some tests in `distributed/rpc/test_faulty_agent.py`.
Still need to fix test_load_di_parts in `torch/fb/training_toolkit/applications/sparse_nn/batch_distributed_inference/tests:batch_distributed_inference_test`. Currently an early return is used to bypass this test failure.
#Original PR issue: https://github.com/pytorch/pytorch/issues/51670
Test Plan:
buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- test_input_moved_to_cuda_device
buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- test_input_moved_to_cuda_device_script
buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- RemoteModule -j 1
CAUTION: This one actually fails and now it is bypassed. See FIXME in `_remote_forward`.
buck test mode/dev-nosan caffe2/torch/fb/training_toolkit/applications/sparse_nn/batch_distributed_inference/tests:batch_distributed_inference_test -- test_load_di_parts
Reviewed By: wanchaol
Differential Revision: D28021672
fbshipit-source-id: a89245dc35e1d9479811ec6f98d9f34116837d79
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56923
Next Steps in order:
- Add backward support for CUDA
- Add support for more aggregation types
- Benchmarking (for cuda mainly)/more testing/documentation
- Support for multi dimension
Test Plan: Updated unit test to include 0 length segment as well.
Reviewed By: ngimel
Differential Revision: D27992228
fbshipit-source-id: 28851811f8a784a63162721c511d69e617a93727
Summary:
This PR adds `sm_75` CUDA architecture support for the PR CI build Xenial CUDA 11.1 cuDNN 8, with build name:`pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_build`, so that generated artifacts from these builds can be installed and run on machines with CUDA capability sm_75.
In PR https://github.com/pytorch/pytorch/issues/57207, the Xenial CUDA 10.2 cuDNN 7 build `pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_build` was taken off the list of builds done for PRs to `master`. PR https://github.com/pytorch/pytorch/issues/56619 has added `sm_75` support for this build. This PR removes this support for the Xenial CUDA 10.2 cuDNN7 builds, and adds it for the current PR CI build Xenial CUDA 11.1 cuDNN 8 `pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_build`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57320
Reviewed By: astaff
Differential Revision: D28125542
Pulled By: malfet
fbshipit-source-id: f220b8f3279054c98cab9eef1e0d7e37161a946f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56964
This PR does many things but does not update any logic:
- Prefixes all function names that are not `gradcheck`, `gradgradcheck`, `get_numerical_jacobian`, and `get_analytical_jacobian` with underscore to indicate that they aren't part of the public API (https://github.com/pytorch/pytorch/issues/55714).
- Improves naming to avoid referencing Jacobian rows or Jacobian cols when we really mean vjp and jvp, as suggested by zou3519
- Tries to reduce comment line length so comments are more consistent and easier to read
- Other misc improvements to documentation
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D28096571
Pulled By: soulitzer
fbshipit-source-id: d372b5f8ee080669e525a987402ded72810baa0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55113
The new method allows passing input and output arguments by `void*`
pointers instead of CallArgs, which helps reduce the invocation
overhead. Currently this is only supported in the LLVM codegen.
Differential Revision: D27487549
Test Plan: Imported from OSS
Reviewed By: bertmaher
Pulled By: ZolotukhinM
fbshipit-source-id: d8f3d92262cde1c155beefb629454370d9af2f89
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56894
Used the dispatch type macro to add support for fp16 and fp64 tensors. Haven't tested on GPU yet; will do so once I can rebuild PyTorch with CUDA.
Test Plan:
python test/test_quantization.py TestFakeQuantize.test_forward_per_channel_half_precision_numerics
python test/test_quantization.py TestFakeQuantize
python test/test_quantization.py TestFakeQuantize.test_backward_per_channel_cachemask_cpu
python test/test_quantization.py TestFakeQuantize.test_forward_per_channel_cachemask_cpu
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28002955
fbshipit-source-id: c9cf17aa0f15f163bfcc8e5ef7b329ca754924fd
Summary:
NVRTC versioning has changed starting with 11.3, and will change again for CUDA 12.X. See the comment in the code for details. As a result, jit on CUDA 11.3 is broken.
Also, the error message is misleading: When both `libname` and `alt_libname` are non-empty, the error message is only reporting `alt_libname`, it should report both.
To reproduce the error, you can use:
```python
import torch
torch._C._jit_set_profiling_mode(False)
torch._C._jit_set_profiling_executor(False)
torch._C._jit_override_can_fuse_on_cpu(True)
torch._C._jit_override_can_fuse_on_gpu(True)
@torch.jit.script
def jit_relu_dropout(x, prob):
    # type: (Tensor, float) -> Tensor
    x = torch.nn.functional.relu(x)
    x = torch.nn.functional.dropout(x, p=prob, training=True)
    return x
x = torch.randn((64, 40, 12, 1024), device="cuda:0", dtype=torch.float16, requires_grad=True)
y = jit_relu_dropout(x, 0.5)
```
with CUDA 11.3, and you will see
```
Traceback (most recent call last):
File "/home/gaoxiang/misc/nvrtc-failure.py", line 16, in <module>
y = jit_relu_dropout(x, 0.5)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: Error in dlopen or dlsym: libnvrtc-8aa72235.so.11.3: cannot open shared object file: No such file or directory
```
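A hedged illustration of the reporting fix described above, in plain Python with ctypes rather than the actual at::DynamicLibrary code (the function name and structure are made up for this sketch): a loader that tries both names should mention both in its failure message.

```python
import ctypes

def load_shared_lib(libname, alt_libname=""):
    """Try the alternate name first, then the primary one; on total
    failure, report every name that was attempted, not just one."""
    for name in (alt_libname, libname):
        if not name:
            continue
        try:
            return ctypes.CDLL(name)
        except OSError:
            pass  # try the next candidate name
    tried = " or ".join(n for n in (libname, alt_libname) if n)
    raise RuntimeError(f"Error in dlopen: could not load {tried}")
```

With both names in the message, a user seeing the failure immediately knows every library file that was searched for.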
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57204
Reviewed By: ngimel
Differential Revision: D28122083
Pulled By: malfet
fbshipit-source-id: fd387cf79f33a6d5a5b93d54c9f21e9c23731045
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56536
This PR adds unit tests to ensure that the encoded byte length of `_RendezvousState` stays under a certain limit.
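The kind of check this adds can be sketched as follows. This is an illustrative stand-in, not the actual test code: the class, its fields, and the 64 KiB limit are all assumptions.

```python
import pickle

MAX_ENCODED_BYTES = 64 * 1024  # illustrative limit, not the real one

class RendezvousState:
    """Stand-in for _RendezvousState; the fields here are illustrative."""

    def __init__(self, participants):
        self.round = 0
        self.participants = participants

def encoded_size(state):
    # The serialized form is what travels through the store, so it is
    # the serialized size that the test bounds.
    return len(pickle.dumps(state))
```

A test would then build a state with a realistic number of participants and assert `encoded_size(state)` stays under the limit.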
ghstack-source-id: 127626622
Test Plan: Run the newly-introduced unit tests.
Reviewed By: tierex
Differential Revision: D27890704
fbshipit-source-id: 24905c8bc9d985d5ee90d370f28739eb137ce0f0
Summary:
I'd like the following pattern (a natural composition of Amp with full fwd+bwd capture) to work:
```python
# Create "static_input" with dummy data, run warmup iterations,
# call optimizer.zero_grad(set_to_none=True), then
g = torch.cuda._Graph()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    optimizer.zero_grad(set_to_none=True)
    g.capture_begin()
    with autocast():
        out = model(static_input)
        loss = loss_fn(out)
    scaler.scale(loss).backward()
    g.capture_end()
torch.cuda.current_stream().wait_stream(s)

# Training loop:
for b in data:
    # optimizer.zero_grad() deliberately omitted, replay()'s baked-in backward will refill statically held .grads
    static_input.copy_(b)
    g.replay()
    scaler.step(optimizer)
    scaler.update()
```
Right now `GradScaler` can't work with this pattern because `update()` creates the scale tensor for the next iteration out of place. This PR changes `update()` to act in place on a long-lived scale tensor that stays static across iterations.
I'm not sure how this change affects XLA (see https://github.com/pytorch/pytorch/pull/48570), so we shouldn't merge without approval from ailzhang yaochengji.
Tagged bc-breaking because it's a change to the amp update utility function in native_functions.yaml. The function was never meant to be user-facing though.
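A minimal sketch of the in-place idea, using a plain Python cell to stand in for the long-lived scale tensor (the class, names, and numbers are illustrative, not GradScaler internals):

```python
class ScaleCell:
    """Toy model of the change: the scale lives in one long-lived cell
    (standing in for a static CUDA tensor) that update() mutates in place
    instead of replacing, so a captured graph that reads that storage
    keeps seeing the current value across iterations."""

    def __init__(self, init_scale=2.0 ** 16, growth=2.0, backoff=0.5):
        self.growth, self.backoff = growth, backoff
        self._scale = [init_scale]  # never reassigned, only mutated

    def update(self, found_inf):
        # In place: multiply the existing cell rather than building a new one.
        self._scale[0] *= self.backoff if found_inf else self.growth
```

The key property is that the identity of `_scale` never changes, which is exactly what an out-of-place `update()` would break for a captured graph.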
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55562
Reviewed By: zou3519
Differential Revision: D28046159
Pulled By: ngimel
fbshipit-source-id: 02018c221609974546c562f691e20ab6ac611910
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57053
This ctor is intended for pybind use. It increments weakcount when creating a strong reference, which is only correct if you know that the value was previously zero. So, I consolidated make() with this ctor.
ghstack-source-id: 127537070
Test Plan: existing CI
Reviewed By: ezyang
Differential Revision: D28037206
fbshipit-source-id: eec57a99e3e032830f156c1e6258760f6465137b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56257
The CPU and cuSOLVER paths were fixed with the refactoring of `_linalg_qr_helper_default`.
Resolves https://github.com/pytorch/pytorch/issues/50576
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D27960157
Pulled By: mruberry
fbshipit-source-id: f923f3067a35e65218889e64c6a886364c3d1759
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56255
With refactored non-allocating `linalg_qr_out_helper` from the previous
commit we don't need to specify the size arguments because the inputs to
orgqr and geqrf are always of correct size.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D27960153
Pulled By: mruberry
fbshipit-source-id: 0f9be25781371633378752b587da62b828816646
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57310
If we fail to exec `torch_shm_manager`, write an appropriate error message to stdout so that the parent process can have some context on the failure.
Reviewed By: ejguan
Differential Revision: D28047917
fbshipit-source-id: 68bf357df7a6b318c036f4f62cbb428a62cb139e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57309
Addressing a race condition that can occur in `torch_shm_manager` between the time its temporary file is unlinked and when it `bind()`s the manager server socket to that same name. In that time window, other threads/processes can re-create another temporary file with the same name, causing `bind()` to fail with `EADDRINUSE`.
This diff introduces `c10::TempDir` and associated helper functions that mirror those of `c10::TempFile` and generates the manager socket name using a combination of a temporary directory, which will be valid for the lifetime of `torch_shm_manager`, and a well-known file name within that directory that will never be used outside of `bind()`.
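The scheme can be sketched in Python stdlib terms (an analogy for illustration, not the c10 implementation):

```python
import os
import socket
import tempfile

# The private directory outlives the manager, so no other process can
# claim a path inside it between creation and bind().
tmpdir = tempfile.mkdtemp(prefix="torch-shm-dir-")
sock_path = os.path.join(tmpdir, "manager.sock")  # well-known name inside

server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(sock_path)  # no unlink-then-bind window: the name was never linked
server.listen(1)
```

Because the socket name is only ever used by `bind()`, there is no window in which another process can recreate a file with the same name.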
Reviewed By: ejguan
Differential Revision: D28047914
fbshipit-source-id: 148d54818add44159881d3afc2ffb31bd73bcabf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57308
This diff makes `c10::TempFile` non-copyable but movable. `torch_shm_manager` was previously dependent upon some hidden behavior that was a result of copying `TempFile`s, which is also being made more explicit now that they can be moved but not copied.
Context:
`c10::TempFile` is currently copyable, which leads to surprising behavior. A seemingly valid `TempFile` may in fact be invalid if the original it was copied from has already been destroyed, resulting in the file descriptor to be closed and the filename being unlinked without the user knowing about it.
**In fact, both `c10::try_make_tempfile` and `c10::make_tempfile` cause copies of `TempFile` to be made**, which can easily be verified by explicitly deleting the copy constructor of `TempFile` and attempting to compile. This means that in practice, users of these functions are getting temporary files that have already been closed and unlinked.
This copying of `TempFile` is particularly interesting in the case of `torch_shm_manager`, which uses `try_make_tempfile` to generate the name of a Unix domain socket to communicate with clients. In order for `bind()` on the socket name to be successful, a file with that same name must not be linked in the filesystem, or `EADDRINUSE` will result. Happily, because `try_make_tempfile` previously created a copy of the `TempFile` while destroying the original, `torch_shm_manager` did not encounter this. With this change, however, `torch_shm_manager` must now explicitly destroy the `TempFile` before attempting to `bind()`. Unfortunately, this exposes a race condition--**other code can re-generate the same-named temporary file after the one created by `torch_shm_manager` is explicitly unlinked but before `torch_shm_manager` binds it to the server socket.** To be clear: this race condition already existed before this diff, but this makes things more explicit. The real fix will be in a follow-up change.
Reviewed By: ejguan
Differential Revision: D28047915
fbshipit-source-id: e8a1b6bb50419fe65620cfecdb67c566a4cf9056
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57307
Extend the `"ERROR"` message that `torch_shm_manager` writes to the pipe when it encounters a fatal error with some extra context (specifically, the `what()` on a caught `std::exception`), allowing the parent process to gain some insight into the cause of the failure.
Also, simply return from `main()` with an error exit code when a fatal exception is caught rather than re-throwing, because re-throwing leads to premature process termination that may prevent standard output from being flushed (and therefore the parent process from being able to read the error context from the pipe).
Reviewed By: ejguan
Differential Revision: D28047916
fbshipit-source-id: d423ee8ed1b2bf7831db877e8f8515ec6d6aa169
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54995
Provide a DDP private API to explicitly set that the training is static; also set this flag in the logger.
ghstack-source-id: 127755713
Test Plan: unit tests
Reviewed By: rohan-varma
Differential Revision: D27444965
fbshipit-source-id: 06ef1c372296815944b2adb33fbdf4e1217c1359
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54981
Put part of the code in autograd_hook into functions, so that they can be reused in static graph training later on.
ghstack-source-id: 127755405
Test Plan: unit tests
Reviewed By: SciPioneer
Differential Revision: D27439508
fbshipit-source-id: a02a4b029841f5e7f11cfc5496bb7972ef53d878
Summary:
Fixes https://github.com/pytorch/pytorch/issues/55090
I included the header directly, but I am not sure if we should add this as a git submodule, what do you guys think?
Also regarding the implementation, in ATen lanes seem not to be supported, but CuPy exports complex types with 2 lanes; I am not sure whether this is correct or not. However, in PyTorch this seems to be working properly, so I allow 2 lanes for complex datatypes.
TODO: add tests for complex and bfloat
Easy test script against cupy
```python
import cupy
import torch
from torch.utils.dlpack import to_dlpack
from torch.utils.dlpack import from_dlpack
# Create a PyTorch tensor.
tx1 = torch.tensor(
    [2 + 1j, 3 + 2j, 4 + 3j, 5 + 4j], dtype=torch.complex128
).cuda()
# Convert it into a DLPack tensor.
dx = to_dlpack(tx1)
# Convert it into a CuPy array.
cx = cupy.fromDlpack(dx)
# Convert it back to a PyTorch tensor.
tx2 = from_dlpack(cx.toDlpack())
torch.testing.assert_allclose(tx1, tx2)
```
Thanks to leofang who updated CuPy's dlpack version and his PR served me as the guide for this one.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55365
Reviewed By: ngimel
Differential Revision: D27724923
Pulled By: mruberry
fbshipit-source-id: 481eadb882ff3dd31e7664e08e8908c60a960f66
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56807
If I understand correctly, there's no reason to create your own instance of these global singleton types.
ghstack-source-id: 127312270
Test Plan: CI
Reviewed By: SplitInfinity
Differential Revision: D27973447
fbshipit-source-id: f12df69d185f1baaa45f2ac6eac70570a7a65912
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57280
We've found an issue where a fusion group can result in a circular dependency. For example:
```
a -> b -> c -> d
|              ^
+--------------+
```
Only `a` has a non-tensor output, and currently we would create a fusion group (a, b, d). This results in a circular dependency because the fusion group depends on c while c depends on the fusion group as well.
This diff implements the solution discussed before: when we add a node to a fusion group, we also add all the nodes that lie between the fusion group and the newly added node.
Use the same logic in minimizer to build fusion group.
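A toy sketch of this absorb rule on an adjacency-list DAG (plain Python, not the actual pass; the helper names are made up): the nodes "in the middle" are exactly the descendants of the group that are also ancestors of the new node.

```python
def descendants(graph, roots):
    """All nodes reachable from `roots` via directed edges."""
    seen, stack = set(), list(roots)
    while stack:
        node = stack.pop()
        for m in graph.get(node, ()):
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return seen

def ancestors(graph, node):
    """All nodes that can reach `node`, found on the reversed graph."""
    rev = {}
    for u, vs in graph.items():
        for v in vs:
            rev.setdefault(v, []).append(u)
    return descendants(rev, [node])

def absorb(graph, group, new_node):
    """Add `new_node` plus every node between the group and it, so the
    grouped graph stays acyclic."""
    between = descendants(graph, group) & ancestors(graph, new_node)
    return set(group) | between | {new_node}

# a -> b -> c -> d with a shortcut edge a -> d (the example above)
graph = {"a": ["b", "d"], "b": ["c"], "c": ["d"]}
```

On this example, absorbing `d` into the group `{a, b}` pulls in `c` as well, which removes the circular dependency.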
Test Plan: split_tests and net_min_tests
Reviewed By: khabinov
Differential Revision: D27917432
fbshipit-source-id: a3d99fe5929dbc9f8eb0f45bccd83fd7b173795a
Summary:
Partial fix for https://github.com/pytorch/pytorch/issues/56157
This PR updates the `flatten` API in `LoopNest` to perform the flattening transformation in-place. After this transformation, the first loop in the input becomes the flattened loop.
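The index arithmetic behind flattening can be illustrated independently of the NNC API (a sketch with made-up helper names): a single loop over `M * N` visits the same index pairs as the original two-level nest, recovering the original indices by division and modulo.

```python
M, N = 3, 5

def nested_order(m, n):
    # The original two-level loop nest.
    return [(i, j) for i in range(m) for j in range(n)]

def flattened_order(m, n):
    # The flattened single loop: i = k // n, j = k % n.
    return [(k // n, k % n) for k in range(m * n)]
```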
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56629
Reviewed By: H-Huang
Differential Revision: D28004787
Pulled By: navahgar
fbshipit-source-id: 7474ae237fae3fff0cd1c64a276a8831dc5b7db0
Summary:
`__builtin_memcmp` is not constexpr for character arrays with the NVCC 11.3 compiler.
Attempting to compile this code results in the following error:
```
/opt/conda/lib/python3.6/site-packages/torch/include/c10/util/string_view.h(585): note: constexpr memory comparison is only supported for top-level integer or array-of-integer objects
/opt/conda/lib/python3.6/site-packages/torch/include/c10/util/string_view.h(340): note: called from:
/opt/conda/lib/python3.6/site-packages/torch/include/c10/util/string_view.h(369): note: called from:
```
Fixes #{issue number}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57322
Reviewed By: janeyx99
Differential Revision: D28119125
Pulled By: malfet
fbshipit-source-id: e5ff6ac7bb42022e86c9974919e055cf82c2ea83
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56579
On earlier cuDNN versions, when a model uses fp16, the
performance after conv-add-relu fusion regresses. Let's just
disable the fusion for fp16 if cuDNN version is older than v8.
Test Plan: Tested for fp16 models on Nvidia Tesla T4
Reviewed By: ZolotukhinM
Differential Revision: D27915514
Pulled By: desertfire
fbshipit-source-id: 1c0081a80540c507e608216c90bc74c486c7008d
Summary:
Related to https://github.com/pytorch/pytorch/issues/55601.
- [x] removed complex autograd checker in `test_supported_backward`
- [x] created `backward_dtype[If<Device>]` that inherits from normal `dtype[If<Device>]` by default
- [x] removed all skip for backward test, instead added backward dtype
- [x] change complex autograd to a function call: `support_complex_autograd(device_type)` that depends on `backward_dtype*` since they essentially mean the same thing for complex types
TODO for next PR
- add `test_unsupported_backward` to verify they are actually unsupported.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56156
Reviewed By: mruberry
Differential Revision: D27926717
Pulled By: walterddr
fbshipit-source-id: 9a4af8612278ca44a97b6f1510b6b175852c893b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57057
This PR performs optimization on the ViewInfo handling to remove the need for the "no forward AD mode".
- When the forward and backward ViewInfo are the same, create and store only one of them
Code for timing:
```python
timer = Timer(
stmt='a.view(-1)',
setup='''\
import torch
a = torch.rand(4)''')
res = timer.collect_callgrind(repeats=2, number=10)[1]
```
Difference between master and this PR:
```
# Benchmark at master
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7fe33be83690>
a.view(-1)
setup:
import torch
a = torch.rand(4)
All Noisy symbols removed
Instructions: 69286 68442
Baseline: 1332 1188
10 runs per measurement, 1 thread
# Benchmark at this branch
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7fe33bd7ec30>
a.view(-1)
setup:
import torch
a = torch.rand(4)
All Noisy symbols removed
Instructions: 69437 68562
Baseline: 1363 1188
10 runs per measurement, 1 thread
# Difference between the two
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7fe1216e9a00>
160 ???:0x000000000a11c8d0
60 torch::autograd::DifferentiableViewMeta::DifferentiableViewMeta
60 ???:torch::autograd::as_view(at::Tensor const&, at::Tensor const&, bool, bool, std::function<at::Tensor (at::Tensor const&)>, torch::autograd::CreationMeta, bool)
40 ???:0x0000000008e14f50
40 ???:0x0000000008e05bd0
40 ???:0x0000000008e05480
40 ???:0x0000000008e036d0
40 ???:0x0000000008e02720
30 make_variable_differentiable_view
...
-20 ???:0x0000000008e02060
-20 ???:0x0000000008e01fd0
-30 ???:torch::autograd::isForwardADEnabled()
-40 ???:0x0000000008e14f90
-40 ???:0x0000000008e05c00
-40 ???:0x0000000008e054a0
-40 ???:0x0000000008e036f0
-40 ???:0x0000000008e02740
-160 ???:0x000000000a11d8d0
Total: 120
```
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D28071505
Pulled By: albanD
fbshipit-source-id: 672b1bdf87d516b6de4f2e36656819cfd6f4c9b9
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).
New submodule commit: 5ce0eed074
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57342
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: jspark1105
Differential Revision: D28114231
fbshipit-source-id: 0a5883ebb2fcd45ff547d594928372a9a9c9b76c
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).
New submodule commit: c565348fdc
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55347
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: jspark1105
Differential Revision: D27582224
fbshipit-source-id: 6670e96b21d84dc6464559bf179f74751927fdd4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57282
Added support for fb::expand_dims for SR.
Test Plan:
buck test caffe2/torch/fb/sparsenn:gpu_test -- test_expand_dims
buck test caffe2/benchmarks/static_runtime/fb:test_fb_operators
Reviewed By: hlu1
Differential Revision: D28043049
fbshipit-source-id: 01f59db7b507f027b220f044d6ff23602adbdb06
Summary:
Fix a numerical issue of CUDA channels-last SyncBatchNorm
The added test is a repro for the numerical issue. Thanks for the help from jjsjann123 who identified the root cause. Since pytorch SBN channels-last code was migrated from [nvidia/apex](https://github.com/nvidia/apex), apex SBN channels-last also has this issue. We will submit a fix there soon.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57077
Reviewed By: mruberry
Differential Revision: D28107672
Pulled By: ngimel
fbshipit-source-id: 0c80e79ddb48891058414ad8a9bedd80f0f7f8df
Summary:
This adds some more compiler warnings ignores for everything that happens on a standard CPU build (CUDA builds still have a bunch of warnings so we can't turn on `-Werror` everywhere yet).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56630
Pulled By: driazati
Reviewed By: malfet
Differential Revision: D28005063
fbshipit-source-id: 541ed415eb0470ddf7e08c22c5eb6da9db26e9a0
Summary:
This is to set up boilerplate code for the backward pass and the CPU implementation.
Next Steps in order:
- Add backward support for CUDA
- Add support for more aggregation types
- Benchmarking (for cuda mainly)/more testing/documentation
- Support for multi dimension
Test Plan:
Updated unit test to also check correctness of backward.
Wait for CI signal
Reviewed By: ngimel
Differential Revision: D27970340
fbshipit-source-id: 3e608c7fe3628b0a761dd8affc6aad8f65a6ef7f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57296
It seems many trainers disable print(), so we cannot see the thread dumps from CompleteInTimeOrDie(). Emit them via log.info() as well.
Test Plan: sandcastle
Reviewed By: aalmah
Differential Revision: D28098738
fbshipit-source-id: dfdca8801bacf5c7bccecc2387cb7ef41dadfa46
Summary:
`OpInfo`s for `sub` & `mul` operators. Both of them will reuse the sample inputs function added for `add` via another PR.
A https://github.com/pytorch/pytorch/issues/54261 task.
cc mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56227
Reviewed By: H-Huang
Differential Revision: D27993889
Pulled By: mruberry
fbshipit-source-id: 7b2da02b0edba3cc37b5b1b88ca32f7dd369ca60
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57279
Added an option "return_intermediate". If true, when building the submodule we want to run, we replace the output with all the nodes, so that intermediate results of all the nodes are returned as output.
This is recommended for use with the `run_node()` function.
Test Plan: `buck test glow/fb/nnpi/lowering:net_min_tests`
Reviewed By: khabinov
Differential Revision: D27913887
fbshipit-source-id: 5a3eab02da05214fb9adeb25656c267b58075b1d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57191
Changed Store::compareSet() to a pure virtual function and added compareSet definition to PythonStore. Rest of changes are from clang-format.
Test Plan: Imported from OSS
Reviewed By: cbalioglu
Differential Revision: D28076557
Pulled By: H-Huang
fbshipit-source-id: 379636cf8b031088341a032250ba410d84ccf692
Summary:
1. Delete dead code relating to maskrcnn_benchmark extension module
2. Add some more commentary on why we define a meta path finder
Test Plan: sandcastle
Reviewed By: wconstab
Differential Revision: D28078211
fbshipit-source-id: cfc6f47861c14ec7482b55ee585504271ae0f365
Summary:
[distutils](https://docs.python.org/3/library/distutils.html) is on its way out and will be deprecated-on-import for Python 3.10+ and removed in Python 3.12 (see [PEP 632](https://www.python.org/dev/peps/pep-0632/)). There's no reason for us to keep it around since all the functionality we want from it can be found in `setuptools` / `sysconfig`. `setuptools` includes a copy of most of `distutils` (which is fine to use according to the PEP), that it uses under the hood, so this PR also uses that in some places.
Fixes #56527
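Typical one-for-one replacements look like this (illustrative; the exact call sites the PR touches may differ):

```python
import sysconfig

# sysconfig provides the same paths and config values that
# distutils.sysconfig used to expose.
include_dir = sysconfig.get_paths()["include"]       # was distutils.sysconfig.get_python_inc()
ext_suffix = sysconfig.get_config_var("EXT_SUFFIX")  # was distutils.sysconfig.get_config_var("EXT_SUFFIX")
```

Since `sysconfig` is part of the standard library and not slated for removal, these swaps keep the build code working on Python 3.12+.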
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57040
Pulled By: driazati
Reviewed By: nikithamalgifb
Differential Revision: D28051356
fbshipit-source-id: 1ca312219032540e755593e50da0c9e23c62d720
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56033
There doesn't seem to be any reason not to size the output
correctly, and it avoids a round of dispatch for resize.
ghstack-source-id: 127409715
Test Plan:
Inspected GPU trace for simple nn.Linear in a loop. No more
resize operator invocation.
Existing CI should let us know if this is incorrect
Reviewed By: ngimel
Differential Revision: D27768311
fbshipit-source-id: fb48ec50f3cffc1015ef03d528e9007274b4dd3a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56032
Profiling & assembly inspection showed that we weren't
getting NRVO with `inferExpandGeometry_dimvector` returning
`std::tuple`. I added a custom type with constructors so that, as the
comment says, we could be sure to get NRVO.
ghstack-source-id: 127409717
Test Plan:
Inspected new assembly, no more move construction (which is
a copy for on-stack DimVectors!) upon returning
Reviewed By: ezyang
Differential Revision: D27768312
fbshipit-source-id: d1d53a36508be92585802e1467d8a42d1ae05d80
Summary:
The problem arises for sinc'(x) where x != 0 but x ** 2 == 0, which happens for some very small floats.
I realized that my solution from https://github.com/pytorch/pytorch/issues/56763 was incomplete when I did a quick implementation using `torch.autograd.Function` and still got a `NaN` from my derivative.
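The situation is easy to reproduce with plain floats (a sketch of the failure mode only, not the autograd formula itself):

```python
# x is a valid nonzero double, but x * x underflows past the smallest
# subnormal (about 4.9e-324) to exactly 0.0 -- the case the fix must handle.
x = 1e-170
x_squared = x * x
# Any derivative formula with x**2 in a denominator will produce inf/NaN
# here unless the x**2 == 0 case falls back to the x == 0 branch.
```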
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56986
Reviewed By: gchanan
Differential Revision: D28093507
Pulled By: albanD
fbshipit-source-id: 2a30e1065b08c5c60de843a0778dedeb0fb295f4
Summary:
Adds support for type inference of nn.Module methods using monkeytype in JIT
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57165
Reviewed By: gmagogsfm
Differential Revision: D28064983
Pulled By: nikithamalgifb
fbshipit-source-id: 303eaf8d7a27e74be09874f70f519b4c1081645b
Summary:
This PR adds a TorchBench (pytorch/benchmark) CI workflow to pytorch. It tests PRs whose body contains a line starting with "RUN_TORCHBENCH: " followed by a list of torchbench model names. For example, this PR will create a TorchBench job running the pytorch_mobilenet_v3 and yolov3 models.
For security reasons, only the branch on pytorch/pytorch will run. It will not work on forked repositories.
The model names have to match the exact names in pytorch/benchmark/torchbenchmark/models, separated by commas. Only the first line starting with "RUN_TORCHBENCH: " is respected. If nothing is specified after the magic word, no test will run.
Known issues:
1. Build PyTorch from scratch and do not reuse build artifacts from other workflows. This is because GHA migration is still in progress.
2. Currently there is only one worker, so jobs are serialized. We will review the capacity issue after this is deployed.
3. If the user would like to rerun the test, she has to push to the PR. Simply updating the PR body won't work.
4. Only supports environment CUDA 10.2 + python 3.7
RUN_TORCHBENCH: yolov3, pytorch_mobilenet_v3
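The magic-word handling described above could be parsed roughly like this (an illustrative sketch with assumed semantics, not the actual workflow script):

```python
def parse_torchbench_models(pr_body):
    """Return the model names from the first 'RUN_TORCHBENCH:' line of a
    PR body; an empty list means no benchmark run is requested."""
    prefix = "RUN_TORCHBENCH:"
    for line in pr_body.splitlines():
        if line.startswith(prefix):
            # Only the first matching line counts; names are comma-separated.
            return [m.strip() for m in line[len(prefix):].split(",") if m.strip()]
    return []
```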
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56957
Reviewed By: janeyx99
Differential Revision: D28079077
Pulled By: xuzhao9
fbshipit-source-id: e9ea73bdd9f35e650b653009060d477b22174bba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57175
Update other Store implementations to add the value when the current value is empty, matching the amendment made to TCPStore (#55636). Added a test to cover this case.
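The intended compare-set semantics can be sketched as follows (a toy stand-in, not the c10d Store API):

```python
class ToyStore:
    """Sketch of compare_set: if the stored value equals the expected value
    (with a missing key treated as the empty string), write the desired
    value; either way, return the value now held for the key."""

    def __init__(self):
        self._data = {}

    def compare_set(self, key, expected, desired):
        current = self._data.get(key, "")
        if current == expected:
            self._data[key] = desired
        return self._data.get(key, "")
```

The case this PR aligns across implementations is the first one: an empty expected value against a missing key succeeds and sets the new value.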
Test:
`pytest -vs test/distributed/test_c10d_common.py -k compare_set`
Test Plan: Imported from OSS
Reviewed By: cbalioglu
Differential Revision: D28069380
Pulled By: H-Huang
fbshipit-source-id: eac703edb41faee32a4e7cda61107e2a0e726326
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57173
If getitem is followed by an unmatched node, we'll remove the observer after it.
Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_getitem
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D28068805
fbshipit-source-id: e79f8ec3e8fd61d348b8a7069ab0bb434d737c30
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57098
1. Separate `readArchiveAndTensors()` from `jit/import.cpp` into a new file `jit/import_read.cpp`.
2. Use `readArchiveAndTensors()` in `mobile/import.cpp`.
3. Add a util function in cpp that can read .pkl files directly instead of loading the entire module.
ghstack-source-id: 127703081
Test Plan: CI
Reviewed By: raziel, iseeyuan
Differential Revision: D28052193
fbshipit-source-id: c8d57f3270bdcf2e52a32f7c111899bd5da7cac2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56730
Add a test to verify that DDP with torch.amp produces the same results when using grad_as_bucket_view=true and false.
The torch.amp scale factor does not depend on old gradients, thus it is not affected by grad_as_bucket_view=true or false; see
how torch.amp is implemented here: https://github.com/pytorch/pytorch/pull/33366/files.
This diff verified DDP can work as expected with amp.GradScaler and amp.autocast when using grad_as_bucket_view=true and false.
ghstack-source-id: 127526358
Test Plan: unit tests
Reviewed By: rohan-varma
Differential Revision: D27950132
fbshipit-source-id: 8ed26935fdcb4514fccf01bb510e31bf6aedac69
Summary:
These weren't using the smaller images so we should probably let them
use the smaller images
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56937
Reviewed By: walterddr
Differential Revision: D28077747
Pulled By: seemethere
fbshipit-source-id: da0245bc3b4f564fcd392630542777b2b668b98f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57052
This PR caps a stack whose goal was to merge CUDAFuture into ivalue::Future. CUDAFuture used to be a subclass of ivalue::Future, which was already pretty good, but it meant that in several places we needed `#ifdef`s or registries in order to create the right type of class, which was annoying. We've made CUDAFuture device-agnostic, by using generic helpers, so that it doesn't depend on CUDA. Now all its code can be inserted into ivalue::Future.
This PR does this very naively, by copy-pasting CUDAFuture's code into the (previously empty) virtual methods of ivalue::Future. This helps ensure the correctness of this PR, as it's straightforward to see it behaves exactly like before. However, we probably want to polish it a bit later to iron out some wrinkles.
ghstack-source-id: 127713138
(Note: this ignores all push blocking failures!)
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28036829
fbshipit-source-id: 3e5b16402f5dc245c1fcb9d7bf06db64dcb0d2a3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57051
Make CUDAFuture autodetect the device type from its arguments (which thus change from DeviceIndices to full Devices). This in fact transforms CUDAFuture into an AnythingFuture, since it's not tied to CUDA in any way anymore. Having made it fully device-agnostic, we'll merge it into ivalue::Future in the next PR.
ghstack-source-id: 127713134
(Note: this ignores all push blocking failures!)
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28032711
fbshipit-source-id: 8ba23b1b0d97f61db8693cd5f3c7bae7989a9bcd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57050
Avoid (nearly*) any explicit mention of CUDA in CUDAFuture, and instead use "generic" classes like c10::Event, c10::Stream and most notably c10::impl::DeviceGuardImplInterface which allow us to indirectly manipulate CUDA entities. This is a preparation step to make CUDAFuture device-agnostic and thus become able to merge it into ivalue::Future.
* The one exception is when we construct the c10::impl::DeviceGuardImplInterface, where for now we still hardcode CUDA. This will be fixed in the very next PR
ghstack-source-id: 127713133
(Note: this ignores all push blocking failures!)
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28032710
fbshipit-source-id: a240ecc32bda481e8ecf85dab94933e24f832bb0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57049
There was a comment above CUDAMultiStreamGuard which said "TODO: Implement this generically in c10". This is what I'm doing here.
The new generic MultiStreamGuard class is able to take a vector of device-agnostic c10::Streams and is able to support any device type (CUDA, but also ROCm and others) by using a VirtualGuardImpl. A class called CUDAMultiStreamGuard is still kept around, for convenience, and slightly for performance as it avoids a vtable lookup.
ghstack-source-id: 127713139
(Note: this ignores all push blocking failures!)
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28029158
fbshipit-source-id: 2f3181371f8cb0d77a3b2e6aa510f1dd74e8f69b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57048
CUDAMultiStreamGuard had a default constructor and a `original_devices()` method which were only used in a test. I'm removing them here to simplify the API and make it easier to manipulate this class later. One extra benefit is that this class used to get and store the current stream of _all_ devices, whereas now it only does so for the relevant devices.
ghstack-source-id: 127713136
(Note: this ignores all push blocking failures!)
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28029160
fbshipit-source-id: 185ef9a7ac909cd0ae6507dad9826fe978e67308
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57047
We intend to merge CUDAFuture into ivalue::Future by using DeviceGuardImplInterface to avoid explicitly referring to CUDA. For that we need to add two methods to DeviceGuardImplInterface. In this PR, we add a method to record a DataPtr onto a stream with the caching allocator.
ghstack-source-id: 127713135
(Note: this ignores all push blocking failures!)
Test Plan: Used later in this stack
Reviewed By: ezyang
Differential Revision: D28029161
fbshipit-source-id: ff337ab8ccc98437b5594b2f263476baa1ae93e7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57046
We intend to merge CUDAFuture into ivalue::Future by using DeviceGuardImplInterface to avoid explicitly referring to CUDA. For that we need to add two methods to DeviceGuardImplInterface. In this PR, we add a method to get a stream from the global ATen pool.
ghstack-source-id: 127713137
(Note: this ignores all push blocking failures!)
Test Plan: Used later in this stack
Reviewed By: ezyang
Differential Revision: D28029159
fbshipit-source-id: 5055d84c1f3c2a4d86442f3149455c5ebd976dea
Summary:
TCPStore is now available on Windows.
Before: `TCPStore not available on Windows`
After: `c10d was not compiled with the NCCL backend`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57256
Reviewed By: gchanan
Differential Revision: D28092539
Pulled By: H-Huang
fbshipit-source-id: 1e48cfe29b33b102bc97f51268ac1bbda596397d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54723
Renamed "cond" -> "rcond" to be NumPy compatible. The default value for
rcond was changed to match non-legacy NumPy behavior.
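After the rename, the call mirrors NumPy's keyword (a hedged sketch; the data below is made up, and the overdetermined system happens to be exactly solvable):

```python
import torch

a = torch.tensor([[1., 2.], [3., 4.], [5., 6.]])
b = torch.tensor([[1.], [2.], [3.]])
# "rcond" now matches numpy.linalg.lstsq; None means a machine-precision cutoff
solution = torch.linalg.lstsq(a, b, rcond=None).solution
```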
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D27993741
Pulled By: mruberry
fbshipit-source-id: a4baf25aca6a8272f1af2f963600866bfda56fb3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54722
SciPy and NumPy operate only on non-batched input and return an empty array with shape (0,) if rank(a) != n.
The behavior for non-batched inputs is NumPy and SciPy compatible and the same result is computed.
For batched inputs, if any matrix in the batch has a rank less than `n`, then an empty tensor is returned.
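A sketch of the rank-deficient case described above (the driver choice is illustrative; residuals are only meaningful for full-rank input):

```python
import torch

a = torch.tensor([[1., 1.], [1., 1.], [1., 1.]])  # rank 1 < n = 2
b = torch.ones(3, 1)
out = torch.linalg.lstsq(a, b, driver='gelsd')
# matching NumPy/SciPy: residuals comes back empty when rank(a) != n
empty_residuals = out.residuals.numel() == 0
```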
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D27993736
Pulled By: mruberry
fbshipit-source-id: 0d7cff967b322a5e816a23f282b6ce383c4468ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57164
Give some more indications about its performance characteristics
and when it is appropriate to use.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: zou3519
Differential Revision: D28064685
Pulled By: ezyang
fbshipit-source-id: dbf5e041088d7921db2111d287feb9079466f1b5
Summary:
Run both fast and slow mode for test overrides and fix failure in slow_mode
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57155
Reviewed By: albanD
Differential Revision: D28076483
Pulled By: soulitzer
fbshipit-source-id: ef942d787d986ba881329e9515e5de6194f3782b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56679
moved lowerings out of the TensorExprKernel and into independent functions
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D28082921
Pulled By: Chillee
fbshipit-source-id: af530510957ed4aa8b64dcc77ca36b69866d8000
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57217
In torch multiprocessing error handler, we try to remove the file if it already exists. Before removing, we try to log the contents of the file. Here the assumption is that the contents would be valid json.
However, in some cases, it isn't and then we end up not clearing the file.
Let's handle this error and make sure that the file is cleaned irrespective of the contents of the file.
Reviewed By: devashisht
Differential Revision: D28041470
fbshipit-source-id: da96d11b8f7091715cf0152cccd3ecc08b688eae
Summary:
In my last PR I've missed CUDA and distributed folders, fixing this now
This change is autogenerated by `python tool/clang_tidy.py -s`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57235
Reviewed By: janeyx99
Differential Revision: D28084444
Pulled By: malfet
fbshipit-source-id: bf222f69ee90c7872c3cb0931e8cdb84f0cb3cda
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57183
Previously, if it was unable to support matching against a type, it would throw an error.
However, this exposes the user to arbitrary Torchscript schemas, which may or may not be problematic. Although we may support these in the future, for now we just return False (which will simply eliminate that schema from the candidates).
Test Plan: T89661626 and T89664016
Reviewed By: spaugh, khabinov
Differential Revision: D28072018
fbshipit-source-id: 83017d1e96d19912163edc74a5e43b2816783218
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57085
PR #54932 fixed the CUDA RPC for RRef when RRef is created through
RPC. But besides that use case, RRef can also be created locally
by directly passing in a value, which would bypass the CUDA stream
synchronization in #54932.
This commit covers the above gap by adding a `devices` argument
to RRef constructor. The RRef will then use this argument to
choose between `CUDAFuture` and `ivalue::Future` to hold the value.
When `devices` is specified and non-empty, `CUDAFuture` will be
used, and the `devices` will be passed to that `CUDAFuture`.
Test Plan: Imported from OSS
Reviewed By: lw
Differential Revision: D28050001
Pulled By: mrshenli
fbshipit-source-id: 2316b419fa69aa4dcd444050f0b74e61c3d0af1e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57182
Adding sequenceNr and fwdThreadId to the trace, to associate fwd ops with
backward ops
Test Plan: CI
Reviewed By: xuzhao9
Differential Revision: D28070725
fbshipit-source-id: aa4db580c9fd3ed061eaceb5239f4d9b2f8da3dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56631
`setErrorIfNeeded` did not mention whether the future was already
completed or there was some other exception. This particular change ensures
that we also print out the original exception as part of the error message.
This would help in debugging issues where this codepath is triggered.
ghstack-source-id: 127248844
Test Plan: waitforbuildbot
Reviewed By: rohan-varma
Differential Revision: D27919974
fbshipit-source-id: 2273a93f3475929b14f721c976f194f33a5aa746
Summary:
CUDA-11.1 build and tests will now run on PR and master, but 10.2 will
be master only
Also, delete remaining CUDA-10.1 build
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57207
Reviewed By: ngimel
Differential Revision: D28077271
Pulled By: malfet
fbshipit-source-id: 633945bf85091575efa34280e04a6b9d68a53138
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57102
We don't actually need to peek into `--fbcode_dir` for this. There are two reasons we should avoid this:
1. The [`TARGETS` docs](https://fburl.com/wiki/zz1wh6uc) recommend against it, as it can break buck caching and dependency tracking. This doesn't seem to be a serious issue in our case (we declare our sources anyway) but worth respecting.
2. More seriously, if we want to use this script from outside fbcode (like `fbsource/third-party/pypi`), it will break since `fbcode_dir` gets set to something wild
The preferred method is apparently to use `$SRCDIR`, which represents a directory that all specified sources are copied to before executing the custom rule.
Found the suggestion here: https://fburl.com/w33wae2b. Seems less fragile, since it's publicly documented as well: https://buck.build/rule/genrule.html
Test Plan: sandcastle
Reviewed By: wconstab
Differential Revision: D28052570
fbshipit-source-id: cb4772b5dc07fbdc251249d6e0759e71730098af
Summary:
Adaptive average pool with output size (1, 1) is a global average pool
For mobile use xnnpack to speed up that path
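The equivalence that makes this fast path valid: with output size (1, 1), adaptive average pooling reduces to a global mean over the spatial dimensions (a quick CPU check):

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 7, 5)
pooled = F.adaptive_avg_pool2d(x, (1, 1))
global_mean = x.mean(dim=(2, 3), keepdim=True)
match = torch.allclose(pooled, global_mean, atol=1e-6)
```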
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55791
Test Plan:
buck test //xplat/caffe2:pt_xnnpack_test
pytest test/test_xnnpack_integration.py::TestXNNPACKOps
Reviewed By: kimishpatel
Differential Revision: D27711082
Pulled By: axitkhurana
fbshipit-source-id: 8757042c4a31a60451d8ba5fb6bf8cfbaf0a8d10
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56714
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55800
For mobile use xnnpack implementation of hardswish
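For reference, hardswish is defined as x * relu6(x + 3) / 6; a quick CPU check of the functional form:

```python
import torch
import torch.nn.functional as F

x = torch.linspace(-5., 5., steps=11)
y = F.hardswish(x)
reference = x * torch.clamp(x + 3., min=0., max=6.) / 6.
match = torch.allclose(y, reference, atol=1e-6)
```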
Test Plan: buck test //xplat/caffe2:pt_xnnpack_test
Reviewed By: kimishpatel
Differential Revision: D27712306
fbshipit-source-id: c7f0b70482aeef2aaa1966e2c669f79ecd29caa7
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os


def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files


def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname, "-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit", "--all", "-m", f"NOLINT stubs for {fname}"])


def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)


if __name__ == "__main__":
    main()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892
Reviewed By: H-Huang
Differential Revision: D27991944
Pulled By: malfet
fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56217
Reland of https://github.com/pytorch/pytorch/pull/54264
Changes:
- Update socket send() to use flag MSG_NOSIGNAL to prevent SIGPIPE, because the error in the return value is already captured
- Update watchKey to block until callback has been registered on master.
- Fix race condition in testWatchKeyCallback which caused flaky test failures.
Test:
Ran TCPStoreTest 100 times locally with no errors, running [ci-all tests](https://github.com/pytorch/pytorch/pull/56219)
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D27824802
Pulled By: H-Huang
fbshipit-source-id: c32230ce726d7d848b9896a63aa52b8eb04a0a2d
Summary:
Cudnn rnn calls that use cudnn dropout maintain a "state" buffer across calls. [DropoutState](fe3f6f2da2/aten/src/ATen/native/cudnn/RNN.cpp (L1388-L1402))'s lock() and unlock() ensure the current call's use of the state buffer syncs with the end of the previous call's use of the state buffer (in case the previous call was on a different stream).
Telling a capturing stream to wait on an event recorded in a non-capturing stream is an error (1). Telling a non-capturing stream to wait on an event recorded during capture is also an error (2). So DropoutState's flow can error in either of two simple use cases:
```python
rnn = nn.LSTM(512, 512, 2, dropout=0.5).cuda()
out1 = rnn(in1)
# calling cudnn rnn with dropout in capture after calling it uncaptured triggers 1
capture_stream.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(capture_stream):
    graph.capture_begin()
    out2 = rnn(in2)
    graph.capture_end()
torch.cuda.current_stream().wait_stream(capture_stream)
# calling cudnn rnn with dropout uncaptured after calling it in capture triggers 2
out3 = rnn(in3)
```
This PR fixes both cases by telling `DropoutState::lock()`: "if the most recent end-of-usage event was in a different capture state (ie, we crossed a capturing<->noncapturing border) or in a different capture, don't sync on it." While considering the fix I had two assumptions in mind:
- only one capture using the RNN can be underway at a time in this process
- no noncapturing ops in this process are issuing RNN calls while the capture using the RNN is underway.
That second assumption seems brittle if, for example, someone wants to capture an internal region of the forward method of a model wrapped with DataParallel: multiple threads could be issuing RNN calls with some currently capturing and some not. We should talk about whether that use case seems realistic.
(Bigger-picture thoughts: I don't know if forcing calls to serialize on using the shared state buffer is the best design. And if we want to do it that way, we might as well run all cudnn rnns with dropout on a dedicated side stream synced with the surrounding stream (capturing or not), in which case I don't think this PR's event-handling diffs would be needed.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56433
Reviewed By: heitorschueroff
Differential Revision: D27966444
Pulled By: ezyang
fbshipit-source-id: fe0df843c521e0d48d7f2c81a17aff84c5497e20
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57063
Removes the generated tag from the original template so the diff shows
up correctly on internal Phab
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D28040694
Pulled By: seemethere
fbshipit-source-id: c6ec0520fbc4ea169abefc7df2ff925ecc0474cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57029
Partially addresses https://github.com/pytorch/pytorch/issues/56297
This fixes deadlocks when the threads the RPCAgent are blocking
on try to take the GIL. This also adds a general utility for
making shared_ptr run destructors without GIL.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D28030294
Pulled By: ezyang
fbshipit-source-id: 628c066eebbb70bda5b914645a109dce35d73c8d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56530
For upcoming diffs, ProcessGroup will need to know about debug level
for e.g. logging collective operations.
ghstack-source-id: 127535775
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D27849839
fbshipit-source-id: a9f016a27d30a242eced19929b3824ae68fe430f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56982
SyncBatchNorm should behave as a regular BN layer in eval mode; this
change ensures that this is the case.
In particular, the bug was when `track_running_stats=False`, `bn_training` would be set to True in eval mode, but this would trigger a collective sync in syncBN.
However, in eval mode syncBN should behave like a regular BN layer and not do this sync.
Closes https://github.com/pytorch/pytorch/issues/48988
Ensured with unittest that when used for inference on a single rank, stats sync is not triggered.
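In eval mode a BN layer normalizes with its running statistics and performs no cross-rank sync; the single-process behavior SyncBatchNorm should now match looks like:

```python
import torch

bn = torch.nn.BatchNorm1d(4)
bn.eval()
x = torch.randn(8, 4)
with torch.no_grad():
    out = bn(x)
# eval mode uses running_mean/running_var, not the batch statistics
expected = (x - bn.running_mean) / torch.sqrt(bn.running_var + bn.eps)
match = torch.allclose(out, expected, atol=1e-5)
```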
ghstack-source-id: 127544421
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D27579297
fbshipit-source-id: 26406e2793f0be14f2daa46ae66f97a8494182ed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56150
See #56017 for full context; the short story is that by making
it illegal to directly construct _TensorBase, we need only
write a *single* tp_dealloc function which will work universally
for all _TensorBase subclasses, rather than having to write two
versions, one for _TensorBase itself, and others for Python subclasses
of _TensorBase. This means simpler code.
The subtlety here is that we only install our custom `tp_new` for direct subclasses of TensorBase. This is important, because overriding the `tp_new` also overrides any user defined constructor. Fortunately class Tensor(_TensorBase) has no nontrivial constructors and doesn't mind, but other subclasses like Parameter definitely mind!
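Subclassing still works the same from Python; `MyTensor` below is a made-up example:

```python
import torch

class MyTensor(torch.Tensor):
    def describe(self):
        return f"MyTensor{tuple(self.shape)}"

# as_subclass reinterprets an existing tensor as the subclass type
t = torch.zeros(2, 3).as_subclass(MyTensor)
desc = t.describe()
```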
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D28028746
Pulled By: ezyang
fbshipit-source-id: 3c03a14666ad1ded1145fe676afb0a7623cdb9bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56147
This is support of #55686, you can see the broader context of the metaclass in
a more complete PR #56017. The short story is that in the future I want to
give Tensor a non-trivial metaclass, so to derisk the change first I give it a
trivial metaclass to shake out any bugs that might be caused by it. The
metaclass shouldn't have any performance impact on Tensor as it only gets
invoked upon subclass creation.
By the way, it was totally not documented how to create metaclasses in the Python
C API, and it took a good bit of trial error to figure it out (and the answer is
now immortalized in https://stackoverflow.com/q/67077317/23845 -- the things
that I got wrong in earlier versions of the PR included setting tp_basicsize
incorrectly, incorrectly setting Py_TPFLAGS_HAVE_GC on the metaclass--you want
to leave it unset so that it inherits, and determining that tp_init is what
actually gets called when you construct a class, not tp_call as another
not-to-be-named StackOverflow question suggests).
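The pure-Python analogue of that last point: it is the metaclass's `__init__` (the tp_init slot), not `__call__`, that runs when a class statement executes:

```python
calls = []

class Meta(type):
    def __init__(cls, name, bases, namespace):
        # runs once, at class-creation time (the C-API tp_init slot)
        calls.append(name)
        super().__init__(name, bases, namespace)

class Example(metaclass=Meta):
    pass
```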
Aside: Ordinarily, adding a metaclass to a class is a user visible change, as
it means that it is no longer valid to mixin another class with a different
metaclass. However, because _C._TensorBase is a C extension object, it will
typically conflict with most other metaclasses, so this is not BC breaking.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D28028747
Pulled By: ezyang
fbshipit-source-id: c1e35a986aeb3db540c73d188f53dce951eeed33
Summary:
The test seems to be failing in ROCM 4.1 on CI node. Disabling the same for now. The test will be re-enabled for ROCM when CI transitions to 4.2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56951
Reviewed By: zou3519
Differential Revision: D28059808
Pulled By: ezyang
fbshipit-source-id: a9b064b7525ae6dce89c51fe29ff07f37b7ac796
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56709
Right now, ProcessGroupMPITest testGather() fails with
```
what(): Gather: number of output tensors should be 0 for non-root
[devgpu025:429730] *** Process received signal ***
```
there is a similar issue with testScatter() where number of input/output tensors on source/destination respectively should be 0.
In addition testSendRecv(true); fails with
```
terminate called after throwing an instance of 'std::runtime_error'
what(): src rank is wrong for recvAnysource
```
since we never populate `srcRanks`
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D28001963
Pulled By: agolynski
fbshipit-source-id: c381dfc6f417ee78fbbaf884e567b0485076dfc8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56868
See __init__.py for a summary of the tool.
The following sections are present in this initial version
- Model Size. Show the total model size, as well as a breakdown by
stored files, compressed files, and zip overhead. (I expect this
breakdown to be a bit more useful once data.pkl is compressed.)
- Model Structure. This is basically the output of
`show_pickle(data.pkl)`, but as a hierarchical structure.
Some structures cause this view to crash right now, but it can be
improved incrementally.
- Zip Contents. This is basically the output of `zipinfo -l`.
- Code. This is the TorchScript code. It's integrated with a blame
window at the bottom, so you can click "Blame Code", then click a bit
of code to see where it came from (based on the debug_pkl). This
currently doesn't render properly if debug_pkl is missing or
incomplete.
- Extra files (JSON). JSON dumps of each json file under /extra/, up to
a size limit.
- Extra Pickles. For each .pkl file in the model, we safely unpickle it
with `show_pickle`, then render it with `pprint` and include it here
if the size is not too large. We aren't able to install the pprint
hack that the show_pickle CLI uses, so we get one-line rendering for
custom objects, which is not very useful. Built-in types look fine,
though. In particular, bytecode.pkl seems to look fine (and we
hard-code that file to ignore the size limit).
I'm checking in the JS dependencies to avoid a network dependency at
runtime. They were retrieved from the following URLS, then passed
through a JS minifier:
https://unpkg.com/htm@3.0.4/dist/htm.module.js?module
https://unpkg.com/preact@10.5.13/dist/preact.module.js?module
Test Plan:
Manually ran on a few models I had lying around.
Mostly tested in Chrome, but I also poked around in Firefox.
Reviewed By: dhruvbird
Differential Revision: D28020849
Pulled By: dreiss
fbshipit-source-id: 421c30ed7ca55244e9fda1a03b8aab830466536d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56641
currently ddpLoggingData is a flat struct, which requires internal DDP developers and external users to know about the struct field names. This is not flexible for deleting or adding fields in the future, and it is also hard to access ddpLoggingData.
With maps/dicts, developers and users can easily access the fields without knowing the field names, and it is easier to add or remove fields.
Since C++ does not support map values of different types, ddpLoggingData currently contains two types of maps.
ghstack-source-id: 127482694
Test Plan: unit tests
Reviewed By: SciPioneer
Differential Revision: D27923723
fbshipit-source-id: c90199c14925fc50ef219000e2f809dc7601cce1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56074
To run ShuffleNet we need to support at::chunk on GPU. The current implementation only splits the tensor into two along the channel dimension. We'll come back and fully implement it in Metal shaders.
ghstack-source-id: 127522377
Test Plan:
```
2021-03-26 01:37:07.693411-0700 PyTorchPlayground[2279:235793] [MPSImageWrapper] Found a temporary image: [1, 2, 2, 2]
2021-03-26 01:37:07.693499-0700 PyTorchPlayground[2279:235793] [MPSImageWrapper] Found a temporary image: [1, 2, 2, 2]
2021-03-26 01:37:07.693544-0700 PyTorchPlayground[2279:235793] [MPSImageWrapper] Found a temporary image: [1, 4, 2, 2]
2021-03-26 01:37:07.695415-0700 PyTorchPlayground[2279:235793] [bool test_chunk()],[1 4 2 2 ],[SUCCEED]
2021-03-26 01:37:07.695862-0700 PyTorchPlayground[2279:235793] [MPSImageWrapper] Found a temporary image: [1, 4, 2, 2]
2021-03-26 01:37:07.695927-0700 PyTorchPlayground[2279:235793] [MPSImageWrapper] Found a temporary image: [1, 5, 2, 2]
2021-03-26 01:37:07.695971-0700 PyTorchPlayground[2279:235793] [MPSImageWrapper] Found a temporary image: [1, 9, 2, 2]
2021-03-26 01:37:07.698215-0700 PyTorchPlayground[2279:235793] [bool test_chunk2()],[1 9 2 2 ],[SUCCEED]
2021-03-26 01:37:07.699086-0700 PyTorchPlayground[2279:235793] [MPSImageWrapper] Found a temporary image: [1, 8, 2, 2]
2021-03-26 01:37:07.699154-0700 PyTorchPlayground[2279:235793] [MPSImageWrapper] Found a temporary image: [1, 16, 2, 2]
2021-03-26 01:37:07.699197-0700 PyTorchPlayground[2279:235793] [MPSImageWrapper] Found a temporary image: [1, 8, 2, 2]
2021-03-26 01:37:07.700842-0700 PyTorchPlayground[2279:235793] [bool test_chunk3()],[1 16 2 2 ],[SUCCEED]
```
- Sandcastle
- CircleCI
Reviewed By: SS-JIA
Differential Revision: D27357096
fbshipit-source-id: fd3908ad2c26466e4f714d531790be2f1ae24153
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57090
We did loop-invariant code motion to avoid multiplying with in_weight_temp for each element but this breaks down when weight decay is not zero.
Test Plan:
In devgpu
buck test mode/dev-nosan //caffe2/caffe2/fb/net_transforms/tests:fuse_sparse_ops_test -- test_fuse_sparse_adagrad_with_sparse_lengths_weighted_sum_gradient --run-disabled
Reviewed By: jianyuh
Differential Revision: D28051026
fbshipit-source-id: f8906b72a41a87c2d43c447197b5fd695373ae23
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57039
## Summary
Add two models (v4 and v5) for testing runtime. (v5 will be introduced in https://github.com/pytorch/pytorch/pull/56002)
## Test plan
CI
Test Plan: Imported from OSS
Reviewed By: iseeyuan
Differential Revision: D28047615
Pulled By: cccclai
fbshipit-source-id: 47f7df3094dadb7e013ed57bc713cc8b3d1c8ce0
Summary:
Attempts to call clang-tidy on any source file in
`aten/src/ATen/cpu/native` would fail with series of
```
/Users/nshulga/git/pytorch-worktree/aten/src/ATen/native/cpu/Activation.cpp:637:1: warning: variable 'REGISTER_DISPATCH' is non-const and globally accessible, consider making it const [cppcoreguidelines-avoid-non-const-global-variables]
/Users/nshulga/git/pytorch-worktree/aten/src/ATen/native/cpu/Activation.cpp:638:1: error: C++ requires a type specifier for all declarations [clang-diagnostic-error]
REGISTER_DISPATCH(log_sigmoid_backward_cpu_stub, &log_sigmoid_backward_cpu_kernel);
```
because those macros are only defined for cpu-arch specific compilation of above mentioned files.
Fix this by introducing a `map_filename` function that will map a source
file to its copy in the `build` folder, run clang-tidy over the copy, and
then map it back.
Found while working on https://github.com/pytorch/pytorch/pull/56892
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57037
Reviewed By: walterddr
Differential Revision: D28033760
Pulled By: malfet
fbshipit-source-id: b67cd007000574ecc165ab4b285c0c102cbceadd
Summary:
cpu_depthwise3x3_winograd is not grad aware and therefore should not be used if grad is expected on the input
Fixes https://github.com/pytorch/pytorch/issues/56145
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56889
Reviewed By: ngimel
Differential Revision: D27990448
Pulled By: malfet
fbshipit-source-id: 9c649f14b8f514eb1dfb7f0eb8e3357c09ddb299
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56994
- Use `DimVector` in place of `std::vector<int64_t>` to remove heap allocations for tensors with ndim <= 5
- Use `sizes()[i]` in place of `size(i)` where we know i is positive
Test Plan: CI
Reviewed By: edvgha, swolchok
Differential Revision: D28022355
fbshipit-source-id: ef20ac73c0a330192ebc41ab9c92374ed8e2484a
Summary:
For small tensors, it is known that GPU operates slower than CPU. However, offloading to CPU causes host <--> device sync. As a result, although offloading to CPU has better microbenchmarks, it often hurts instead of benefits the end-to-end performance, and it could be a blocker for CUDA graphs. After discussion with mcarilli and ptrblck, we think it might be good to just remove this piece of code and let it be slow.
Microbenchmarks:
```python
def run50_sync(f):
    for _ in range(50):
        f()
    torch.cuda.synchronize()
torch.cuda.synchronize()
%timeit run50_sync(lambda: torch.randperm(3, device='cuda'))
%timeit run50_sync(lambda: torch.randperm(30, device='cuda'))
%timeit run50_sync(lambda: torch.randperm(300, device='cuda'))
%timeit run50_sync(lambda: torch.randperm(3000, device='cuda'))
%timeit run50_sync(lambda: torch.randperm(30000, device='cuda'))
%timeit run50_sync(lambda: torch.randperm(300000, device='cuda'))
%timeit run50_sync(lambda: torch.randperm(3000000, device='cuda'))
%timeit run50_sync(lambda: torch.randperm(30000000, device='cuda'))
```
Before this PR:
```
5.79 ms ± 51.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
5.78 ms ± 92.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.17 ms ± 87.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.65 ms ± 69.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
17.6 ms ± 133 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
21 ms ± 120 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
104 ms ± 880 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
944 ms ± 3.49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
After this PR:
```
7.22 ms ± 11.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.28 ms ± 9.03 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.25 ms ± 10.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.19 ms ± 5.83 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.76 ms ± 162 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
12.3 ms ± 11.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
69.3 ms ± 42.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
716 ms ± 1.01 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54113
Reviewed By: ezyang
Differential Revision: D28017958
Pulled By: ngimel
fbshipit-source-id: 660992d43ca449e61ce0cb0aa1dae554c9560a4e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57028
Adds a test case for wrapped sigmoid, and fixes the following issues
to make it pass in NS:
* allows comparing between x.sigmoid() and torch.sigmoid(x), if they are related
* allows dtype cast from FP32_OR_INT8 to FP32, via dequantize (this will be improved later)
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_defined_function
```
Reviewed By: jerryzh168
Differential Revision: D28030089
Pulled By: vkuzo
fbshipit-source-id: b237353e2d564a4476f409df461746a259015a4b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57027
Fixes a bug to allow shadowing of linear and conv functionals.
The bug is to only detach tensors, not all objects.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_int8_shadows_int8_fun
```
Reviewed By: jerryzh168
Differential Revision: D28030090
Pulled By: vkuzo
fbshipit-source-id: 0a38c4b232e007d7822eee818b0af99d98335d22
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57026
Adds a config option to skip matching classes by class type
and functions by function type.
This is useful when users make custom modules which return
types other than tensors. With the current implementation of
Logger, these are not scriptable.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_module_scriptable
```
Reviewed By: jerryzh168
Differential Revision: D28030093
Pulled By: vkuzo
fbshipit-source-id: 71dc54dd935d2071c4b017260ea2a1e5c2298bfe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57025
Adds the ability to log unshadowed inputs of binary ops such as `add`
and `mul`, when indices 0, 1, or 0 and 1 are tensors.
Note: making shadowing support this is saved for a future PR.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_add_mul_inputs_activations
```
Reviewed By: jerryzh168
Differential Revision: D28030098
Pulled By: vkuzo
fbshipit-source-id: fd46760faac153975cd7688e70c44991ec1d5dff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57024
Enables shadow copies of fp16 emulation patterns where weights
are cast to fp16 before being passed to linear. This previously
did not work because copying of `call_method` nodes was not implemented.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_linear_fp16_vs_linear_fp16_shadow_activations
```
Reviewed By: jerryzh168
Differential Revision: D28030096
Pulled By: vkuzo
fbshipit-source-id: 13a39ea6c106180df6d750246672286b58b4d04c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57022
Allows usage of user functions in NS shadow APIs. We expose the
i/o mapping to the user APIs, and thread them throughout the code.
Note: the format of the mapping is currently not the best. Improving
it is left for a future PR.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_defined_function
```
Reviewed By: jerryzh168
Differential Revision: D28030095
Pulled By: vkuzo
fbshipit-source-id: 2863312362223ad276437e2aeeec4a3f71b691c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57021
To support shadows of custom functions, we need to allow user to
specify I/O type of the custom functions.
This PR is a cleanup in preparation for making the above happen.
We make the I/O dtype mappings be generated by a function instead
of a global variable. In the next PR, we will add a hook so user
can modify these mappings.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```
Reviewed By: jerryzh168
Differential Revision: D28030094
Pulled By: vkuzo
fbshipit-source-id: 3cbb617f034ef385c2875c4ec7fed13ca30bfc57
Summary:
Fixes https://github.com/pytorch/pytorch/issues/55561.
1. Added checks to ensure that Output tensor specified via out= must be on the same device as inputs for `dot` & `vdot`.
2. Unskipped `test_out` for `dot` & `vdot`.
3. Also changed the `tensordot` implementation to check if both input tensors are on the same device as the output tensor.
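The device check described above can be sketched as follows (a hypothetical Python helper for illustration; the actual check lives in the C++ implementations of `dot`, `vdot`, and `tensordot`):

```python
class FakeTensor:
    """Minimal stand-in with just a device attribute (illustration only)."""
    def __init__(self, device):
        self.device = device

def check_out_device(out, *inputs):
    # sketch of the added check: the out= tensor must live on the same
    # device as every input, otherwise raise before computing anything
    for t in inputs:
        if t.device != out.device:
            raise RuntimeError(
                f"Expected out tensor on device {t.device}, "
                f"but got {out.device}")

check_out_device(FakeTensor("cpu"), FakeTensor("cpu"))  # same device: passes
```

A mismatched pair, e.g. a CPU `out=` with a CUDA input, raises `RuntimeError` instead of silently producing a wrong result.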
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56334
Reviewed By: H-Huang
Differential Revision: D27993778
Pulled By: mruberry
fbshipit-source-id: 36dee41ceef123c29d0cc52d6b09c3c440e8e60e
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56822
There was an off-by-one in CPU randperm when checking the limits of the requested range. It also shows up in the "CUDA" version, as it will fall back to CPU for small input sizes.
CC zasdfgbnm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56967
Reviewed By: mruberry
Differential Revision: D28031819
Pulled By: ngimel
fbshipit-source-id: 4d25995628997f164aafe94e7eae6c54f018e4e5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56939
These never have kernels registered to them and are effectively useless.
What I am not so sure about is whether we allocate tensors to them or not;
if we do, I cannot use asserts and need to ensure we just return undefined
or something equivalent.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D28006160
Pulled By: ezyang
fbshipit-source-id: f8e2b61b8bd928fb2c0ac0b534bd4af076423f71
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57045
Went back and adjusted the previous optimizations so they are applied to every function.
Cleaned up the API to match.
ghstack-source-id: 127214412
ghstack-source-id: 127536155
Test Plan: unit test
Reviewed By: kimishpatel
Differential Revision: D27950859
fbshipit-source-id: 214e83d5a19b452747fe223615815c10fa4aee58
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56754
these tests are backend-agnostic and shouldn't require specific
backends to run properly. Hence, enabling them regardless of the backends that
are available.
ghstack-source-id: 127463147
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D27954174
fbshipit-source-id: 24759486b0c0647a5c88da4721a9a78d78c0b1f6
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24648
The large tensor codepath is ported, but there is a legacy codepath that depends on an inplace sort in THC that is not callable from `at::`. At first glance, THC `topk` seems to be the only function that uses this `sortKeyValueInplace`.
Is the correct change to wrap `sortKeyValueInplace` in legacy functions for visibility in the `at::` namespace?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55392
Reviewed By: ezyang
Differential Revision: D28014257
Pulled By: ngimel
fbshipit-source-id: e297423c763f0691151cb62a4f5eff4cb31fb2b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54153
Currently, sparse tensors only support real floating point tensors. Complex support is added in this PR for CPU/CUDA.
- [x] add complex support (torch.cfloat and torch.cdouble) to torch.sparse_coo_tensor constructors
- [x] add complex support to coalesce function
- [x] add complex support to to_dense function
- [x] add complex support to to_sparse function
- [x] add complex support to sparse_add function
- [x] add unit tests
Note: This PR contains only complex support for the torch.sparse_coo_tensor forward function and the related ops used with this function (coalesce, to_dense, to_sparse, and sparse_add). The following PRs in the ghstack should cover other sparse operations to provide more complete complex sparse support, specifically related to the use of specific APIs for accelerated linear algebra.
Note: Before using ghstack the original PR was #50984
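A minimal sketch of what the added support enables (values chosen purely for illustration):

```python
import torch

# complex sparse COO tensors now round-trip through the ops touched by
# this PR: the constructor, coalesce, to_dense, and to_sparse
indices = torch.tensor([[0, 1], [1, 0]])
values = torch.tensor([1 + 2j, 3 - 1j], dtype=torch.cdouble)
s = torch.sparse_coo_tensor(indices, values, (2, 2)).coalesce()
d = s.to_dense()
```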
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D27765618
Pulled By: ezyang
fbshipit-source-id: a9cdd31d5c7a7dafd790f6cc148f3df26e884c89
Summary:
You can find the latest rendered version in the `python_doc_build` CI job below, in the artifact tab of that build on circle CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55966
Reviewed By: H-Huang
Differential Revision: D28032446
Pulled By: albanD
fbshipit-source-id: 227ad37b03d39894d736c19cae3195b4d56fc62f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56984
This is a preparation PR before we can create CUDAFuture in rref_impl.cpp.
The solution is adding a `FutureFactoryRegistry` in `rpc/utils.*`. The
TensorPipe RPC agent is responsible for registering `CUDAFuture` factory
and `ivalue::Future` factory. The reason that we need this change instead
of directly using `USE_CUDA` macro in RRef files is as follows. There are
three build targets: `torch_cpu`, `torch_cuda`, and `torch_python`.
`torch_python` is built on top of the other two. `torch_cpu` is CPU-only,
which contains no CUDA-related code, and hence no `USE_CUDA` macro.
`tensorpipe_*` files are in `torch_python` which does have access to CUDA.
However RRef source files are in `torch_cpu`, which cannot contain CUDA
code. The recommended solution is to allow dynamic dispatching. Therefore,
we had this PR.
Test Plan: Imported from OSS
Reviewed By: lw
Differential Revision: D28020917
Pulled By: mrshenli
fbshipit-source-id: e67c76a273074aebb61877185cc5e6bc0a1a5448
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56444
Added out version for layer_norm
Test Plan:
buck test caffe2/aten:math_kernel_test -- NativeLayerNorm
buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest
Reviewed By: hlu1
Differential Revision: D27873846
fbshipit-source-id: 53ee9fec4ff9a4e78198b031e86b5afd013626dd
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46702
- fails on probability distributions with an odd number of items
- tries to access an `acc_type` (`float`) value in memory aligned for `scalar_t` (`float16`)
- produces unrepeatable results for large input tensors
- parallel cumsum not monotonic at some positions
### Fixes
- computing cumsum on `acc_type` (`float`) instead of using `scalar_t` (`float16`) fixed both issues
- the non-monotonic behavior may happen even using `float`, though
- in these cases, deterministic behavior may be achieved by eliminating the race condition when writing the result, using the atomic function `atomicMax`
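The first fix (accumulating the CDF in `acc_type` rather than `scalar_t`) can be illustrated on the CPU, with NumPy standing in for the CUDA kernel:

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.random(10000).astype(np.float16)  # scalar_t (float16) probabilities
cdf = p.astype(np.float32).cumsum()       # accumulate in acc_type (float)
# a non-decreasing CDF is what the sampling binary search relies on;
# accumulating in float16 can lose increments smaller than the rounding step
assert np.all(np.diff(cdf) >= 0)
```

This only illustrates the precision argument; the non-monotonicity in the bug report came from the *parallel* cumsum on the GPU, which the atomicMax change addresses separately.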
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55364
Reviewed By: mruberry
Differential Revision: D28031666
Pulled By: ngimel
fbshipit-source-id: 0fc6289e0b9ea2d31ef3771e7ca370de8f5c02de
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56704
This is a resubmit of PR: https://github.com/pytorch/pytorch/pull/54175
Main changes compared to the original PR:
- Switch to importing "<ATen/cuda/cub.cuh>"
- Use CUB_WRAPPER to reduce boilerplate code.
Test Plan:
Will check CI status to make sure everything passes.
Added unit test
Reviewed By: ngimel
Differential Revision: D27941257
fbshipit-source-id: 24a0e0c7f6c46126d2606fe42ed03dca15684415
Summary:
This PR tries to make the docs of `torch.linalg` have/be:
- More uniform notation and structure for every function.
- More uniform use of back-quotes and the `:attr:` directive
- More readable for a non-specialised audience, through explanations of the form that factorisations take and of when it is beneficial to use which arguments in some solvers.
- More connected among the different functions through the use of the `.. seealso::` directive.
- More information on when gradients explode / when a function silently returns a wrong result / when things do not work in general
I tried to follow the structure of "one short description and then the rest" to be able to format the docs like those of `torch.` or `torch.nn`. I did not do that yet, as I am waiting for the green light on this idea:
https://github.com/pytorch/pytorch/issues/54878#issuecomment-816636171
What this PR does not do:
- Clean the documentation of other functions that are not in the `linalg` module (although I started doing this for `torch.svd`, but then I realised that this PR would touch way too many functions).
Fixes https://github.com/pytorch/pytorch/issues/54878
cc mruberry IvanYashchuk
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56265
Reviewed By: H-Huang
Differential Revision: D27993986
Pulled By: mruberry
fbshipit-source-id: adde7b7383387e1213cc0a6644331f0632b7392d
Summary:
According to `vecLib.framework/Headers/clapack.h`, Accelerate.framework's LAPACK implementation is based on 3.2.1, and so LRWORK should be computed using the following formula (from the LAPACK documentation):
```
*> If JOBZ = 'N', LRWORK >= 7*min(M,N).
*> Otherwise,
*> LRWORK >= min(M,N)*max(5*min(M,N)+7,2*max(M,N)+2*min(M,N)+1)
```
Found while looking at test_linalg.py crashes on M1, but it would have happened on x86 as well, if PyTorch with the Accelerate framework were tested on x86_64.
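For reference, the quoted workspace formula computes directly as below (`lrwork` is a hypothetical helper name, not part of any API):

```python
def lrwork(m, n, jobz):
    # rwork size for ?gesdd, per the LAPACK 3.2.1 docs quoted above:
    #   JOBZ = 'N': 7*min(M,N)
    #   else:       min(M,N)*max(5*min(M,N)+7, 2*max(M,N)+2*min(M,N)+1)
    mn, mx = min(m, n), max(m, n)
    if jobz == "N":
        return 7 * mn
    return mn * max(5 * mn + 7, 2 * mx + 2 * mn + 1)
```

For example, a 4x3 matrix needs 21 elements with JOBZ = 'N' and 66 otherwise.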
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56847
Reviewed By: albanD
Differential Revision: D27983352
Pulled By: malfet
fbshipit-source-id: f757c515c85b32c1e09d00a91bc20fe4b390a75a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56824
This PR adds 6 dispatch keys to be used for prototyping.
I'm not sure what the best way to name these is; please let me know if
you think that these should have the same prefix.
Test Plan: - wait for tests
Reviewed By: driazati
Differential Revision: D27999963
Pulled By: zou3519
fbshipit-source-id: 0c3ef4788854f7a93d077cc454b773a6eedbbc22
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56945
In preparation to turn these on for CI
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: walterddr
Differential Revision: D28018454
Pulled By: seemethere
fbshipit-source-id: fa94d666499877f2cdd7b8fd3fc8b2d8127f61e8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56941
Sets the custom test binaries we build in .jenkins/pytorch/build.sh to
be built in the `build` directory instead of the directory above the
workspace.
This should alleviate any weirdness we were seeing before with test
binaries having to be overwritten
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D28018453
Pulled By: seemethere
fbshipit-source-id: 74add11037a622e011d00fb6292bfe20e1d55d9e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56797
After adding a default seeding strategy for the NumPy random module within each worker of DataLoader (#56488), two concerns were raised:
- We dropped support for NumPy < 1.17 due to `SeedSequence`
- In order to support seeding for NumPy < 1.17, how can we provide a seed for `numpy.random`?
  - The first option is to set the same seed as `random`. But the problem is that the same algorithm is shared between `numpy.random` and `random`: with the same seed, they will have exactly the same state sequence. Thanks to rkern, we noticed these so-called [bad things](https://github.com/PyTorchLightning/pytorch-lightning/pull/6960#issuecomment-818393659).
  - Considering that most users are not aware of this problem, we can provide a better default seed for `numpy.random` using the same `SeedSequence` algorithm as NumPy. This is just a workaround with a hard-coded function that generates an array of four int32 values as the seed.
To better cope with this problem, since a number of 3rd-party libraries (not just NumPy) have random modules, we may in the end need to implement a `SeedSequence` within the `torch.random` module; then users can `spawn` a new `SeedSequence` for each library.
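The `SeedSequence` mechanism referred to above can be sketched as follows (assumes NumPy >= 1.17; the base seed value is arbitrary):

```python
import numpy as np

# each spawned child seeds an independent, well-mixed stream, avoiding
# the correlated-streams problem of reusing one integer seed everywhere
base = np.random.SeedSequence(1234)
children = base.spawn(4)                    # e.g. one per DataLoader worker
rngs = [np.random.default_rng(c) for c in children]
draws = [r.random() for r in rngs]          # per-worker streams differ
```

Spawning from one base sequence is what lets each library (or worker) get its own decorrelated generator from a single user-provided seed.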
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D28000619
Pulled By: ejguan
fbshipit-source-id: 5701c8124a38ea5ded69eb8eee70f9680877ffa6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55699
Todo:
- error message should be updated to say whether the failure is for fn's real or imaginary component
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D28007887
Pulled By: soulitzer
fbshipit-source-id: 1819201f59c8586a1d9631db05983969438bde66
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55692
### Release notes
get_numerical_jacobian and get_analytical_jacobian only support `grad_out=1` and `fn` no longer accepts functions that return complex output
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D28004614
Pulled By: soulitzer
fbshipit-source-id: 9592c9c69584b4035b39be62252f138dce39d3b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56976
Band-aid fix for #54282
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D28020401
Pulled By: ezyang
fbshipit-source-id: 50546d5275eade408d65e9c883999fb3b65ff55a
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56243 by adding a note to mutating functions not following the trailing `_` convention in `torch/nn/modules/module.py`
I can also raise separate PRs for other files, if needed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56877
Reviewed By: ezyang
Differential Revision: D28008856
Pulled By: jbschlosser
fbshipit-source-id: 63bfca0df05e49fceadd3167b1427dcb5542206a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56991
Original commit changeset: c5aa5f61a215
Diff: D27987746 (267b554b6f)
Test Plan: `buck test` under the glow-buck target is the target that this reversion is intended to fix
Reviewed By: jfix71
Differential Revision: D28019659
fbshipit-source-id: 37584ff404fc9195b309a5a6afdb4edbc2b4f088
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56816
This doesn't actually work. For some reason the linker can't find
at::cpu::logit_out, and it's not worth digging into why not.
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D27977406
Pulled By: bertmaher
fbshipit-source-id: d0235a393f25243e2c8a011e9baf267daf483ae4
Summary:
Adds CUDA synchronization when entering and exiting the profiler
context manager.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56651
Test Plan: CI
Reviewed By: gdankel
Differential Revision: D27926270
Pulled By: ilia-cher
fbshipit-source-id: 5cf30128590c1c71a865f877578975c4a6e2cb48
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56717
The signal_handler was under the caffe2 namespace but was being used
by PyTorch as well.
I've fixed this by moving it to the c10 namespace, where now both C2 and PyTorch
can use it.
The signal_handler interface in caffe2/utils/signal_handler.h is kept the same
for backward compatibility for C2, but most of the common code is moved to c10.
ghstack-source-id: 127446929
Test Plan: waitforbuildbot
Reviewed By: ezyang
Differential Revision: D27946738
fbshipit-source-id: d6228d1a0108f4c807d405e7a0bb799c5375388f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56908
CUDA channels might implement CPU-to-CPU transfers, but will usually be
less efficient for that purpose.
Test Plan: CI
Reviewed By: lw
Differential Revision: D27994069
fbshipit-source-id: fefa7f243eb43cf769864233df518f2a1819f949
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56812
fb::equally_split gets fused with ListUnpack, and all outputs from ListUnpack get attached to fb::equally_split.
So fb::equally_split will have as many outputs as ListUnpack.
Test Plan:
buck test caffe2/benchmarks/static_runtime/fb:test_fb_operators
buck test caffe2/torch/fb/sparsenn:test -- test_equally_split_op
Reviewed By: hlu1
Differential Revision: D27974999
fbshipit-source-id: b2ca19ff86aec76b977c1e3cfc56567adab66b35
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56943
If the module is placed on a CUDA device, then all the CPU tensors in `args` and `kwargs` will also be implicitly moved to the same CUDA device to run forward.
Currently we still need to move the forward output from the CUDA device back to the CPU, until:
1) Process group RPC backend is completely deprecated, and we always use TensorPipe RPC backend;
2) A device map is explicitly provided to TensorPipe RPC backend.
These steps will be done in a separate PR.
#Original PR issue: https://github.com/pytorch/pytorch/issues/51670
ghstack-source-id: 127457584
Test Plan:
buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- test_input_moved_to_cuda_device
buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- test_input_moved_to_cuda_device_script
buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- RemoteModule
buck test mode/dev-nosan //caffe2/torch/fb/training_toolkit/applications/sparse_nn/batch_distributed_inference/tests:batch_distributed_inference_test -- --exact 'caffe2/torch/fb/training_toolkit/applications/sparse_nn/batch_distributed_inference/tests:batch_distributed_inference_test - test_load_di_parts (caffe2.torch.fb.training_toolkit.applications.sparse_nn.batch_distributed_inference.tests.batch_distributed_inference_test.BatchDistributedInferenceTest)'
Reviewed By: wanchaol
Differential Revision: D27934791
fbshipit-source-id: de27e27b905db83cc52800e63684fc6c942e9dc7
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48141
~Mypy is complaining about a missing arg in a function call.~
```bash
torch/backends/_nnapi/serializer.py:806: error: Too few arguments for "_do_add_binary" [call-arg]
Found 1 error in 1 file (checked 1140 source files)
```
9392137dbe/torch/backends/_nnapi/serializer.py (L804-L806)
~dreiss, would you mind take a look when you have some cycles to spare and see what would be the appropriated value for `fuse_code` here? Thanks :)~
Edit: https://github.com/pytorch/pytorch/issues/48925 got merged a couple of days ago. The blocking part is now unblocked, and I just pushed the changes to make mypy happy again. This PR is ready for review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48142
Reviewed By: ezyang
Differential Revision: D28006249
Pulled By: walterddr
fbshipit-source-id: 5e43eeba7143512a549efaad31541f86718add7c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56762
Adds a test case for wrapped sigmoid, and fixes the following issues
to make it pass in NS:
* allows comparing between x.sigmoid() and torch.sigmoid(x), if they are related
* allows dtype cast from FP32_OR_INT8 to FP32, via dequantize (this will be improved later)
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_defined_function
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27960766
fbshipit-source-id: 02935d2f400aa0b8f3d51bbf664a6c8ca89aa811
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56742
Fixes a bug to allow shadowing of linear and conv functionals.
The bug is to only detach tensors, not all objects.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_int8_shadows_int8_fun
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27960767
fbshipit-source-id: abc911ca4b9edafd1effb9dada7731981538c2df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56493
Adds a config option to skip matching classes by class type
and functions by function type.
This is useful when users make custom modules which return
types other than tensors. With the current implementation of
Logger, these are not scriptable.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_module_scriptable
```
needs more testing before landing
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27886107
fbshipit-source-id: ec92c4f7ab7141021bc022f07b3b558b42bbb986
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56408
Adds the ability to log unshadowed inputs of binary ops such as `add`
and `mul`, when indices 0, 1, or 0 and 1 are tensors.
Note: making shadowing support this is saved for a future PR.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_add_mul_inputs_activations
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27864296
fbshipit-source-id: 3cbeb728297aa192d1ea17e815299709fd9db056
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56384
Enables shadow copies of fp16 emulation patterns where weights
are cast to fp16 before being passed to linear. This previously
did not work because copying of `call_method` nodes was not implemented.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_linear_fp16_vs_linear_fp16_shadow_activations
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27857735
fbshipit-source-id: 7c1a067f035acf7322175f8535876d0ead88a86a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56301
Allows usage of user functions in NS shadow APIs. We expose the
i/o mapping to the user APIs, and thread them throughout the code.
Note: the format of the mapping is currently not the best. Improving
it is left for a future PR.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_defined_function
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27833189
fbshipit-source-id: dac418e294d1c9b204efbf4071d5cc12a9e784c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56296
To support shadows of custom functions, we need to allow user to
specify I/O type of the custom functions.
This PR is a cleanup in preparation for making the above happen.
We make the I/O dtype mappings be generated by a function instead
of a global variable. In the next PR, we will add a hook so user
can modify these mappings.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27831996
fbshipit-source-id: 782f5e77de0eef3899b9b7def0fdabd8dcafef12
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56292
Adds hooks for specifying user defined functions to NS weight and
unshadowed activation APIs.
Adding it to shadowed activation APIs will be a bit more work, upcoming
in a separate PR.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_user_defined_function
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27830409
fbshipit-source-id: 6bbddc3062c0b3e412a3147244795319c0785a92
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56283
Exposes the `base_name_to_sets_of_related_ops` variable
to the graph matching API, so that users can add relationships
for custom functions. This is needed to enable full support of
external functions for custom backends.
The next PR will extend this to the NS APIs.
Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher.test_user_defined_function
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27830410
fbshipit-source-id: 8688cf697d388c52e3d18f108765edfca3c3d3aa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56706
We've seen the transpose op fail on iOS 12 devices. This is because the index buffer is allocated in the device address space, which is shared across multiple threads. Write operations are not guaranteed to be atomic. Using a thread buffer solves the issue.
ghstack-source-id: 127365795
Test Plan: CI
Reviewed By: SS-JIA
Differential Revision: D27941353
fbshipit-source-id: 5f09f0a085081b7c5e8019ebe711e36394cdde92
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56841
- Move arg checks to outside the lambda so we can perform these checks at Static Runtime initialization time
- use `optional` where possible
- support the `to.other` overload, the 5-argument variant of `torch.to`.
Test Plan:
```
buck run //caffe2/benchmarks/static_runtime:static_runtime_cpptest
buck test mode/opt-clang //caffe2/caffe2/fb/predictor:ptvsc2_predictor_bench_test -- --run-disabled
```
Reviewed By: edvgha
Differential Revision: D27933176
fbshipit-source-id: 49d6249c8784c44146461e286e7a301596172d7c
Summary:
This PR adds `sm_75` CUDA architecture support for CircleCI GPU builds, so that generated artifacts from these builds can be installed and run on machines with CUDA capability `sm_75`.
This PR currently serves to see how much longer the PR CI GPU builds will take with `TORCH_CUDA_ARCH_LIST="7.5"` rather than `TORCH_CUDA_ARCH_LIST="5.2"`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56619
Reviewed By: malfet
Differential Revision: D28012538
Pulled By: seemethere
fbshipit-source-id: 3959736721eab7389984234d89eadcf04d163c37
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56895
PR #54932 fixes CUDA stream synchronization between RPC-created
OwnerRRef and UserRRef when `to_here()` is invoked. However, there
are two more gaps.
1. RRef value can be accessed on the owner directly through
`local_value`, which bypasses the fix in #54932.
2. When RRef is created directly through RRef ctor instead of RPC,
the OwnerRRef won't be able to correctly record CUDA events.
This PR fixes 1 by letting current streams wait for RRef recorded
CUDA events before returning the value in `RRef::getValue()`.
For 2, more discussion is needed to decide whether we should add
a `devices` argument to the RRef ctor, or have the RRef ctor inspect the
given values.
Test Plan: Imported from OSS
Reviewed By: lw
Differential Revision: D27992775
Pulled By: mrshenli
fbshipit-source-id: ed0e5bfbf715460208c85e46dd3317deef17f8fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56929
Artifacts were failing to unzip since they already existed in the
current tree so this just forces the zip to go through no matter what
Was observing that test phases will fail if attempting to zip over an already existing directory, https://github.com/pytorch/pytorch/runs/2424525136?check_suite_focus=true
In the long run however it'd be good to have these binaries built out as part of the regular cmake process instead of being one off builds like they are now
**NOTE**: This wouldn't be an issue if `--ephemeral` workers was a thing, see: https://github.com/actions/runner/pull/660
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: janeyx99
Differential Revision: D28004271
Pulled By: seemethere
fbshipit-source-id: c138bc85caac5d411a0126d27cc42c60fe88de60
Summary:
Related to https://github.com/pytorch/pytorch/issues/56156.
https://github.com/pytorch/pytorch/issues/55808 effectively turned dtypeIfROCM off but left some legacy issues unfixed. Given that we still need to deal with the discrepancy between the two platforms, this PR makes dtypeIfROCM default to dtypeIfCUDA and only overrides it when the user specifies.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56646
Reviewed By: mruberry
Differential Revision: D27968959
Pulled By: walterddr
fbshipit-source-id: 6a11987b8ddf4417577b3d0d5054eaab169de42c
Summary:
Currently `torch.linalg.matrix_rank` accepts only Python's float for the `tol=` argument. The current behavior is not NumPy compatible, and this PR adds the possibility of passing a Tensor for matrix-wise tolerances.
Ref. https://github.com/pytorch/pytorch/issues/42666
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54157
Reviewed By: ezyang
Differential Revision: D27961548
Pulled By: mruberry
fbshipit-source-id: 47318eefa07a7876e6360dae089e5389b9939489
Summary:
This PR is step 2 (after https://github.com/pytorch/pytorch/issues/56708) to having JIT coverage--it actually uses the plug-in in CI!
Disclaimer: note that this will mark the entire JIT'd function/method as covered without seeking proof that the
compiled code has been executed. This means that even if the code chunk is merely compiled and not run, it will get
marked as covered.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56310
Test Plan:
We should see coverage improvements in CI after. A file to look out for would be `torch/jit/quantized.py`, which should have more coverage after this PR, which it does!
d3283ccd8c/torch/jit/quantized.py vs https://codecov.io/gh/pytorch/pytorch/src/master/torch/jit/quantized.py
More generally, the whole jit folder got a ~3% increase in coverage, I believe.
Reviewed By: walterddr
Differential Revision: D28000672
Pulled By: janeyx99
fbshipit-source-id: 6712979d63a5e1224a92ee9bd9679ec62cf1cbba
Summary:
The reason we were not seeing so many wins was because .coverage.jit would overwrite itself every coverage run. (What a noob mistake who wrote that code?!?!)
This should fix that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56829
Test Plan:
Coverage in CI should visibly increase. It does, somewhat:
Check out f8a475b056! New covered files include:
Classes in torch/distributed/optim
torch/utils/mkldnn.py
Reviewed By: walterddr
Differential Revision: D27984427
Pulled By: janeyx99
fbshipit-source-id: e82d074c2b4a60a5204a73efc2823824384c8bf5
Summary:
No outstanding issue; can create one if needed.
I was looking for that resource and it was moved without the documentation being fixed.
Cheers
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56776
Reviewed By: heitorschueroff
Differential Revision: D27967020
Pulled By: ezyang
fbshipit-source-id: a5cd7d554da43a9c9e44966ccd0b0ad9eef2948c
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).
New submodule commit: 87f7681286
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56495
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: beauby
Differential Revision: D27886370
fbshipit-source-id: 2b6e2b38412694633517df2b0501e5da9e81656c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56756
With #56319 the TE kernel can handle tensor constants, so there is no more
need to lift them out and pass them as inputs.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D27959258
Pulled By: ZolotukhinM
fbshipit-source-id: 00269cf1c4747c10dfc40cb4e330991d0bf1e2ee
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56790
If the argument doesn't match `List[int]`, this code falls through to
`issubclass(argument_type, List[int])` which is invalid and raises a
`TypeError`. If this happens during the processing of a `Union` (e.g.
`Optional`), the other union types aren't given the chance to match against the
signature.
This also stops normalize_function from indiscriminately swallowing exceptions,
which had let this bug go unnoticed.
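A minimal illustration of the failure mode (plain Python, no PyTorch needed): `issubclass` rejects subscripted generics outright instead of returning False, so falling through into it aborts matching for the rest of a Union.

```python
from typing import List

# issubclass() does not merely fail to match a subscripted generic;
# it raises TypeError, which short-circuits any remaining Union members
try:
    issubclass(int, List[int])
except TypeError as e:
    print(f"raised: {e}")
```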
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D27987746
Pulled By: mruberry
fbshipit-source-id: c5aa5f61a215f0f39925e7053f33bff4b5d5acc2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56216
Verifies that the newly added distributed profiling works as expected for torch.profiler.
Example trace from `test_ddp_profiling`:
Note that tests are disabled internally due to an unrelated hang issue but run in OSS.
ghstack-source-id: 127357993
Reviewed By: mrshenli
Differential Revision: D27645105
fbshipit-source-id: 7ddba271acd8f7fbce1f9c5370830d5310314736
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55204
Implements a fix discussed offline with pritamdamia87 to run end callbacks after `CUDAFuture`'s wrapCallback has ensured appropriate synchronization. Also enables the relevant distributed profiling tests that were previously disabled for ProcessGroupNCCL.
Note that the profiling infrastructure has moved to primarily encourage the use of torch.profiler and CUPTI to trace CUDA kernels, support for distributed collectives for that will require further discussion with ilia-cher. However, this PR improves the usability of torch.autograd.profiler with respect to distributed collectives.
ghstack-source-id: 127357995
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D27491711
fbshipit-source-id: cec7703a4c5d59b5023b0aa8fef4c2e3fb8d37d0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56249
This PR ports `torch.geqrf` from TH to ATen. CUDA path will be
implemented in a follow-up PR.
With the ATen port, support for complex and batched inputs is added.
There were no correctness tests before; they are added in this PR along
with an OpInfo for this operation.
We can implement the QR decomposition as a composition of geqrf and
orgqr (torch.linalg.householder_product).
Also we can implement the least squares solver with geqrf + ormqr +
trtrs. So it's useful to have this function renewed at least for the
internal code.
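As a sketch of that composition (geqrf packs R into the upper triangle of its output and encodes the Householder reflectors for Q below the diagonal, together with `tau`; a square matrix keeps the shapes simple):

```python
import torch

A = torch.randn(5, 5, dtype=torch.float64)

# R lives in the upper triangle of `a`; the reflectors plus `tau` encode Q
a, tau = torch.geqrf(A)
Q = torch.linalg.householder_product(a, tau)
R = a.triu()

# the composition reproduces the QR decomposition
assert torch.allclose(Q @ R, A)
```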
Resolves https://github.com/pytorch/pytorch/issues/24705
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D27907357
Pulled By: mruberry
fbshipit-source-id: 94e1806078977417e7903db76eab9d578305f585
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55890
Proof-of-concept for https://github.com/pytorch/pytorch/pull/55145#issuecomment-817297273
With this the user is able to pass a custom error message to `assert_(equal|close)` which will be used in case the values mismatch. Optionally, a callable can be passed which will be called with mismatch diagnostics and should return an error message:
```python
def make_msg(a, b, info):
    return (
        f"Argh, we found {info.total_mismatches} mismatches! "
        f"That is {info.mismatch_ratio:.1%}!"
    )

torch.testing.assert_equal(torch.tensor(1), torch.tensor(2), msg=make_msg)
```
If you imagine `a` and `b` as the outputs of binary ufuncs, the error message could look like this:
```python
def make_msg(input, torch_output, numpy_output, info):
    return (
        f"For input {input} torch.binary_op() and np.binary_op() do not match: "
        f"{torch_output} != {numpy_output}"
    )

torch.testing.assert_equal(
    torch.binary_op(input),
    numpy.binary_op(input),
    msg=lambda a, b, info: make_msg(input, a, b, info),
)
```
This should make it much easier for developers to find out what is actually going wrong.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D27903842
Pulled By: mruberry
fbshipit-source-id: 4c82e3d969e9a621789018018bec6399724cf388
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55786
Add support to compare scalars as well as `np.ndarray`'s with torch.testing. We are reusing the matching functionality that is already in place for tensors, by casting the inputs. The approach can easily be extended to other input types, as long as they can be cast to a tensor.
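In released PyTorch this surface settled on `torch.testing.assert_close`; a sketch of the cast-and-compare behavior described above, with scalar and ndarray inputs:

```python
import numpy as np
import torch

# scalars and ndarrays are cast to tensors and compared with the same
# matching machinery used for tensor pairs
torch.testing.assert_close(1.0, 1.0 + 1e-9)
torch.testing.assert_close(np.array([1.0, 2.0]), np.array([1.0, 2.0]))
print("ok")
```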
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D27903814
Pulled By: mruberry
fbshipit-source-id: fe3d063d0c9513cbd8b3408a2023e94c490c817e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55385
This renames `assert_tensors_(equal|close)` to `_check_tensors_(equal|close)` and exposes two new functions: `assert_(equal|close)`. In addition to tensor pairs, the newly added functions also support the comparison of tensors in sequences or mappings. Otherwise their signature stays the same.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D27903805
Pulled By: mruberry
fbshipit-source-id: 719d19a1d26de8d14cb25846e3d22a6ac828c80a
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45664
This PR adds a note to the documentation for `torch.clamp()` to alert users to a special case: If `min` is greater than `max`, all values are set to the `max` value.
Also, an example was added after the first code example, and it is referenced in the note.
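The special case in question, as a quick sketch:

```python
import torch

x = torch.tensor([-1.0, 0.5, 2.0])
# min is greater than max, so every element collapses to the max value
print(torch.clamp(x, min=1.0, max=0.0))  # tensor([0., 0., 0.])
```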
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56367
Reviewed By: ezyang
Differential Revision: D27960553
Pulled By: mruberry
fbshipit-source-id: 9dc6016ccacebe87c809a0dd9f557b4aea0ae6f5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56201
Refactor Splitter and Minimizer to superclass `_SplitterBase` and `_MinimizerBase` and move them to OSS. This is needed to create an OSS example of GPU lowering with those tools.
Test Plan: CI
Reviewed By: jackm321
Differential Revision: D27629598
fbshipit-source-id: 0d4da02105ca509b31f1a6c4a39b1122c2bc7bf0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56759
```
caffe2/caffe2/onnx/onnx_exporter.cc:415:21: error: loop variable 'it' creates a copy from type 'const std::pair<const std::basic_string<char>, int>' [-Werror,-Wrange-loop-construct]
for (const auto it : blob_versions) {
^
caffe2/caffe2/onnx/onnx_exporter.cc:415:10: note: use reference type 'const std::pair<const std::basic_string<char>, int> &' to prevent copying
for (const auto it : blob_versions) {
^~~~~~~~~~~~~~~
&
```
Reviewed By: yfeldblum
Differential Revision: D27960126
fbshipit-source-id: fd46f37cf1aca9441209de8eb06add204046db95
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55995
Normalization is kind of broken currently, but making default arguments visible still appears to work and is nice functionality to be able to rely on/use. Adds an option `normalize_to_only_use_kwargs` to `NormalizeArgs`'s `__init__`, defaulting to True; if set to False, the provided signature is kept as-is, but the defaulted arguments are additionally set in kwargs.
Test Plan: Added test to `test_fx_experimental`.
Reviewed By: 842974287
Differential Revision: D27759448
fbshipit-source-id: 620061fcf46d8549ac70b62aede8b6740aee3778
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56518
I don't think we have any tests for CUDAFuture (I couldn't find any, and I didn't write any in the past). I think especially for the two latest features added by this stack we should have a test to ensure they properly work and to catch regressions. (These tests also add indirect coverage for the more "basic" features of CUDAFuture).
I didn't know how/where to add tests for C++ ATen stuff, so instead I added these tests to the Python RPC suite, using the torch.futures.Future wrapper. (It made sense in my mind because RPC is the main user of CUDAFuture). I'll gladly accept pointers to better ways of doing this.
ghstack-source-id: 127295022
Test Plan: The tests themselves.
Reviewed By: mrshenli
Differential Revision: D27887191
fbshipit-source-id: 4ad6d81e676fe486aa8d329591ee1a3818fea059
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56516
One problem with CUDAFuture's extraction of DataPtrs from IValues is that it only supported Python objects that could be converted to "regular" IValues (e.g., lists/dicts/tuples of ints/strings/tensors/...). One notable exception is custom Python classes, which are in fact a very common data type transferred over RPC. The only solution we found for those is to use the Python pickler to extract the tensors contained in them.
We can't insert a Python dependency directly into CUDAFuture, so instead I'm proposing to use the same indirection technique used to support `getSubValues` on Python objects: define some methods on the abstract class `PyObjectHolder` (which can be used by CUDAFuture) but only implement them in the concrete subclass `ConcretePyObjectHolder` (which is only built when Python support is enabled).
I am a bit worried about the performance toll of this (pickling isn't exactly known to be cheap) but I think we should start by providing a functionally complete API. We already have ideas on how to make this faster if needed, for example by having users provide a custom DataPtr extractor tailored to their class via a decorator. (Or just use TorchScript).
ghstack-source-id: 127295014
Test Plan: Added a test later in the stack
Reviewed By: mrshenli
Differential Revision: D27887189
fbshipit-source-id: 9d27e4e62390b836e5bb4f06f401cc002f0cf95b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56839
Enable check_for_memory_leak at the end of StaticRuntime::benchmark so this code is exercised more often.
Test Plan: Checked with adindexer merge net model
Reviewed By: edvgha
Differential Revision: D27417911
fbshipit-source-id: 5248942dc439fcc7301ffb0005da76374939fa96
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56813
When the arg `pass_inputs_as_tensor_list` is True, the input tensors are wrapped into a TensorList and passed in as a single param.
Test Plan: buck test //caffe2/caffe2/python:workspace_test -- TestScriptModule
Reviewed By: dzhulgakov
Differential Revision: D27972928
fbshipit-source-id: 5a199649445b0306f3134086c85bd55da45e1a0b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56827
The diff makes sure that mp tests are not executed in modes that allow *san, since python mp does not behave well with tsan and asan.
Test Plan: buck test mode/opt-tsan //caffe2/test/distributed/launcher/... -- --run-disabled
Reviewed By: cbalioglu
Differential Revision: D27976626
fbshipit-source-id: 7747d67687fa0fd095f799b3708038f672119e73
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56808
For information about data-race-on-vptr in general, see https://www.internalfb.com/intern/wiki/TSAN/Common_Concurrency_Mistakes/Stopping_a_Thread_in_Destructor/
Engine::~Engine() was previously tasked with stopping the threads. This causes a data race on the object's vptr when PythonEngine is being destructed. This fixes the data race by making ~PythonEngine trigger the thread stopping before going down to the base class's destructor.
Test Plan:
Many tests are affected, but here's one example:
buck test mode/dev-tsan -c fbcode.tsan_strict_mode=true //oculus/research/orcoptics/deep_learning/srg_nn/tests:test_grating_net -- 'test_train (oculus.research.orcoptics.deep_learning.srg_nn.tests.test_grating_net.TestGratingNet)' --run-disabled
Reviewed By: walterddr, albanD
Differential Revision: D27972384
fbshipit-source-id: 8b70fec8d9326497c591a2777b355ea590a85082
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56503
The presence of `generated` causes Phabricator and hg to think the file is generated (e.g., hg won't prompt to resolve merge conflicts with an editor). Breaking up the tag is the traditional way to solve this.
ghstack-source-id: 126965382
Test Plan: Review, builds
Reviewed By: ailzhang
Differential Revision: D27887691
fbshipit-source-id: 394a38d50289d64f8801a13f9a28f6f0f37ca59d
Summary:
Fixes #56738
* `setup_lint` now installs mypy / shellcheck
* the shell used to execute commands is pinned to `bash` (on Ubuntu the default is `dash`, which was causing the false positives in #56738)
* the emoji check marks don't always work, so use more basic ones instead
* adds `Run autogen` step for mypy (for the `lint` step only since it's pretty slow)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56752
Pulled By: driazati
Reviewed By: samestep
Differential Revision: D27972006
fbshipit-source-id: 624e6c1af2d4f7c8623f420516744922b6b829a5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56652
Previous code didn't drop prim::Constant values even when they were marked to be dropped.
Test Plan: Imported from OSS
Reviewed By: iseeyuan
Differential Revision: D27927413
fbshipit-source-id: 67cd52cf292e111be2830ccf93b0e7b089e49001
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45687
Fix changes the input size check for `InstanceNorm*d` to be more restrictive and correctly reject sizes with only a single spatial element, regardless of batch size, to avoid infinite variance.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56659
Reviewed By: pbelevich
Differential Revision: D27948060
Pulled By: jbschlosser
fbshipit-source-id: 21cfea391a609c0774568b89fd241efea72516bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56081
ghstack-source-id: 127205799
Test Plan: unit test. Since I'm prepacking the weights of the same operators multiple times, I wonder if it's a just-works thing?
Reviewed By: kimishpatel
Differential Revision: D27777337
fbshipit-source-id: 909d2a667d9eb51e205536b478a6668c33b3fb15
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55718
Increments sequence numbers when ProcessGroupGloo::enqueue or
ProcessGroupNCCL::collective is run, which is a common call all collectives
make. The next step will be to log these along with other collective info in
debug mode as well as integrating them with the process group wrapper.
ghstack-source-id: 127215077
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D27690690
fbshipit-source-id: cb284b7c760763b7c0f814a41f06656fabf806d6
Summary:
In the optimizer documentation, many of the learning rate scheduler [examples](https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate) are provided according to a generic template. In this PR we provide a precise, simple use-case example showing how to use learning rate schedulers. Moreover, in a follow-up example we show how to chain two schedulers one after the other.
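A sketch of the chained-scheduler pattern described above (the model and all hyperparameters here are illustrative, not the doc's exact example):

```python
import torch

# toy model and optimizer
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# two chained schedulers: both are stepped each epoch, and their
# multiplicative effects on the learning rate compose
exp_sched = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
step_sched = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[2], gamma=0.1)

for epoch in range(4):
    optimizer.step()
    exp_sched.step()
    step_sched.step()
    print(epoch, optimizer.param_groups[0]["lr"])
```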
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56705
Reviewed By: ezyang
Differential Revision: D27966704
Pulled By: iramazanli
fbshipit-source-id: f32b2d70d5cad7132335a9b13a2afa3ac3315a13
Summary:
This PR is step 2 (after https://github.com/pytorch/pytorch/issues/56708) to having JIT coverage--it actually uses the plug-in in CI!
Disclaimer: note that this will mark the entire JIT'd function/method as covered without seeking proof that the
compiled code has been executed. This means that even if the code chunk is merely compiled and not run, it will get
marked as covered.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56310
Test Plan:
We should see coverage improvements in CI after. A file to look out for would be `torch/jit/quantized.py`, which should have more coverage after this PR, which it does!
d3283ccd8c/torch/jit/quantized.py vs https://codecov.io/gh/pytorch/pytorch/src/master/torch/jit/quantized.py
More generally, the whole jit folder got a ~3% increase in coverage, I believe.
Reviewed By: ezyang
Differential Revision: D27967517
Pulled By: janeyx99
fbshipit-source-id: 53fd8431d772c2447191135c29d1b166ecd42f50
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56380
BC-breaking note:
This changes the behavior of full backward hooks as they will now fire properly even if no input to the Module require gradients.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56693
Reviewed By: ezyang
Differential Revision: D27947030
Pulled By: albanD
fbshipit-source-id: e8353d769ba5a2c1b6bdf3b64e2d61308cf624a2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56517
Currently a torch.futures.Future could wrap a CUDAFuture, but it could not create one from scratch. This prevented users from using CUDAFutures in some occasions, for example when using `rpc.functions.async_execution`, or in their own code. I don't see any reason for such a limitation, hence here I add support for this.
ghstack-source-id: 127261554
Test Plan: Added a test later in the stack
Reviewed By: mrshenli
Differential Revision: D27887190
fbshipit-source-id: ecbb39c1ad7cd189d478ded9c361448f05a270ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56515
In https://github.com/pytorch/pytorch/pull/56405 we finally found a solution to support RPC remote user functions that created/used CUDA tensors on devices that were not used by their arguments, by defining a "bounding set" of devices when constructing the agent and allowing all functions to freely use any of those devices.
We had the same exact problem with the callbacks of CUDAFuture, and in this PR I'm adopting the same exact solution: I allow to specify a set of devices when constructing a CUDAFuture, and then every callback is allowed to use any of those devices. (These devices will also be propagated to child futures).
I'm also making ProcessGroupNCCL pass these devices. I can't yet do it for TensorPipeAgent, until #56405 lands.
ghstack-source-id: 127261552
Test Plan: Added a test for this later in the stack.
Reviewed By: mrshenli
Differential Revision: D27861067
fbshipit-source-id: 8ab2c9d06a514c0407a7e96abc3704e8d5c5dc09
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56647
This should be more efficient than the old hacky wrapper for optional Tensor pattern. Despite appearances, the old pattern did a reference count bump for non-empty optionals. Following diff will contain an automated change to migrate callsites.
ghstack-source-id: 127112926
Test Plan: Review, CI on following change
Reviewed By: bhosmer
Differential Revision: D27925838
fbshipit-source-id: 2c6082c5930b1e71b853a75c52873088dbc48167
Summary:
Fixes https://github.com/pytorch/pytorch/issues/55587
The fix converts the binary `TensorIterator` used by softplus backwards to a ternary one, adding in the original input for comparison against `beta * threshold`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56484
Reviewed By: malfet
Differential Revision: D27908372
Pulled By: jbschlosser
fbshipit-source-id: 73323880a5672e0242879690514a17886cbc29cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56739
The diff makes several tiny changes:
* Add logs for each worker error file destination
* Make sure log_dir is propagated from the launcher
* Make ProcessFailure initialization error non-fatal.
Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed/elastic/multiprocessing/errors:api_test
https://fburl.com/tupperware/0nizb9z8
Reviewed By: borovsky-d, wilson100hong
Differential Revision: D27952596
fbshipit-source-id: 69582bf4be47758def4008f2abf82d123294cd1a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56565
fb::equally_split gets fused with ListUnpack, and all outputs from ListUnpack get attached to fb::equally_split.
So fb::equally_split will have as many outputs as ListUnpack.
Test Plan:
buck test caffe2/torch/fb/sparsenn:fb_operators_test
buck test caffe2/torch/fb/sparsenn:test -- test_equally_split_op
Reviewed By: hlu1
Differential Revision: D27902824
fbshipit-source-id: 7855047c3bd46bbb74b7346ac384c70b6a3e1f46
Summary:
Adjust how MutationRemover is used to avoid creating aliasDb multiple times for the same graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56675
Reviewed By: pbelevich
Differential Revision: D27945692
Pulled By: SplitInfinity
fbshipit-source-id: a6c548438e88ddee18ef03a6f0461ab9eaaaa829
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56455
CPU convolution performance is pretty important for inference, so
tracking performance for CNNs often boils down to finding shapes that have
either regressed or need optimization. This diff adds a benchmark harness that
lets you pretty easily add new sets of convolution parameters to benchmark.
I've started with an exhaustive list of layers from MobileNetV3, ResNet-18 and
ResNet-50, which are fairly popular torchvision models. More to come if these
prove useful.
I've also added four backend configurations:
- native: uses at::conv2d, which applies its own backend selection heuristics
- mkldnn_none: uses mkldnn but applies no prepacking; uses the NCHW default
- mkldnn_weight: prepacks weights in an mkldnn-friendly format
- mkldnn_input: also prepacks the inputs in NCHW16c
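The harness itself lives in the benchmarks tree; as a rough, illustrative sketch (not the harness's actual code) of timing one such shape with the `native` backend via `torch.utils.benchmark`:

```python
import torch
import torch.utils.benchmark as benchmark

# one ResNet-18-style 3x3 conv shape (shapes are illustrative)
x = torch.randn(1, 64, 56, 56)
w = torch.randn(64, 64, 3, 3)

timer = benchmark.Timer(
    stmt="torch.conv2d(x, w, padding=1)",
    globals={"torch": torch, "x": x, "w": w},
)
m = timer.timeit(10)  # Measurement with per-run statistics
print(m)
```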
ghstack-source-id: 127027784
Test Plan: Ran this on my Skylake Xeon
Reviewed By: ngimel
Differential Revision: D27876139
fbshipit-source-id: 950e1dfa09a33cc3acc7efd579f56df8453af1f2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55656
### For release notes
What:
- All errors that are silenced by "raise_exception=False" are now GradcheckError (which inherits from RuntimeError).
Why:
- Due to a refactor of gradcheck
Workaround:
- If you catch for 'RuntimeError' with `except RuntimeError`, since GradcheckError inherits from RuntimeError, no changes are necessary. However if you explicitly check for the errors type via `type(error)`, you'll need to update your code to check for `GradcheckError` instead.
Factors out all the logic handling involving `fail_test`, `raise_exception` into 1) a wrapper around gradcheck that uses try/except 2) gradcheck_helper that always raises exception.
This allows us to avoid having to write the `if not x: return False` logic that is scattered throughout gradcheck currently.
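For example, code that matched on the error type will now see GradcheckError (still a RuntimeError), as in this sketch with a deliberately wrong backward:

```python
import torch

class BadScale(torch.autograd.Function):
    """Custom op whose backward is deliberately wrong."""
    @staticmethod
    def forward(ctx, x):
        return 2 * x

    @staticmethod
    def backward(ctx, grad_out):
        return 3 * grad_out  # wrong: should be 2 * grad_out

x = torch.randn(3, dtype=torch.double, requires_grad=True)
try:
    torch.autograd.gradcheck(BadScale.apply, (x,))
except RuntimeError as e:  # GradcheckError subclasses RuntimeError
    print(type(e).__name__)
```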
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D27920809
Pulled By: soulitzer
fbshipit-source-id: 253aef6d9a3b147ee37a6e37a4ce06437981929a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55237
In this PR, we reenable fast-gradcheck and resolve misc issues that arise:
Before landing this PR, land #55182 so that slow tests are still being run periodically.
Bolded indicates the issue is handled in this PR, otherwise it is handled in a previous PR.
**Non-determinism issues**:
- ops that do not have deterministic implementation (as documented https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html#torch.use_deterministic_algorithms)
- test_pad_cuda (replication_pad2d) (test_nn)
- interpolate (test_nn)
- cummin, cummax (scatter_add_cuda_kernel) (test_ops)
- test_fn_gradgrad_prod_cpu_float64 (test_ops)
Randomness:
- RRelu (new module tests) - we fix by using our own generator as to avoid messing with user RNG state (handled in #54480)
Numerical precision issues:
- jacobian mismatch: test_gelu (test_nn, float32, not able to replicate locally) - we fixed this by disabling for float32 (handled in previous PR)
- cholesky_solve (test_linalg): #56235 handled in previous PR
- **cumprod** (test_ops) - #56275 disabled fast gradcheck
Not yet replicated:
- test_relaxed_one_hot_categorical_2d (test_distributions)
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D27920906
fbshipit-source-id: 894dd7bf20b74f1a91a5bc24fe56794b4ee24656
Summary:
After some fun investigating, samestep found that `\u1234` to produce a unicode character is only supported in bash > 4.2, but MacOS ship with bash/sh 3.2, so it was searching for the literal string `u1234`. This fixes the issue by printing out the char directly via its UTF-8 bytes and `printf`.
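The workaround, sketched (U+2713 emitted as its raw UTF-8 bytes, which works even on the bash 3.2 that macOS ships):

```shell
# bash 3.2 has no $'\u2713'; print the check mark's UTF-8 bytes directly
printf '\342\234\223\n'
```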
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56726
Pulled By: driazati
Reviewed By: SplitInfinity
Differential Revision: D27952866
fbshipit-source-id: 35871e959e250dfdbbdf8b121fc92212bc0614e8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56740
Was running into a race condition where the torch_python_obj was
attempting to build before cpython had actually finished installing;
this should resolve that issue.
Only applicable on builds that use the `USE_DEPLOY=ON` option
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D27953782
Pulled By: seemethere
fbshipit-source-id: 76dd7c4218870eac97fc4c14e20b46128d264b30
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56550
Add support for preserving a list of attributes on observed/quantized GraphModule
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_deepcopy_preserve_attributes
Imported from OSS
Reviewed By: vkuzo, kazhang
Differential Revision: D27899317
fbshipit-source-id: ebf21334715e5ab764aaa27eed534cc0cdf9f2b5
Summary:
To make them more easily distinguishable in the HUD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56695
Reviewed By: walterddr, samestep
Differential Revision: D27939938
Pulled By: malfet
fbshipit-source-id: e0abd1a6bc931a89f2aa5c6e2d8ebb471c461051
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56617
This migrates the sparsity to the open source
Test Plan: `buck test mode/opt //caffe2/test:ao`
Reviewed By: raghuramank100
Differential Revision: D27812207
fbshipit-source-id: cc87d9d2b486269901a4ad9b483615741a1cd712
Summary:
Testing that the CIRCLE variables in the Windows test CI report stats step aren't needed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56596
Test Plan: CI
Reviewed By: samestep
Differential Revision: D27948983
Pulled By: janeyx99
fbshipit-source-id: 71f2ca08246eea7580e31fb632612b205fb995fc
Summary:
This PR is step 1 to covering JIT'd methods and functions. Step 2 (using it in CI) is here: https://github.com/pytorch/pytorch/issues/56310.
1. This PR introduces a package `coverage_plugins` that hosts JITPlugin.
2. We also bring in a `.coveragerc` file that is used in CI to omit the files we don't want to report on (e.g., temporary directories or test or utils.)
**Disclaimer: This PR does NOT use the plug-in. Nothing should change as a result.**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56708
Test Plan:
CI. Coverage should not go down.
If you're interested in testing this plug-in locally, you should:
`pip install -e tools/coverage_plugins_package` from the root directory.
Add the following lines to `.coveragerc` under `[run]`
```
plugins =
    coverage_plugins.jit_plugin
```
And then try:
`coverage run test/test_jit.py TestAsync.test_async_script_no_script_mod`
You should see `.coverage.jit` show up at the end. You can then run `coverage combine --append` and `coverage debug data` to see that some files in `torch/jit` are covered.
Reviewed By: samestep
Differential Revision: D27945570
Pulled By: janeyx99
fbshipit-source-id: 78732940fcb498d5ec37d4075c4e7e08e96a8d55
Summary:
This way, if reporting results fails, the test reports are still saved as artifacts so we can use them to help us debug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56725
Test Plan: CI linux test to pass + see that the test reports are copied in the Run tests step
Reviewed By: samestep
Differential Revision: D27948434
Pulled By: janeyx99
fbshipit-source-id: 597a2ba4fe1dca16c7b75a1399600b27f380f5cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56661
Under some conditions (requires_grad = false and mode=SUM), bag_size and max_indices will be created via at::empty and will not be modified, which is why the corresponding outputs are not deterministic, causing tests to fail.
Test Plan: buck test mode/opt //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --exact 'caffe2/benchmarks/static_runtime:static_runtime_cpptest - StaticRuntime.EmbeddingBag' --run-disabled
Reviewed By: hlu1
Differential Revision: D27931445
fbshipit-source-id: fe9747094027e4e6f7c7b0771c1cd994f94fd554
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56670
`int64_t` is only available for Metal 2.2 and above. `size_t` works fine in those situations. https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf
ghstack-source-id: 127169610
Test Plan:
- AIBench
```
buck run mode/mac aibench:run_bench_macos -- -b aibench/specifications/models/pytorch/metal/metal_unet_1001_detection.json --platform ios --framework pytorch --remote --devices D201AP-12.0.1
```
Reviewed By: linbinyu
Differential Revision: D27933297
fbshipit-source-id: 474b1eb191c68101367c9623c855645684434bd7
Summary:
The preamble here is misformatted at least and is hard to make sense of: https://pytorch.org/docs/master/quantization.html#prototype-fx-graph-mode-quantization
This PR is trying to make things easier to understand.
As I'm new to this please verify that my modifications remain in line with what may have been meant originally.
Thanks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52192
Reviewed By: ailzhang
Differential Revision: D27941730
Pulled By: vkuzo
fbshipit-source-id: 6c4bbf7c87d8fb87ab5d588b690a72045752e47a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56677
This has been failing with `RecursionError: maximum recursion depth
exceeded while calling a Python object` in fbcode for a while now. Obviously
this isn't a fix, but the test works in OSS, so...
ghstack-source-id: 127146338
Test Plan:
```
buck test mode/dev //caffe2/test:jit -- --exact 'caffe2/test:jit - test_nn_module_tests (jit.test_complexity.TestComplexity)' --run-disabled
```
Reviewed By: Lilyjjo
Differential Revision: D27934963
fbshipit-source-id: 21d9858dab9ca1ebb5b67f286e788662dd24a988
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56151
I missed rsqrt in the last PR. The native_functions.yaml change was made with the following script:
```
import ruamel.yaml
from ruamel.yaml.tokens import CommentToken
from ruamel.yaml.error import CommentMark
from tools.codegen.model import *  # noqa: F403

with open("aten/src/ATen/native/native_functions.yaml", "r") as f:
    contents = f.read()

yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True
yaml.width = 1000
yaml.boolean_representation = ['False', 'True']
r = yaml.load(contents)

convert = '''\
rsqrt
bitwise_not
frac
i0
round
'''.split()

for e in r:
    f = NativeFunction.from_yaml(e, Location("", 0))
    if f.structured or f.structured_delegate is not None:
        continue
    n = f.func.name.name.base
    if n not in convert:
        continue
    # mutate e to make changes
    if f.func.kind() == SchemaKind.out:
        e.insert(1, 'structured', True)
        e.insert(2, 'structured_inherits', 'TensorIteratorBase')
    else:
        # TODO: The .out overload assumption is not sound in general
        e.insert(1, 'structured_delegate', f'{n}.out')
    if 'dispatch' in e:
        e['dispatch'].pop('CPU', None)
        e['dispatch'].pop('CUDA', None)
        e['dispatch'].pop('CPU, CUDA', None)
        e['dispatch'].pop('CompositeExplicitAutograd', None)
    else:
        print(n)
    *_, last_k = e.keys()
    needs_fixup = False
    if 'dispatch' in e and not e['dispatch']:
        if last_k == 'dispatch':
            needs_fixup = True
        del e['dispatch']
    # Manually fix up newlines at the end, because ruamel
    # made some bad life choices about where to associate trailing
    # whitespace for nested dicts; see
    # https://stackoverflow.com/questions/42172399/modifying-yaml-using-ruamel-yaml-adds-extra-new-lines
    if needs_fixup:
        *_, last_k = e.keys()
        # post_key, pre_key, post_value, pre_value
        e.ca.items[last_k] = [None, None, CommentToken('\n\n', CommentMark(0), None), None]

with open("aten/src/ATen/native/native_functions.yaml.new", "w") as f:
    yaml.dump(r, f)
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D27795078
Pulled By: ezyang
fbshipit-source-id: c8961b58753c12f985d786eae73f776c39d30e6e
Summary:
The purpose of this document is to outline our current release process
so that users coming into the project have a better idea of how the
release process works and how they can help contribute to it.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56520
Reviewed By: janeyx99
Differential Revision: D27890571
Pulled By: seemethere
fbshipit-source-id: 882a565ea8d9b9a46c9242be7cf79dede2bae63f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56528
Searched across internal and external usage of DataLoader; people haven't started using `generator` for `DataLoader` yet.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D27908487
Pulled By: ejguan
fbshipit-source-id: 14c83ed40d4ba4dc988b121968a78c2732d8eb93
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56488
Considering the number of requests for this feature, this introduces numpy seeding as the default within each worker for DataLoader.
## BC-breaking Note:
- By introducing a default numpy.random seeding strategy for DataLoader workers, users no longer need to manually seed workers via `worker_init_fn`. This PR does not affect users who currently use `worker_init_fn` to set a customized seed for workers.
- DataLoader will preserve reproducibility for users who use numpy.random within a Dataset.
- Multiprocessing (without a `worker_init_fn` defining a seed for numpy):
  - Start method `spawn`: each worker now has a proper seed for numpy random, rather than a seed derived from the import time of the NumPy module, which made the DataLoader lose reproducibility.
  - Start method `fork`: each worker not only gets the same benefit as with `spawn`, but also gets a different numpy seed by default, rather than inheriting the same seed.
Using the following Dataset and script as an example:
```py
class RandomDataset(Dataset):
    def __getitem__(self, ind):
        item = [ind, np.random.randint(1, 10000)]
        return item

    def __len__(self):
        return 20


if __name__ == '__main__':
    ctx = mp.get_context('fork')
    ds = RandomDataset()
    g = torch.Generator()
    g.manual_seed(0)
    dl = DataLoader(ds, 2, shuffle=False, num_workers=4, multiprocessing_context=ctx, generator=g)
    epochs = 2
    for _ in range(epochs):
        for batch in dl:
            print(batch)
        print("====" * 10)
```
### 1.8.1:
Each worker generates the same random result per iteration, and the seed is reset to the same value for each epoch.
```py
tensor([[ 0, 7449],
[ 1, 1519]])
tensor([[ 2, 7449],
[ 3, 1519]])
tensor([[ 4, 9645],
[ 5, 2387]])
tensor([[ 6, 9645],
[ 7, 2387]])
tensor([[ 8, 3118],
[ 9, 4552]])
=========================
tensor([[ 0, 7449],
[ 1, 1519]])
tensor([[ 2, 7449],
[ 3, 1519]])
tensor([[ 4, 9645],
[ 5, 2387]])
tensor([[ 6, 9645],
[ 7, 2387]])
tensor([[ 8, 3118],
[ 9, 4552]])
=========================
```
### This PR:
Each worker has a different seed at the beginning and is re-seeded for each epoch.
```py
tensor([[ 0, 8715],
[ 1, 5555]])
tensor([[ 2, 6379],
[ 3, 1432]])
tensor([[ 4, 3271],
[ 5, 5132]])
tensor([[ 6, 4287],
[ 7, 1104]])
tensor([[ 8, 8682],
[ 9, 1699]])
=========================
tensor([[ 0, 1374],
[ 1, 996]])
tensor([[ 2, 143],
[ 3, 3507]])
tensor([[ 4, 5887],
[ 5, 4730]])
tensor([[ 6, 7274],
[ 7, 738]])
tensor([[ 8, 6374],
[ 9, 1572]])
=========================
```
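On older releases, the new default behavior can be approximated with a `worker_init_fn` along these lines (a sketch; `seed_worker` is an illustrative name):

```python
import numpy as np
import torch

# Derive a distinct, reproducible NumPy seed for each DataLoader
# worker from the torch base seed, which already differs per worker
# and per epoch.
def seed_worker(worker_id):
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
```

Passing `worker_init_fn=seed_worker` to the `DataLoader` then mirrors the seeding this PR makes the default.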
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D27908486
Pulled By: ejguan
fbshipit-source-id: 5f313a30563bedeb88be214fa4beca0cefe9e4f4
Summary:
* Visual studio versions: clarify and shorten.
* Remove obsolete note about a bug that has been fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56193
Reviewed By: albanD
Differential Revision: D27939766
Pulled By: ezyang
fbshipit-source-id: e142ec04ba98d5468f28ddf2e8bba5d99d3cfc26
Summary:
Fixes https://github.com/pytorch/pytorch/issues/55398
Generates tests that call `symbolic_trace` on torchvision models and verify the parity of outputs from the eager model, the `fx.GraphModule`, and the `jit.ScriptModule`.
Test errors: the GoogleNet and Inception models throw a type mismatch when scripting the traced `fx.GraphModule`.
```
Return value was annotated as having type __torch__.torchvision.models.googlenet.GoogLeNetOutputs but is actually of type Tensor:
dropout = self.dropout(flatten); flatten = None
fc = self.fc(dropout); dropout = None
return fc
~~~~~~~~~ <--- HERE
```
Relevant type-inconsistency 512ea299d4/torchvision/models/googlenet.py (L200)
```
@torch.jit.unused
def eager_outputs(self, x: Tensor, aux2: Tensor, aux1: Optional[Tensor]) -> GoogLeNetOutputs:
    if self.training and self.aux_logits:
        return _GoogLeNetOutputs(x, aux2, aux1)
    else:
        return x  # type: ignore[return-value]
```
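The parity check these generated tests perform can be sketched as follows; `model` and `inp` stand in for a torchvision model and a sample input (illustrative helper name, not the actual test code):

```python
import torch
import torch.fx

# Compare eager, traced (fx.GraphModule), and scripted
# (jit.ScriptModule) outputs for the same input.
def check_parity(model, inp):
    model.eval()
    traced = torch.fx.symbolic_trace(model)   # fx.GraphModule
    scripted = torch.jit.script(traced)       # jit.ScriptModule
    with torch.no_grad():
        eager_out = model(inp)
        assert torch.allclose(eager_out, traced(inp))
        assert torch.allclose(eager_out, scripted(inp))
```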
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55744
Reviewed By: albanD
Differential Revision: D27920595
Pulled By: suraj813
fbshipit-source-id: 01f6f2aef7badbde29b5162a7787b5af9398090d
Summary:
See https://github.com/pytorch/pytorch/pull/56523#issuecomment-823562134 for context. Basically the idea is that people (including myself) keep assuming that the single-asterisk `*` wildcard means "match in this directory and in its subdirectories", which is _not_ true. Removing the wildcards thus reduces confusion.
Ideally I would like to remove _all_ of these wildcards and then add a lint to disallow them in the future (and also greatly simplify the pattern-matching logic in `tools/mypy_wrapper.py`; see https://github.com/pytorch/pytorch/issues/55702 for context), but currently this one can't be removed:
```
tools/autograd/*.py,
```
That is because there is a file called `tools/autograd/templates/annotated_fn_args.py` (added in https://github.com/pytorch/pytorch/issues/41575) which is not a valid Python file and thus cannot be checked by `mypy`. ezyang would it be possible to rename that file to use a suffix other than `.py`?
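The pitfall can be illustrated with `pathlib` glob semantics (used here as a stand-in; mypy's own matcher may differ in details): a single `*` stays within one directory level and does not recurse.

```python
from pathlib import PurePath

# '*' matches only within a single directory level:
assert PurePath("tools/autograd/gen.py").match("tools/autograd/*.py")
# ...so files in subdirectories are NOT matched:
assert not PurePath("tools/autograd/templates/annotated_fn_args.py").match(
    "tools/autograd/*.py"
)
```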
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56645
Test Plan:
```
$ mypy
Success: no issues found in 1317 source files
$ mypy --config=mypy-strict.ini
Success: no issues found in 72 source files
```
The numbers of source files should be the same before and after this PR.
Reviewed By: ezyang
Differential Revision: D27925207
Pulled By: samestep
fbshipit-source-id: c17faf73665a75393d3109346a1138c2af023abb
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53964. cc albanD almson
## Major changes:
- Overhauled the actual loss calculation so that the shapes are now correct (in functional.py)
- added the missing doc in nn.functional.rst
## Minor changes (in functional.py):
- I removed the previous check on whether input and target were the same shape. This is to allow for broadcasting, say when you have 10 predictions that all have the same target.
- I added some comments to explain each shape check in detail. Let me know if these should be shortened/cut.
Screenshots of updated docs attached.
Let me know what you think, thanks!
## Edit: Description of change of behaviour (affecting BC):
The backwards-compatibility is only affected for the `reduction='none'` mode. This was the source of the bug. For tensors with size (N, D), the old returned loss had size (N), as incorrect summation was happening. It will now have size (N, D) as expected.
### Example
Define input tensors, all with size (2, 3).
`input = torch.tensor([[0., 1., 3.], [2., 4., 0.]], requires_grad=True)`
`target = torch.tensor([[1., 4., 2.], [-1., 2., 3.]])`
`var = 2*torch.ones(size=(2, 3), requires_grad=True)`
Initialise loss with reduction mode 'none'. We expect the returned loss to have the same size as the input tensors, (2, 3).
`loss = torch.nn.GaussianNLLLoss(reduction='none')`
Old behaviour:
`print(loss(input, target, var)) `
`# Gives tensor([3.7897, 6.5397], grad_fn=<MulBackward0>. This has size (2).`
New behaviour:
`print(loss(input, target, var)) `
`# Gives tensor([[0.5966, 2.5966, 0.5966], [2.5966, 1.3466, 2.5966]], grad_fn=<MulBackward0>)`
`# This has the expected size, (2, 3).`
To recover the old behaviour, sum along all dimensions except for the 0th:
`print(loss(input, target, var).sum(dim=1))`
`# Gives tensor([3.7897, 6.5397], grad_fn=<SumBackward1>.`


Pull Request resolved: https://github.com/pytorch/pytorch/pull/56469
Reviewed By: jbschlosser, agolynski
Differential Revision: D27894170
Pulled By: albanD
fbshipit-source-id: 197890189c97c22109491c47f469336b5b03a23f
Summary:
Commandeered from https://github.com/pytorch/pytorch/pull/54563
Primary changes from first PR:
1. Refactored primary `normalize_function` logic into `operator_schemas.py` so that non-FX users can use it.
2. Refactored tests a bit, and added a path to call `normalize_function` directly.
3. Moved check for `boolean_dispatch` so that `torch.lu` also gets properly handled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55992
Reviewed By: mruberry
Differential Revision: D27774396
Pulled By: Chillee
fbshipit-source-id: 7f65632e1d608e4abd55aec5ccbfdc3f67f52b8e
Summary:
Update Kineto submodule and use new metadata api
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56432
Test Plan: CI
Reviewed By: chaekit
Differential Revision: D27871570
Pulled By: ilia-cher
fbshipit-source-id: 3556787f07a9c9e138666a62ee4cd23af6d7473b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56072
Currently, we don't support outputting more than one tensor on GPU. For example, if you do
```
auto x = at::rand({1, 4, 2, 2}).metal();
auto y = at::chunk(x,2,1); //y is a tuple
auto output1 = y[0].cpu();
auto output2 = y[1].cpu();
```
In the example above, when it hits `y[0].cpu()`, the command buffer will be committed to move `y[0]` from GPU to CPU. By the time it hits `y[1].cpu()`, since the command buffer has already become invalid, the temporary image that lives in `output2` has been recycled. Thus, a runtime exception will be thrown.
The way we address this is by using the observer pattern:
1. Before we flush the command buffer, we notify its observers (a.k.a. MPSImageWrapper objects), which hold the temporary images.
2. When the observers receive the notification, they turn their current temporary images into static images.
3. Now, when `.cpu()` happens, the output tensor can just read the data directly from the static image generated in the above step.
You may be wondering whether that has a hidden cost where all the intermediate tensors hold unused static images. The answer is no. All intermediate tensors are released once their reference counts reach zero. Since MetalTensorImpl subclasses TensorImpl, we override the release_resource method, which gives us a chance to release the underlying storage (textures and buffers) and remove observers from the command buffer. Therefore, once the intermediate tensors go away, the temporary images are recycled immediately.
ghstack-source-id: 127079751
Test Plan:
- We'll be using `at::chunk` to test this in the following diffs, as it returns a tuple that contains multiple tensors.
- Sandcastle CI
- CircleCI
Reviewed By: dreiss
Differential Revision: D27165886
fbshipit-source-id: 290b0d77b1dc74990b25cbd0abb775df1ab47ca0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56601
Updating it to ensure that RegistrationDeclarations.yaml is completely
unchanged
This reverts commit 90e532f3ef17a9611e9e7a9f1f6189d4168bf084.
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D27915305
Pulled By: bdhirsh
fbshipit-source-id: 491a025c44221690dad849f9a2166934130c0fec
Summary:
That test was skipped due to a compiler bug. That bug should be fixed in 11.2, so we should enable it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50227
Reviewed By: malfet
Differential Revision: D27909195
Pulled By: anjali411
fbshipit-source-id: c802702079d0e521f53fc98cd0fc3ded0c12b455
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56173
* Create `InplaceConverter` and `ValueTracker` to keep track of aliases of values throughout the graph. For a given value, a new alias is created every time when there is an inplace operation, SetAttr, or through nested blocks owned by If/Loop nodes.
* Fix bug where controlflow node output types are not set, when the complete node is unable to run ONNX shape inference due to containing non-onnx node.
* Add symbolic for `__not__` ~~and `prim_min`~~(update: moved to a separate PR), and update `index_put` opset9 to support case of assignment without providing indices.
* Bump ORT version in CI test.
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D27866138
Pulled By: SplitInfinity
fbshipit-source-id: ab5c9188740c50f783ceba4d54fda43c26e2fde7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56172
Enable the standardOps include **Add\Sub\Mul\Div\Gemm\Pow\Mod** with low precision input in ORT
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D27866136
Pulled By: SplitInfinity
fbshipit-source-id: f2cf5649fffefd68c0cc7b6dce94198751636727
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56549
This makes `kProcessGroupDefaultTimeout` the same as on the Python
side, and the Python side now uses the pybind value directly.
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D27899190
Pulled By: wanchaol
fbshipit-source-id: 388a7f42358b0abed75cf4934fb7b311fd33fee6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56531
per discussions in
https://github.com/pytorch/pytorch/pull/53663/files#r593409009, we need
to make sure our API does not confuse users by letting them pass both a
timeout argument and a timeout in ProcessGroup.Options. This PR makes
`ProcessGroup.Options.timeout` a private field that is only used in our
test utils; for both `init_process_group` and `new_group`, we still
allow users to pass `timeout` as a separate argument. Since
`ProcessGroupGloo.Options` only has a `timeout` config, neither function
allows passing in options for the GLOO backend.
This way we preserve the `timeout`-only API, and only allow users
to use `ProcessGroupNCCL.Options` when needed.
cc pritamdamania87 rohan-varma
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D27893395
Pulled By: wanchaol
fbshipit-source-id: cdd29c84648002226ef3d9f9f3ea67b795e64bc5
Summary:
Under this setting the job should run 3 times a day.
When the environment variable `PYTORCH_TEST_WITH_SLOW_GRADCHECK` is set to `ON`, the default value for `fast_mode` in the gradcheck wrapper is set to False. This is overridden by whatever value the user explicitly passes in.
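A sketch of how such an environment-driven default can be wired (illustrative names, not the actual wrapper in `torch.testing`):

```python
import os
from functools import wraps

# The env var flips the default of fast_mode, but an explicit
# argument from the caller still wins.
SLOW_GRADCHECK = os.environ.get("PYTORCH_TEST_WITH_SLOW_GRADCHECK") == "ON"

def with_slow_default(gradcheck_fn):
    @wraps(gradcheck_fn)
    def wrapper(*args, fast_mode=None, **kwargs):
        if fast_mode is None:
            fast_mode = not SLOW_GRADCHECK  # env-driven default
        return gradcheck_fn(*args, fast_mode=fast_mode, **kwargs)
    return wrapper
```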
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55182
Reviewed By: albanD
Differential Revision: D27919236
Pulled By: soulitzer
fbshipit-source-id: 3a55ec6edcfc6e65fbc3a8a09c63aaea1bd1c5bf
Summary:
On ROCm, compiling grid_reduction.cu failed with the error "non-constant-expression cannot be narrowed from type 'int' to 'uint32_t'".
Added a typecast to fix the issue.
Also removed the ROCm test skip, re-enabling the test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55243
Reviewed By: malfet
Differential Revision: D27917066
Pulled By: ngimel
fbshipit-source-id: b0b7c5fc8ecd2624222b35fe060846f7d1670f07
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56441
Since aten::to is overloaded, match schema to replace it with static_runtime::to_copy
Test Plan:
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 3 ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --c2_model=/data/users/ansha/tmp/adfinder/210494966_0.predictor.disagg.remote_request_only --c2_inputs=/data/users/ansha/tmp/adfinder/models/c2_remote_ro_input_data.pb --pred_net=/data/users/ansha/tmp/adfinder/models/c2_remote_ro_net2.pb --c2_sigrid_transforms_opt=1 --c2_apply_nomnigraph_passes=1 --c2_use_memonger=1 --scripted_model=/data/users/ansha/tmp/adfinder/models_dianshi/210494966_0.predictor.disagg.remote_request_only.pt --pt_inputs=/data/users/ansha/tmp/adfinder/models/remote_ro_wrapped_input_data.pt --pt_enable_static_runtime=1 --pt_cleanup_activations=1 --pt_enable_out_variant=1 --compare_results=1 --iters=1 --warmup_iters=1 --num_threads=1 --do_profile=1 --benchmark_c2_predictor=0 --do_benchmark=0
```
```
Time per node type:
0.623426 ms. 55.337%. quantized::embedding_bag_4bit_rowwise_offsets (82 nodes)
0.331633 ms. 29.4367%. quantized::embedding_bag_byte_rowwise_offsets (71 nodes)
0.123163 ms. 10.9323%. aten::to (155 nodes)
0.038479 ms. 3.4155%. fb::lengths_to_offsets (155 nodes)
0.004169 ms. 0.370052%. aten::embedding_bag (2 nodes)
0.002549 ms. 0.226256%. static_runtime::to_copy (2 nodes)
0.002512 ms. 0.222972%. prim::TupleConstruct (1 nodes)
0.000667 ms. 0.0592048%. prim::dtype (2 nodes)
1.1266 ms. in Total
StaticRuntime setup time: 0.009605 ms
Memory allocation time: 0.001907 ms
Memory deallocation time: 0.032401 ms
Outputs deallocation time: 0.020876 ms
Total memory managed: 256 bytes
Total number of reused tensors: 159
```
I verified that all of the aten::to matches, for the local, local_ro, and remote_ro nets in opt and dev mode.
Only 2 of the calls are replaced because the other 155 have either the input or the output of the op returned as an external output. This is a similar case for the other instances of aten::to in the local and local_ro nets.
Reviewed By: hlu1
Differential Revision: D27872350
fbshipit-source-id: b72785ea2768be415faae2afcf9915aef07daec2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56405
If not provided, the `devices` field will be initialized to local
devices in local `device_maps` and corresponding devices in peers'
`device_maps`. When processing CUDA RPC requests, the agent will
use a dedicated stream for each device in the devices list to 1)
accept argument CUDA tensors 2) run user functions 3) send return
value tensors.
Closes #54017
Test Plan: Imported from OSS
Reviewed By: lw
Differential Revision: D27863133
Pulled By: mrshenli
fbshipit-source-id: 5d078c3b6d1812f85d62b0eb0f89f2b6c82cb060
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56535
This PR renames the `_Rendezvous` class to `_RendezvousState` in preparation for the upcoming changes.
ghstack-source-id: 126979138
Test Plan: Run the existing unit tests.
Reviewed By: H-Huang
Differential Revision: D27889894
fbshipit-source-id: 027d26aa5e1acd5bba3ad2e58b140428a4a176b2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56534
This PR reorders the type definitions in dynamic_rendezvous.py to increase the readability.
ghstack-source-id: 126979087
Test Plan: Run the existing unit tests.
Reviewed By: H-Huang
Differential Revision: D27889817
fbshipit-source-id: 04291af9b8f3170e4b33cb4f33e0dff0d2d3fb23
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56533
This PR introduces a small utility function to delay the execution of the current thread.
ghstack-source-id: 126979035
Test Plan: Run the associated unit tests.
Reviewed By: H-Huang
Differential Revision: D27889671
fbshipit-source-id: aae93b624bd4704da7a48004f50d130cec64969d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56532
This PR fixes a subtle issue with the finalizer implementation of `_PeriodicTimer`.
We avoid using a regular finalizer (a.k.a. `__del__`) for stopping the timer as joining a daemon thread during the interpreter shutdown can cause deadlocks. The `weakref.finalize` is a superior alternative that provides a consistent behavior regardless of the GC implementation.
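The pattern can be sketched like this (illustrative names, not the actual `_PeriodicTimer` code); note that the finalizer callback must not hold a reference back to the instance, so it receives the event and thread objects directly:

```python
import threading
import weakref

class PeriodicTimer:
    def __init__(self, interval=0.01):
        self._stop_event = threading.Event()
        self._thread = threading.Thread(
            target=PeriodicTimer._run,
            args=(self._stop_event, interval),
            daemon=True,
        )
        self._thread.start()
        # Runs when the timer is garbage-collected or at interpreter
        # shutdown, with consistent behavior regardless of the GC.
        self._finalizer = weakref.finalize(
            self, PeriodicTimer._join, self._stop_event, self._thread
        )

    @staticmethod
    def _run(stop_event, interval):
        while not stop_event.wait(interval):
            pass  # periodic work would happen here

    @staticmethod
    def _join(stop_event, thread):
        stop_event.set()
        thread.join()

    def cancel(self):
        self._finalizer()  # idempotent explicit stop
```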
ghstack-source-id: 126978904
Test Plan: Run the existing unit tests as there is no behavioral change.
Reviewed By: H-Huang
Differential Revision: D27889289
fbshipit-source-id: a248cf6fd1abc4da8bef90e160fa9669a4961fa5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55940
A simpler way to keep current_dtype up to date than #55689.
ghstack-source-id: 127092676
Test Plan: CI
Reviewed By: ezyang
Differential Revision: D27744064
fbshipit-source-id: 23fccb8b0375f5b790439a9a1c9ac07d5fae391b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55420
It doesn't seem to be necessary, and it blocks using `c10::MaybeOwned` to support borrowing.
ghstack-source-id: 127092679
Test Plan: fitsships
Reviewed By: ezyang
Differential Revision: D27607270
fbshipit-source-id: a007e9896785c8708f8cc02035cc6f4607a0a31b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55351
We incorrectly used `Tensor&` to mean "the underlying
TensorImpl cannot be changed", as explained in
https://github.com/zdevito/ATen/issues/27#issuecomment-330717839 .
This diff gets us on the path to fixing this problem: we have an
incremental way to fix individual native functions so that we can
apply any handwritten fixes a few at a time. It gets the migration
started with the `resize` family of native functions.
ghstack-source-id: 127092677
Test Plan: fitsships
Reviewed By: ezyang
Differential Revision: D27583983
fbshipit-source-id: 4eeeec85f5d268e9d0f1645eb9396914a9f9557f
Summary:
In some cases the `__file__` here was relative, so in the linter script it ended up setting the repo root to `''`, which `asyncio` doesn't handle.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56633
Pulled By: driazati
Reviewed By: samestep
Differential Revision: D27922510
fbshipit-source-id: 7e406fa374ec0e5c4917b7c11742b9457dd52668
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56547
**Summary**
This commit tweaks the docstrings of `PackageExporter` so that they look
nicer on the docs website.
**Test Plan**
Continuous integration.
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D27912965
Pulled By: SplitInfinity
fbshipit-source-id: 38c0a715365b8cfb9eecdd1b38ba525fa226a453
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56514
rohan-varma mentioned that having CUDAFuture entirely defined in a header meant having to rebuild a whole lot of things whenever it changed. In fact there's no reason not to use a .cpp file, so here I do so.
ghstack-source-id: 127035765
Test Plan: Unit tests
Reviewed By: rohan-varma, mrshenli
Differential Revision: D27861071
fbshipit-source-id: c209d54af9b52d3ad781db1b61f6fca02c637f32
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56513
The RpcCUDAFuture class existed solely to support extracting DataPtrs from a Message. However, this can be done more simply by using a vanilla CUDAFuture: extract those DataPtrs before marking it complete and pass them to markCompleted.
This makes it possible to keep the DataPtr extraction logic of CUDAFuture private.
ghstack-source-id: 127035771
Test Plan: Unit tests
Reviewed By: mrshenli
Differential Revision: D27861064
fbshipit-source-id: b0b4df2cab7be6b4b16d5cfc888483c18fbce60e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56512
I don't know if there was a reason to keep them separate, but since the former deferred to the latter, it seems to me that we can get the exact same behavior by merging them and making the `data_ptrs` argument optional (by giving it a default value).
ghstack-source-id: 127035767
Test Plan: Unit tests
Reviewed By: mrshenli
Differential Revision: D27861069
fbshipit-source-id: 93a49d6959b65a8d4ab9b31accce90bf30cd441e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56511
CUDAFuture needs to inspect the value in order to extract DataPtrs. Sometimes it's unable to do so. So far we've handled this by raising an error when `markCompleted` is called. In this PR I'm proposing a change, which makes `markCompleted` return successfully, but instead causes the Future to be set to an error if the DataPtr extraction fails.
The advantage I see is that user code calling `markCompleted` didn't expect it to throw, and thus wasn't catching and handling that error. In the best case that could lead to a crash, and in the worst case to the Future remaining incomplete, thus never unblocking any client waiting on it. With this change those clients are woken up and see the error.
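A Python analogy of the change (a sketch using `concurrent.futures`, not the C++ CUDAFuture API; helper names are illustrative):

```python
from concurrent.futures import Future

# If inspecting the value fails, record the error on the future
# instead of raising out of the completion path, so waiters are
# unblocked and see the error.
def mark_completed(fut, value, extract_data_ptrs):
    try:
        extract_data_ptrs(value)   # may fail for uninspectable values
    except TypeError as exc:
        fut.set_exception(exc)     # clients waiting on fut see the error
        return
    fut.set_result(value)

def failing_extract(value):
    raise TypeError("value type is not inspectable")

fut = Future()
mark_completed(fut, object(), failing_extract)
assert isinstance(fut.exception(), TypeError)
```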
ghstack-source-id: 127035772
Test Plan: Unit tests
Reviewed By: mrshenli
Differential Revision: D27861070
fbshipit-source-id: 4bb6100a488ab35fbe3c2bc3ac6f98d166c60a0b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56510
The comment for `TORCH_INTERNAL_ASSERT` says to use it for "enforcement of internal invariants in code", meaning "assuming no bugs in PyTorch, the conditions tested by this macro should always be true". However this wasn't the case here, at least for the RPC code: CUDAFuture calls the `getSubValues` method on a generic IValue whose type it doesn't know (or care about). It was thus sometimes triggering the internal assert when users provided non-inspectable types, which produced an exception with a message containing "please report a bug to PyTorch", which was confusing to users.
It makes more sense to me to consider this a type error, which can thus be reported more clearly to the user (and, later on in this stack, to catch). Hence the difference introduced here is just the type and the message of the exception. I don't expect there to be any code depending on the old behavior (as it would mean depending on a violation of an internal invariant).
ghstack-source-id: 127035768
Test Plan: Unit tests
Reviewed By: mrshenli
Differential Revision: D27861066
fbshipit-source-id: 6d41c922257cba5f37c7a4614d8e5ab5c7c87b92
Summary:
This queries the local git repo for changed files (any changed files, not just committed ones) and sends them to mypy/flake8 instead of the default (which is the whole repo, defined the .flake8 and mypy.ini files). This brings a good speedup (from 15 seconds with no cache to < 1 second from my local testing on this PR).
```bash
make quicklint -j 6
```
It should be noted that the results of this aren’t exactly what’s in the CI, since mypy and flake8 ignore the `include` and `exclude` parts of their config when an explicit list of files is passed in.
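The core of the idea can be sketched as follows (illustrative helper names, not the actual `tools/` implementation):

```python
import subprocess

# Keep only the Python files from a `git diff --name-only` listing.
def py_files(diff_output):
    return [f for f in diff_output.splitlines() if f.endswith(".py")]

# Ask git for files changed in the working tree (any changed files,
# not just committed ones) and hand only the Python ones to linters.
def changed_py_files():
    out = subprocess.check_output(
        ["git", "diff", "--name-only", "HEAD"], text=True
    )
    return py_files(out)
```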
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56559
Pulled By: driazati
Reviewed By: malfet
Differential Revision: D27901577
fbshipit-source-id: 99f351cdfe5aba007948aea2b8a78f683c5d8583
Summary:
This should make it easier to resolve issues surfaced by https://github.com/pytorch/pytorch/issues/56290. Also see https://github.com/pytorch/pytorch/pull/56559#discussion_r617828152 for context.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56616
Test Plan:
You could add a type error in a strict-checked file like `tools/test_history.py`, and then run this command:
```
$ mypy --config=mypy-strict.ini tools/test_history.py
```
Output before this PR:
```
tools/test_history.py:13:1: error: Function is missing a type annotation for one or more arguments
Found 1 error in 1 file (checked 1 source file)
```
Output after this PR:
```
tools/test_history.py:13:1: error: Function is missing a type annotation for one or more arguments [no-untyped-def]
Found 1 error in 1 file (checked 1 source file)
```
Reviewed By: driazati
Differential Revision: D27918753
Pulled By: samestep
fbshipit-source-id: 953926e019a7669da9004fd54498b414aec777a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56555
Remove the "sparse" and "sparsity" from the function/variable names
Test Plan: `buck test mode/opt //caffe2/torch/fb/model_optimization:sparsity_test`
Reviewed By: raghuramank100
Differential Revision: D27812205
fbshipit-source-id: 1665253720467030b84b744f824fa7742a802542
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56567
This adds stride information to the serialized JSON.
This also adds shape, dtype and stride to the graph that is printed out.
Test Plan: Run unit tests.
Reviewed By: jfix71
Differential Revision: D27528988
fbshipit-source-id: f0be92055ad7c8e525625bfd1332c2db11ba612d
Summary:
This should clarify its purpose, which is:
> to make sure that we give an appropriate error message when someone tries to use python2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56593
Test Plan: CI.
Reviewed By: gchanan
Differential Revision: D27913086
Pulled By: samestep
fbshipit-source-id: e7555d5cab5696b19a17824383c92f25f91da2cf
Summary:
Match updated `Embedding` docs from https://github.com/pytorch/pytorch/pull/54026 as closely as possible. Additionally, update the C++ side `Embedding` docs, since those were missed in the previous PR.
There are 6 (!) places for docs:
1. Python module form in `sparse.py` - includes an additional line about newly constructed `Embedding`s / `EmbeddingBag`s
2. Python `from_pretrained()` in `sparse.py` (refers back to module docs)
3. Python functional form in `functional.py`
4. C++ module options - includes an additional line about newly constructed `Embedding`s / `EmbeddingBag`s
5. C++ `from_pretrained()` options
6. C++ functional options
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56065
Reviewed By: malfet
Differential Revision: D27908383
Pulled By: jbschlosser
fbshipit-source-id: c5891fed1c9d33b4b8cd63500a14c1a77d92cc78
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56553
This splits the previous diff into multiple parts; this one introduces only the C++ files.
The unit tests pass as part of the internal build and will be added to OSS in later PRs.
Test Plan:
`buck test mode/opt //caffe2/torch/fb/model_optimization:sparsity_test`
```
Parsing buck files: finished in 2.0 sec
Creating action graph: finished in 16.4 sec
Building: finished in 55.0 sec (100%) 20264/20264 jobs, 16 updated
Total time: 01:13.6 min
More details at https://www.internalfb.com/intern/buck/build/c9c5e69e-ce00-4560-adce-58b68bc43e47
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 1e678a07-0689-45b4-96f3-54d0a3181996
Trace available for this run at /tmp/tpx-20210415-161113.966600/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/3096224795029304
✓ ListingSuccess: caffe2/torch/fb/model_optimization:sparsity_test - main (4.186)
✓ Pass: caffe2/torch/fb/model_optimization:sparsity_test - test_sparse_qlinear (caffe2.torch.fb.model_optimization.test.sparsity.quantized_test.TestQuantizedSparseLayers) (1.752)
✓ Pass: caffe2/torch/fb/model_optimization:sparsity_test - test_sparse_qlinear (caffe2.torch.fb.model_optimization.test.sparsity.quantized_test.TestQuantizedSparseKernels) (1.884)
✓ Pass: caffe2/torch/fb/model_optimization:sparsity_test - test_sparse_qlinear_serdes (caffe2.torch.fb.model_optimization.test.sparsity.quantized_test.TestQuantizedSparseLayers) (2.013)
Summary
Pass: 3
ListingSuccess: 1
```
Reviewed By: ailzhang
Differential Revision: D27833226
fbshipit-source-id: a47707117de950a9794f79e50a544aa13542c1e1
Summary:
Since `launch_agent()` in api.py is already decorated with record, we can remove the usage in elastic_launch.
It also fixes the FileExistsError bug on MAST.
We ran an experiment in D27901961 to count how many times record is invoked, to verify this assumption.
Test Plan:
```
fbpkg build -E torchelastic_distributed_sum
buck run mode/dev-nosan //pytorch/elastic/torchelastic/tsm/fb/cli:tsm -- run_ddp --scheduler mast --fbpkg torchelastic_distributed_sum:fde7879 --nnodes 1 --nproc_per_node 1 --resource T1 --run_cfg hpcIdentity=oncall_dai_pet,hpcClusterUuid=MastNaoTestCluster main.par
```
https://www.internalfb.com/mast/job/tsm_wilsonhong-torchelastic_distributed_sum_a92f97e7
Reviewed By: borovsky-d
Differential Revision: D27902034
fbshipit-source-id: e08b02d4b9c7a7c70fbb0dbcb24b95af55d2ea95
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55319
Adds a sequence number class as well as integration with ProcessGroup (NCCL and Gloo) as part of better debuggability.
The main use case is that each ProcessGroup instantiated will have a sequence number initially set by rank 0 and broadcast to all others. We will increment the number on each collective, thus allowing us to match the numbers appropriately when checking for desynchronization.
This PR just adds the bare-bones integration and verifies sequence numbers are set appropriately at the beginning.
ghstack-source-id: 127011277
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D27562769
fbshipit-source-id: d4a4de7529ce07a0c86fcf6beb06f317f359d89b
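The desync-detection idea can be sketched in plain Python. This is a toy model, not the actual ProcessGroup implementation; all names below are illustrative:

```python
# Toy model of per-ProcessGroup sequence numbers (illustrative names,
# not the real ProcessGroup API).
class ToyProcessGroup:
    def __init__(self, rank, initial_seq):
        # rank 0 would pick this value and broadcast it; here every
        # rank simply starts from the same agreed-upon number.
        self.rank = rank
        self.seq = initial_seq

    def collective(self):
        # every collective increments the local sequence number
        self.seq += 1

ranks = [ToyProcessGroup(r, initial_seq=0) for r in range(4)]
for pg in ranks:
    pg.collective()          # a collective all ranks participate in
ranks[2].collective()        # rank 2 runs an extra collective: desync
seqs = [pg.seq for pg in ranks]
desynced = len(set(seqs)) > 1   # mismatched numbers expose the culprit
```

Comparing the per-rank numbers after the fact is what lets a debugging tool point at the rank that ran ahead (or fell behind).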
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54924
Previously we were producing torch.ops.quantize.cat, which takes inputs, dequantizes them,
and requantizes them with new qparams. This PR changes that to produce torch.cat directly; torch.cat
will assume all inputs share the same qparams, and it will produce a quantized Tensor with
the same qparams as all inputs (because the previous PR makes sure all inputs and the output of cat share
the same observer/fakequant instance).
Using torch.cat is expected to be more efficient since it does not introduce extra quant/dequant ops.
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_cat
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D27416528
fbshipit-source-id: 896c280abec2903c29d597c655729666583ff0dd
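The efficiency argument can be illustrated with a toy affine-quantization sketch in plain Python (not the actual quantized kernels): when all inputs share the same scale and zero point, concatenation can operate directly on the integer representations, with no dequantize/requantize round-trip.

```python
def quantize(xs, scale, zero_point):
    # affine quantization: real -> int
    return [round(x / scale) + zero_point for x in xs]

def dequantize(qs, scale, zero_point):
    # affine dequantization: int -> real
    return [(q - zero_point) * scale for q in qs]

scale, zp = 0.1, 0
a = quantize([0.5, 1.0], scale, zp)
b = quantize([1.5, 2.0], scale, zp)
# shared qparams: cat is a plain concatenation of the int values,
# and the result reuses the same scale/zero_point
cat_q = a + b
```

With mismatched qparams, each input would first have to go through a dequant/requant pair, which is exactly the overhead this PR avoids.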
Summary:
Run print_test_stats.py for macos_10_13 tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56429
Test Plan: Make sure CI passes, specifically for macos_10_13
Reviewed By: samestep
Differential Revision: D27911557
Pulled By: janeyx99
fbshipit-source-id: 178c0ff7786ab5c41dec9d8afa257eebda4f5a0f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56307
This should fix https://github.com/pytorch/pytorch/issues/56273. I tested these changes locally by making them directly on top of https://github.com/pytorch/pytorch/pull/56151, and running the xla tests (`xla/test/cpp/build/test_ptxla`).
**Current state:** For ops that are ported to structured, if external backends like XLA have implemented the `out` op but not the `functional` version, they will call into our code-generated `CompositeExplicitAutograd` kernel, which calls the structured operator's `meta()` function and then redispatches to the external backend's `out` function.
If XLA has registered their own kernel to the `functional` variant of the op, it'll override our codegen'd composite kernel. XLA has logic to code-generate "CPU fallback" kernels for "required" ops. It gets this information based off of `RegistrationDeclarations.yaml`. That info was technically incorrect up until this PR, since we were code-generating `inplace/functional` composite kernels for structured ops, but not updating `RegistrationDeclarations.yaml` with that information.
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D27883950
Pulled By: bdhirsh
fbshipit-source-id: fe896b0d2bbd4369490dcdf7a87f227fd3d8b8b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56306
It turns out that TensorIteratorBase `meta()` calls don't work with XLA tensors, since the logic that builds up the `TensorIteratorBase` object also tries to grab/store the underlying tensors' data pointers. This doesn't work for XLA because they don't have storage.
I think it's fine to just skip this bit of logic for tensors that don't have storage, since the data_ptr information isn't important to the meta call, and TensorIterator isn't actually used in the implementation of non-native kernels, i.e. XLA.
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D27883949
Pulled By: bdhirsh
fbshipit-source-id: 7db4358b94b23c504a383f9673dc509c4020a708
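The skip can be pictured with a small stand-in (hypothetical names; the real logic lives in TensorIteratorBase's build step):

```python
class ToyTensor:
    """Stand-in for a tensor that may or may not be backed by storage."""
    def __init__(self, has_storage, ptr=None):
        self.has_storage = has_storage
        self.ptr = ptr  # data pointer, only meaningful with storage

def gather_data_ptrs(tensors):
    # only collect pointers for tensors that actually have storage;
    # shape/dtype (meta) computation never needs them
    return [t.ptr for t in tensors if t.has_storage]

ptrs = gather_data_ptrs([ToyTensor(True, 0x1000), ToyTensor(False)])
```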
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55145
Repeating the discussion from https://github.com/pytorch/pytorch/pull/54784#issuecomment-811792089
The error messages for mismatched values are directly adapted from the old `_compare_tensors_internal`:
50cb75edce/torch/testing/__init__.py (L104-L111)
A sample error message right now looks like this
```
With rtol=1.3e-06 and atol=1e-05, found 1 different element(s) out of 12 (8.3%). The greatest difference of 4.0 (5.0 vs. 9.0) occurred at index (2, 3)
```
Using the same data with `numpy.testing.assert_equal` gives the following output:
```
Not equal to tolerance rtol=1.3e-06, atol=1e-05
Mismatched elements: 1 / 12 (8.33%)
Max absolute difference: 4.
Max relative difference: 0.44444445
x: array([[5., 5., 5., 5.],
[5., 5., 5., 5.],
[5., 5., 5., 5.]], dtype=float32)
y: array([[5., 5., 5., 5.],
[5., 5., 5., 5.],
[5., 5., 5., 9.]], dtype=float32)
```
Pros:
- The info is presented in a list instead of a sentence. IMO this makes it more readable
- The maximum relative difference is reported, which is beneficial in case a comparison fails due to the `rtol`
Cons:
- The values of the inputs are reported (this can be disabled by passing `verbose=False`, but let's face it: most users will use the default setting). In case the inputs are large, the output gets truncated with `...`. Not only is it hard to visually find the mismatching values, they could also live within the truncated part, making the output completely useless.
- Even after visually finding the offending values, it is hard to map them back to their indices in the inputs.
This implements a mix of both to get a short but expressive message:
```
Tensors are not close according to rtol=1.3e-6 and atol=1e-05:
Mismatched elements: 1 / 12 (8.3%)
Max. rel. diff.: 4.44e-1 at (2, 3)
Max. abs. diff.: 4.0 at (2, 3)
```
Test Plan: Imported from OSS
Reviewed By: heitorschueroff
Differential Revision: D27877157
Pulled By: mruberry
fbshipit-source-id: 6898a995f116f127e3ae8ed0bcb1ada63eadc45a
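A rough sketch of how such a summary can be computed for flat inputs (illustrative only, not the actual torch.testing code):

```python
def mismatch_summary(x, y, rtol, atol):
    n = len(x)
    # elements violating |x - y| <= atol + rtol * |y|
    bad = [i for i in range(n) if abs(x[i] - y[i]) > atol + rtol * abs(y[i])]
    abs_diffs = [abs(a - b) for a, b in zip(x, y)]
    rel_diffs = [d / abs(b) if b else float("inf")
                 for d, b in zip(abs_diffs, y)]
    i_abs = max(range(n), key=abs_diffs.__getitem__)
    i_rel = max(range(n), key=rel_diffs.__getitem__)
    return (f"Mismatched elements: {len(bad)} / {n} ({100 * len(bad) / n:.1f}%)\n"
            f"Max. rel. diff.: {rel_diffs[i_rel]:.3g} at index {i_rel}\n"
            f"Max. abs. diff.: {abs_diffs[i_abs]:.3g} at index {i_abs}")

# same data as the example: one 4.0 mismatch out of 12 elements
msg = mismatch_summary([5.0] * 12, [5.0] * 11 + [9.0], rtol=1.3e-6, atol=1e-5)
```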
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56324
Inlining is great if LLVM's CSE kicks in; but if a kernel has multiple outputs
(and thus multiple loops), CSE has no chance.
So, this pass "horizontally" fuses the output loops together so that CSE can go
to town. Essentially we want to turn
```
for (...) {
output_1[] = some_complicated_expr...
}
for (...) {
output_2[] = some_complicated_expr...
}
```
Into:
```
for (...) {
output_1[] = complicated_expr
output_2[] = complicated_expr. // llvm cse should take care of this
}
```
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D27841194
Pulled By: bertmaher
fbshipit-source-id: 54153bb59786be87183c636d64f05963c4b1624a
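The payoff can be demonstrated in plain Python, as a behavioral sketch of the transformation rather than the NNC pass itself:

```python
def unfused(xs):
    # two separate loops, each recomputing the shared subexpression
    out1 = [((x * x + 1) ** 0.5) + 1 for x in xs]
    out2 = [((x * x + 1) ** 0.5) - 1 for x in xs]
    return out1, out2

def fused(xs):
    # one "horizontally fused" loop: the common subexpression is
    # computed once per iteration and reused for both outputs (the
    # manual analogue of what CSE does after fusion)
    out1, out2 = [], []
    for x in xs:
        common = (x * x + 1) ** 0.5
        out1.append(common + 1)
        out2.append(common - 1)
    return out1, out2

same = fused([0.0, 3.0]) == unfused([0.0, 3.0])
```

The results are identical; only the amount of redundant work per iteration changes.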
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56447
MemoryPlanner shouldn't manage StorageImpls; instead, it should manage the TensorImpls because the StorageImpl in Tensors can change.
Test Plan: CI
Reviewed By: ajyu
Differential Revision: D27840361
fbshipit-source-id: f22165d167c70165be2934c6717b5057a8bb4d29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56167
VC++ does not recognize `or` as a valid operator. This breaks the build under `Debug` configuration.
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D27866143
Pulled By: SplitInfinity
fbshipit-source-id: 490cee57b9762170ce02a6f73130772a3542e76d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56166
Support tensordot in symbolic function of opset 12, and add tests accordingly.
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D27866140
Pulled By: SplitInfinity
fbshipit-source-id: 68e218cfbd630900fb92871fc7c0de3e7e8c8c3d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56165
Add implementation for cases when
- interleaving happens along dim which consist of dynamic axes
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D27866137
Pulled By: SplitInfinity
fbshipit-source-id: 7fef1b2c614f2e24a677b7ca0886bb37bd0ab479
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56163
* [ONNX] Improve index_put symbolic to handle singular Bool updates (#53690)
Adds support for cases where the updates to the index_put node is a single Bool value, such as the case shown below
```
mask[indices] = True
```
Fixes #53507
* [ONNX] Support primitive type input/outputs and attributes (#53550)
Support primitive type attributes. Needed for Silero model.
* [ONNX] Fix if output shape mismatch error & Fix graph input directly used as output (#53219)
Fix if output shape mismatch error & Fix graph input directly used as output
* Add support for hann_window operator.
* [ONNX] Replace decomposeLinear pre process pass with a symbolic (#53077)
Replace decomposeLinear pre process pass with a symbolic
* Add a test case for dtype is None.
* Resolve flake8 issue.
* Remove one unused test case.
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D27866145
Pulled By: SplitInfinity
fbshipit-source-id: e0b43df9ecd1a95cd7ac297213aba453bbaf2913
Co-authored-by: Shubham Bhokare <32080845+shubhambhokare1@users.noreply.github.com>
Co-authored-by: Negin Raoof <neginmr@utexas.edu>
Co-authored-by: Bowen Bao <bowbao@microsoft.com>
Co-authored-by: Ksenija Stanojevic <KsenijaS@users.noreply.github.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56539
It seems like a potential source of non-determinism when generating YAML files during the build stems from the fact that when we write out Python lists, they get written out in list order. This isn't a problem per se, but if you look to see how these lists are generated, you'll see that they come from sets, which are inherently [not order preserving](https://stackoverflow.com/questions/1653970/does-python-have-an-ordered-set) in Python.
I can't guarantee that this removes non-determinism, but it removes all non-determinism that I know of so far. The surface area of codegen isn't sprawling, and the YAML file is generated by converting the object `toDict()` and passing it into the YAML serializer, so this should cover it (I think). Dictionaries are serialized in key order by pyyaml, so that's not a problem.
This could be related to the elevated Android build times being seen [here](https://fb.workplace.com/groups/pytorch.edge.users/permalink/841622146708080/).
ghstack-source-id: 126987721
Test Plan: Build + Sandcastle.
Reviewed By: JacobSzwejbka
Differential Revision: D27893058
fbshipit-source-id: 6d7bcb09f34c05b71fbb4a0673bac1c4c33f23d7
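The underlying pitfall is easy to reproduce with a generic Python sketch (not the codegen itself):

```python
# Set iteration order is arbitrary (and varies across interpreter runs
# with hash randomization), so serializing a set-derived list directly
# can produce different YAML on identical inputs. Sorting before
# emitting makes the output deterministic.
ops = {"add", "mul", "conv2d"}
stable = sorted(ops)
```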
Summary:
This PR includes:
* Update to the loop-carried dependence check API to correctly ignore loop-independent dependences and handle all kinds of loop-carried dependences like RAW, WAR and WAW.
* Fix for the overlap API to look only for conflicting buffer accesses where at least one of them is a Store.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56354
Reviewed By: bertmaher
Differential Revision: D27856202
Pulled By: navahgar
fbshipit-source-id: 206e4ec771fe0f7f2ccf4b11b29e35df7b9b18bc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56491
Move the prepack convolution to the op file to get rid of the selective compilation.
ghstack-source-id: 126960054
Test Plan: CI
Reviewed By: SS-JIA
Differential Revision: D27719539
fbshipit-source-id: 75fb3849858a31a915828a0f5f6f3d4066ff4c9b
Summary:
This hides the warnings from #35026 until we can fix them for real by migrating to custom classes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56543
Pulled By: driazati
Reviewed By: rohan-varma
Differential Revision: D27895085
fbshipit-source-id: a325a5d8cefb20a5033c1a059e49c03c08514f18
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56308
But only for float tensors. Even on CUDA, int tensors just have weird
behavior with pow, and I bet FP is so much more common that it's just not worth
trying to fuse ints here.
ghstack-source-id: 126769637
Test Plan: `pytest test_jit_fuser_te.py -k test_binary_pow`
Reviewed By: navahgar
Differential Revision: D27834694
fbshipit-source-id: 7274d72cf02ab95d63574b6c17995b8f34560810
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56004
Added reference pattern support for GELU, softmax, and bmm for int dtypes. For GELU and softmax, this consisted of adding reference patterns to the default node handler for int dtypes. Note that the GELU and softmax patterns are not registered, since they do not have a proper quantized kernel, which means they would either add unnecessary dequant and quant ops to the network or simply error. This can be circumvented with custom qconfig usage, as in test_gelu_reference.
bmm was added within binary ops, along with some significant changes to how that code is structured. Theoretically the reference pattern used for bmm could be applied to other dtypes. This was not enabled because of issues relating to line 1323 in quantize.py. In essence, the prepare step does not know whether an op will use a reference pattern or not, so for ops that are supported with one dtype in reference and another dtype normally, this has the potential to cause issues. This is difficult to get around without the is_reference flag being available in the prepare step or the discussed changes around separating
Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_gelu_reference
python test/test_quantization.py TestQuantizeFxOps.test_gelu_normal
python test/test_quantization.py TestQuantizeFxOps.test_softmax_reference
python test/test_quantization.py TestQuantizeFxOps.test_softmax_normal
python test/test_quantization.py TestQuantizeFxOps.test_silu_reference
python test/test_quantization.py TestQuantizeFxOps.test_bmm_int_reference
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFuseFx
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxModels
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D27818340
fbshipit-source-id: de65be0797035463cd2d1b0e4677d1a87f69143c
Summary:
This pulls out shell scripts from an action and runs them locally as a first pass at https://github.com/pytorch/pytorch/issues/55847. A helper script extracts specific steps in some order and runs them:
```bash
$ time -p make lint -j 5 # run lint with 5 CPUs
python scripts/actions_local_runner.py \
--file .github/workflows/lint.yml \
--job 'flake8-py3' \
--step 'Run flake8'
python scripts/actions_local_runner.py \
--file .github/workflows/lint.yml \
--job 'mypy' \
--step 'Run mypy'
python scripts/actions_local_runner.py \
--file .github/workflows/lint.yml \
--job 'quick-checks' \
--step 'Ensure no trailing spaces' \
--step 'Ensure no tabs' \
--step 'Ensure no non-breaking spaces' \
--step 'Ensure canonical include' \
--step 'Ensure no unqualified noqa' \
--step 'Ensure no direct cub include' \
--step 'Ensure correct trailing newlines'
python scripts/actions_local_runner.py \
--file .github/workflows/lint.yml \
--job 'cmakelint' \
--step 'Run cmakelint'
quick-checks: Ensure no direct cub include
quick-checks: Ensure canonical include
quick-checks: Ensure no unqualified noqa
quick-checks: Ensure no non-breaking spaces
quick-checks: Ensure no tabs
quick-checks: Ensure correct trailing newlines
cmakelint: Run cmakelint
quick-checks: Ensure no trailing spaces
mypy: Run mypy
Success: no issues found in 1316 source files
Success: no issues found in 56 source files
flake8-py3: Run flake8
./test.py:1:1: F401 'torch' imported but unused
real 13.89
user 199.63
sys 6.08
```
Mypy/flake8 are by far the slowest, but that's mostly just because they're wasting a bunch of work linting the entire repo.
In followup, we could/should:
* Improve ergonomics (i.e. no output unless there are errors)
* Speed up lint by only linting files changes between origin and HEAD
* Add clang-tidy
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56439
Reviewed By: samestep
Differential Revision: D27888027
Pulled By: driazati
fbshipit-source-id: d6f2a59a45e9d725566688bdac8e909210175996
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56346
Now that TensorPipe's API has `targetDevice`, use that instead of
manually writing the CUDA device index in `metadata`.
Test Plan: CI
Reviewed By: lw
Differential Revision: D27703235
fbshipit-source-id: c5b620e3b3ce619367412efdbe9fa3778f6b8869
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56425
As SPMD mode is gone, `_specify_ddp_gpu_num` becomes useless. It only checks if the module is a GPU module. This actually is already checked by the caller of this function (in fairscale and some other codebases).
Additionally also remove `enable_pytorch_sync_bn` wrapper that only calls this function and does nothing else.
ghstack-source-id: 126885376
Test Plan: waitforbuildbot
Reviewed By: zhaojuanmao
Differential Revision: D27866440
fbshipit-source-id: d2fd5cf43eda25c0a2bd35f647848ec0dbd6ad0f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56412
We are investigating some flaky profiling tests such as https://github.com/pytorch/pytorch/issues/56146. One issue is that the profiling tests are tightly coupled with these send/recv tests, hence if this test is disabled, we lose signal around send/recv collective tests.
To mitigate this, separate the tests into ones that only test send/recv and ones that test it with profiling. This way flakiness should not result in the send/recv-only tests being disabled.
ghstack-source-id: 126920867
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D27864845
fbshipit-source-id: 01f04a884482ec7741323218a7f8f4a8451eb4ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56434
If we hit multiple TORCH_WARN calls from different sources when running the
statement, it makes more sense to check that the regex is matched by any
one of the warning messages instead of by all of them.
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D27871946
Pulled By: ailzhang
fbshipit-source-id: 5940a8e43e4cc91aef213ef01e48d506fd9a1132
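The any-vs-all distinction amounts to the following (a generic Python sketch, not the actual test harness):

```python
import re

# two warnings captured while running a statement
warnings_seen = [
    "operator X is deprecated",
    "reached slow fallback path",
]
pattern = re.compile("deprecated")

# new behavior: the regex must match at least one captured message
matches_any = any(pattern.search(w) for w in warnings_seen)
# old (stricter) reading: the regex must match every message,
# which spuriously fails as soon as an unrelated warning fires
matches_all = all(pattern.search(w) for w in warnings_seen)
```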
Summary:
The lint was originally added in https://github.com/pytorch/pytorch/issues/54974, but at the time I didn't realize that these other Markdown files also each have a table of contents:
- `GLOSSARY.md`
- `torch/csrc/jit/OVERVIEW.md`
- `torch/csrc/jit/docs/serialization.md`
- `torch/fx/OVERVIEW.md`
This PR adds those files to the lint, and also changes the rule from using a fixed list of filenames to a `git grep` command that finds all Markdown files containing this magic comment:
```md
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56487
Test Plan: The "Lint / toc" job in GitHub Actions.
Reviewed By: janeyx99
Differential Revision: D27884885
Pulled By: samestep
fbshipit-source-id: 5462437502b17fba93abf5098e21754bf566a4fe
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).
New submodule commit: 416a9d8a4a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56259
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: pbelevich
Differential Revision: D27881993
Pulled By: beauby
fbshipit-source-id: e7d8cefe89c6fb09b59e3ef57da05a7ab0a3cb16
Summary:
We should iterate all pages of the branches API. Otherwise, even using "pytorch/vision" would fail to find master.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56138
Reviewed By: heitorschueroff
Differential Revision: D27872346
Pulled By: ailzhang
fbshipit-source-id: 55881558f7980b1fb08b0d08ed6687a38df06edd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56410
Changes:
- Move create_tcp_store() helper function to common file
- Update test_jit_c10d to retry TCP Store creation in case allocated port becomes used
fixes https://github.com/pytorch/pytorch/issues/55053
Test Plan: Imported from OSS
Reviewed By: heitorschueroff
Differential Revision: D27869560
Pulled By: H-Huang
fbshipit-source-id: f4a6613049bb25e6f6f194214379a380968bb19c
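The retry idea can be sketched with a plain socket (a hypothetical helper, not the actual TCPStore code):

```python
import socket

def bind_with_retry(attempts=3):
    # if the chosen port is grabbed by another process between
    # allocation and bind, retry with a fresh OS-assigned port
    # instead of failing the whole test
    last_err = None
    for _ in range(attempts):
        sock = socket.socket()
        try:
            sock.bind(("127.0.0.1", 0))  # port 0: OS picks a free port
            return sock
        except OSError as err:
            last_err = err
            sock.close()
    raise RuntimeError("could not bind a port") from last_err

sock = bind_with_retry()
port = sock.getsockname()[1]
sock.close()
```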
Summary:
`is_variable` emits a deprecation warning during the build (if it's
still something that needs to be tested, we can ignore deprecation
warnings for the whole test instead of making this change).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56305
Pulled By: driazati
Reviewed By: ezyang
Differential Revision: D27834218
fbshipit-source-id: c7bbea7e9d8099bac232a3a732a27e4cd7c7b950
Summary:
[First ShellCheck release in over a year!](https://github.com/koalaman/shellcheck/releases/tag/v0.7.2) I'm thankful for doing https://github.com/pytorch/pytorch/issues/55109 at the beginning of this month, because otherwise `master` would have just suddenly started failing a few hours ago.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56445
Test Plan:
CI. You can also run `shellcheck` locally; for instance, if you're on Mac and [installed it with Homebrew](https://github.com/koalaman/shellcheck/tree/v0.7.2#installing):
```sh
brew upgrade shellcheck
rm -r .extracted_scripts ; tools/extract_scripts.py --out=.extracted_scripts
tools/run_shellcheck.sh .jenkins/pytorch .extracted_scripts
```
Reviewed By: janeyx99
Differential Revision: D27874084
Pulled By: samestep
fbshipit-source-id: 3bd871a368fe03aecd559e2f55bce36af49cfa27
Summary:
This cuts out caffe2's old backtrace generation in favor of the one already in c10.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56198
Pulled By: driazati
Reviewed By: nikithamalgifb
Differential Revision: D27868282
fbshipit-source-id: aa9b9691271eaa3f95baab48773ffefebd924ae2
Summary:
Temporary fix to give people extra time to finish the deprecation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56401
Reviewed By: xw285cornell, drdarshan
Differential Revision: D27862196
Pulled By: albanD
fbshipit-source-id: ed460267f314a136941ba550b904dee0321eb0c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56248
`info` error code for QR decomposition only indicates wrong parameters,
when everything is implemented correctly it will never be nonzero so we
don't need to check it for CPU path.
For MAGMA `checkMagmaInternalError` is added that checks for failed
memory allocations internal to MAGMA.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D27850414
Pulled By: mruberry
fbshipit-source-id: ddda1209008f879f24c9ad08739e10c28b194d18
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56386
The diff resolves a bug around incorrect handler resolution:
_create_static_handler pointed to etcd, and _create_etcd_handler pointed to static.
Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed:test_launcher
Added test_launcher to the ci/cd tests
Reviewed By: cbalioglu
Differential Revision: D27858897
fbshipit-source-id: 440155789958c091ce5755e7c9524e4bb704203a
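The bug pattern is a swapped registry; a minimal reconstruction (the names echo the commit, the bodies are made up):

```python
def _create_static_handler():
    return "static-handler"

def _create_etcd_handler():
    return "etcd-handler"

# before the fix: factories registered under each other's keys
buggy_registry = {"static": _create_etcd_handler,
                  "etcd": _create_static_handler}
# after the fix: each key maps to its own factory
fixed_registry = {"static": _create_static_handler,
                  "etcd": _create_etcd_handler}

wrong = buggy_registry["static"]()   # resolves to the etcd handler
right = fixed_registry["static"]()   # resolves correctly
```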
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55946
As the `ddp_gpu_size` field of `SyncBatchNorm` will always be 1 for GPU modules, remove this field and the relevant code.
ghstack-source-id: 126883498
Test Plan: waitforbuildbot
Reviewed By: zhaojuanmao
Differential Revision: D27746021
fbshipit-source-id: b4518c07e6f0c6943fbd7a7548500a7d4337126c
Summary: `Redirects` was renamed to `Std` in `torch.distributed.elastic.multiprocessing.api`. Pointed out by a user in https://github.com/pytorch/elastic/issues/147.
Test Plan: N/A just doc change
Reviewed By: tierex
Differential Revision: D27866614
fbshipit-source-id: 9fb901aae7ebe11cde13000a1c118de527f34400
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56391
Previously we only support keeping output quantized for tensor output, this PR adds support
for list and dict (values) as well
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D27860327
fbshipit-source-id: e770160ced47a7173abff5505ec620bd2b1a0b01
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56406
Issue https://github.com/pytorch/pytorch/issues/50840 has been hard to
reproduce; adding some debug logging to get a better sense of the issue.
ghstack-source-id: 126874222
Test Plan: waitforbuildbot
Reviewed By: rohan-varma
Differential Revision: D27863328
fbshipit-source-id: e6f125b77cfb636b90598eb54395609654f5e139
Summary:
Partial fix for https://github.com/pytorch/pytorch/issues/56357
Changes the `fuseLoops` API to the following form:
```
static bool fuseLoops(const std::vector<For*>& loops, For** fused);
```
Also, adds a new API to check for loop-carried dependences:
```
static bool hasLoopCarriedDependence(For* loop);
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56353
Reviewed By: bertmaher
Differential Revision: D27856214
Pulled By: navahgar
fbshipit-source-id: 443557088692585657faee296602c547a00117dd
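What `hasLoopCarriedDependence` must detect can be modeled crudely in Python (a toy check over a small iteration space, unrelated to the real dependence analysis):

```python
def has_loop_carried_dependence(write_index, read_index, trip_count=4):
    # a dependence is loop-carried when the location written in
    # iteration i is accessed in a *different* iteration j
    return any(write_index(i) == read_index(j)
               for i in range(trip_count)
               for j in range(trip_count)
               if i != j)

# a[i] = a[i - 1] + 1  -> write at i, read at i - 1: carried (RAW)
carried = has_loop_carried_dependence(lambda i: i, lambda j: j - 1)
# a[i] = b[i] + 1      -> `a` is never read: nothing carried
independent = has_loop_carried_dependence(lambda i: i, lambda j: None)
```

Loops with no loop-carried dependences are the ones `fuseLoops` can safely merge.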
Summary:
https://github.com/pytorch/pytorch/issues/54268 removed `test_run_mypy` since now we're running `mypy` as its own job in GitHub Actions, but previously we used this `set_cwd` context manager in that test to ensure that we picked up the `mypy` config correctly. However, for some reason, we have not been doing that in `test_doc_examples`, which has been succeeding in CI for a while despite being broken.
Specifically, [`run_test.py` changes the working directory to `test/` before running test files](48aaea3359/test/run_test.py (L534-L535)), which is contrary to [what `CONTRIBUTING.md` instructs developers to do](48aaea3359/CONTRIBUTING.md (python-unit-testing)). As a result, `test/test_type_hints.py` has been passing in CI, but if you run it locally from the root of the repo, you get this error:
```
F
======================================================================
FAIL: test_doc_examples (__main__.TestTypeHints)
Run documentation examples through mypy.
----------------------------------------------------------------------
Traceback (most recent call last):
File "test/test_type_hints.py", line 127, in test_doc_examples
self.fail(f"mypy failed:\n{stdout}")
AssertionError: mypy failed:
test/generated_type_hints_smoketest.py:851: error: Name 'tensor' is not defined [name-defined]
test/generated_type_hints_smoketest.py:853: error: Name 'tensor' is not defined [name-defined]
Found 2 errors in 1 file (checked 1 source file)
----------------------------------------------------------------------
Ran 1 test in 1.416s
FAILED (failures=1)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56388
Test Plan:
Before this PR, the first of the following two commands should succeed (since that is essentially what is run in CI), but the second should fail:
```
python test/run_test.py -i test_type_hints
python test/test_type_hints.py
```
After this PR, both commands should succeed.
Reviewed By: driazati
Differential Revision: D27860173
Pulled By: samestep
fbshipit-source-id: efb82fffd7ccb04d0331824b40bdef7bbc319c98
Summary:
This guards some deprecated usages of the Protobuf API behind an `#ifdef` (this is how onnx does it as well)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56186
Pulled By: driazati
Reviewed By: bertmaher, dzhulgakov
Differential Revision: D27803121
fbshipit-source-id: 2d3a348ec1ab9879a0d8f2dff17c5444fd4baf2c
Summary:
Since we're using specific VS, we don't need to specify VC version.
In fact, the VC version is not used in CI now.
Why I make this change now?
I'm writing a robot to update the vs_install.ps1 (https://github.com/pytorch/pytorch/pull/56261/) every 2 weeks.
It will submit a PR to check if the latest VS is compatible with PyTorch automatically.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56230
Reviewed By: bdhirsh
Differential Revision: D27856647
Pulled By: ezyang
fbshipit-source-id: b46f2bdf35ab5841fded470e23bbf7a01d5f60f4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56181
Need to change to size_t vs size_t:
Reviewed By: ezyang
Differential Revision: D27800849
fbshipit-source-id: 25f744128eb8750c382dc967a99af3c9f16247d9
Summary:
As this diff shows, currently there are a couple hundred instances of raw `noqa` in the codebase, which just ignore all errors on a given line. That isn't great, so this PR changes all existing instances of that antipattern to qualify the `noqa` with respect to a specific error code, and adds a lint to prevent more of this from happening in the future.
Interestingly, some of the examples the `noqa` lint catches are genuine attempts to qualify the `noqa` with a specific error code, such as these two:
```
test/jit/test_misc.py:27: print(f"{hello + ' ' + test}, I'm a {test}") # noqa E999
test/jit/test_misc.py:28: print(f"format blank") # noqa F541
```
However, those are still wrong because they are [missing a colon](https://flake8.pycqa.org/en/3.9.1/user/violations.html#in-line-ignoring-errors), which actually causes the error code to be completely ignored:
- If you change them to anything else, the warnings will still be suppressed.
- If you add the necessary colons then it is revealed that `E261` was also being suppressed, unintentionally:
```
test/jit/test_misc.py:27:57: E261 at least two spaces before inline comment
test/jit/test_misc.py:28:35: E261 at least two spaces before inline comment
```
I did try using [flake8-noqa](https://pypi.org/project/flake8-noqa/) instead of a custom `git grep` lint, but it didn't seem to work. This PR is definitely missing some of the functionality that flake8-noqa is supposed to provide, though, so if someone can figure out how to use it, we should do that instead.
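The colon rule above can be sketched with a small checker. This is a rough sketch of the behavior described, not flake8's actual parser; the function name is made up for illustration:

```python
import re

# "# noqa: E999" silences only E999, while "# noqa E999" (no colon) is
# parsed as a bare noqa and silences every error on the line.
QUALIFIED = re.compile(r"#\s*noqa\s*:\s*([A-Z]+[0-9]+(?:[,\s]+[A-Z]+[0-9]+)*)")

def silenced_codes(line):
    """Return the list of explicitly silenced codes, or None for a bare noqa."""
    m = QUALIFIED.search(line)
    if m:
        return re.split(r"[,\s]+", m.group(1))
    return None if "noqa" in line else []

print(silenced_codes("x = 1  # noqa: E501"))  # ['E501'] - qualified
print(silenced_codes("x = 1  # noqa E501"))   # None - colon missing, blanket suppress
print(silenced_codes("x = 1"))                # [] - nothing silenced
```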
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56272
Test Plan:
CI should pass on the tip of this PR, and we know that the lint works because the following CI run (before this PR was finished) failed:
- https://github.com/pytorch/pytorch/runs/2365189927
Reviewed By: janeyx99
Differential Revision: D27830127
Pulled By: samestep
fbshipit-source-id: d6dcf4f945ebd18cd76c46a07f3b408296864fcb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56070
**Summary**
Currently, we're returning copies instead of aliases on mobile GPU (Metal/Vulkan). As suggested by ailzhang, we could use the JIT pass `RemoveTensorMutation` to ban mutations ahead of time. I've tested the two scenarios shown below. They both work fine on mobile.
- view
```
class Model(torch.nn.Module):
    def forward(self, x):
        y = x.view(-1)
        z = torch.tensor(2.0).float()
        y.add_(z)
        return x

m = Model()
x = torch.rand(2, 3)
y = m(x)
```
- transpose
```
class Model(torch.nn.Module):
    def forward(self, x):
        y = x.transpose(1, 2)
        z = torch.tensor(2.0).float()
        x.add_(z)
        return y

m = Model()
x = torch.rand(1, 2, 3)
y = m(x)
```
As we're adding more ops, we should add more tests to cover all the alias ops - https://github.com/pytorch/pytorch/blob/master/tools/autograd/gen_inplace_or_view_type.py#L31-L80
**Next step**
Synced offline with eellison. Since mutation removal is also used in ONNX, Static Runtime, some JIT optimizations, Torch -> TVM, etc., instead of inventing something new we will continue to improve it in the cases where it fails.
Although this JIT pass could work for most of the mobile models, there are cases that it can't cover. What we're going to do next is to implement stub ops for GPU models to let them run on server side, such that users can compare results to see if there is any discrepancy.
ghstack-source-id: 126802123
Test Plan:
- Sandcastle
- CircleCI
Reviewed By: raziel
Differential Revision: D27692683
fbshipit-source-id: 9d1be8a6c0a276032b1907807a54fbe2afd882f9
Summary:
They are unused, unrelated to vectorization, and confusing for code
readers (each of them has 2 overloads that are actually used).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56313
Reviewed By: bdhirsh
Differential Revision: D27854290
Pulled By: ezyang
fbshipit-source-id: 14945ceac39a3f19e5d0f8d762b17f8c2172b966
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55102
To avoid casting a tensor to `.long()`, we introduce support for int32 in `torch.repeat_interleave`.
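As a pure-Python sketch of the semantics (not the ATen kernel): each element of `repeats` says how many times the corresponding value appears in the output, and with this change the repeats tensor may be int32 as well as int64:

```python
def repeat_interleave(values, repeats):
    # repeats[i] gives how many times values[i] appears in the output;
    # with int32 support, repeats no longer has to be cast to long first.
    out = []
    for v, r in zip(values, repeats):
        out.extend([v] * r)
    return out

print(repeat_interleave([10, 20, 30], [1, 2, 3]))  # [10, 20, 20, 30, 30, 30]
```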
Reviewed By: ezyang
Differential Revision: D27478235
fbshipit-source-id: 08b4cce65fe94ff10535ddc07e1ba2bacea6a2cf
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56022.
Fixes https://github.com/pytorch/pytorch/issues/56316
For `torch.tensordot`,
1. `tensordot`'s out variant now resizes the output tensor provided as the `out` argument if necessary.
2. Added a check to verify if the output tensor provided as the argument for `out` is on the same device as the input tensors.
3. Added a check to verify if the dtype of the result is castable to the dtype of the output tensor provided as an argument for `out`.
4. Because of (2) & (3), `tensordot`'s out variant now [safely casts & copies output](https://github.com/pytorch/pytorch/wiki/Developer-FAQ#how-does-out-work-in-pytorch).
5. `test_tensordot` in `test_linalg.py` had a bug - the output tensor wasn't being defined to be on the same device as the input tensors. It was fixed by simply using a `device` argument in its definition.
6. Added an `OpInfo` for `tensordot` and modified the `OpInfo` for `inner`.
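The ordering of checks (1)-(4) can be sketched in plain Python. Everything here is a stand-in for illustration: `Tensor`, `can_cast`, and `write_to_out` are hypothetical names, and the toy promotion ladder only mimics safe casting:

```python
# Hypothetical sketch of the out= contract in points 1-4 above.
class Tensor:
    def __init__(self, shape, dtype, device):
        self.shape, self.dtype, self.device = shape, dtype, device

def can_cast(src, dst):
    order = ["int32", "int64", "float32", "float64"]  # toy promotion ladder
    return order.index(src) <= order.index(dst)

def write_to_out(result, out):
    if out.device != result.device:            # check (2): same device
        raise RuntimeError("out must be on the same device as the inputs")
    if not can_cast(result.dtype, out.dtype):  # check (3): safe cast
        raise RuntimeError(f"can't safely cast {result.dtype} to {out.dtype}")
    out.shape = result.shape                   # check (1): resize if necessary
    return out                                 # (4): cast & copy into out

out = write_to_out(Tensor((2, 2), "float32", "cuda"),
                   Tensor((0,), "float64", "cuda"))
print(out.shape, out.dtype)  # (2, 2) float64
```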
cc heitorschueroff mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56286
Reviewed By: ngimel
Differential Revision: D27845980
Pulled By: mruberry
fbshipit-source-id: 134ab163f05c31a6900dd65aefc745803019e037
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56247
Moved `apply_orgqr` to `BatchLinearAlgebraKernel.cpp`.
Removed the `infos` tensor parameter. We don't need to expose
LAPACK/cuSOLVER error codes because they do not contain any useful
information about the input. Its value is now checked only in debug mode,
which removes the device synchronization from the cuSOLVER path of
`torch.linalg.householder_product` (aka `torch.orgqr`).
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D27844339
Pulled By: mruberry
fbshipit-source-id: 47aa20dfe2c116951b968362ad55e837caece042
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/56157
This PR changes `normalize` API in `LoopNest` to transform the given `For` statement and not create a new one.
New API:
```
static bool normalize(For* f);
```
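The in-place contract can be sketched like this. The dict-based loop representation and the `offset` field are made up for illustration; the real NNC `normalize` mutates a `For*` node:

```python
# normalize() mutates the loop so the index starts at 0 and returns True
# when a change was made - no new loop object is created.
def normalize(loop):
    start = loop["start"]
    if start == 0:
        return False                 # already in normal form
    loop["stop"] = loop["stop"] - start
    loop["start"] = 0
    loop["offset"] = start           # body must now use (i + offset)
    return True

f = {"start": 5, "stop": 15, "offset": 0}
changed = normalize(f)               # f itself is transformed
print(changed, f)  # True {'start': 0, 'stop': 10, 'offset': 5}
```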
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56158
Reviewed By: agolynski
Differential Revision: D27798361
Pulled By: navahgar
fbshipit-source-id: 57626a5a367bdf94a0efbd9dc8538f5e4e410d6b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56348
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).
New submodule commit: f88994cf33
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: lw
Differential Revision: D27820550
fbshipit-source-id: efde79955af9a902c2d2bf38ed2705282a9ae2f0
Summary:
Safely deallocating and repurposing memory used across streams relies on recording end-of-life events in all of an allocation's usage streams beyond its original allocation stream. The events are later queried to see whether all GPU work in those extra streams that could have used the allocation is done (from the CPU's perspective) before repurposing the allocation for use in its original stream.
The trouble is, calling EventQuery on an ordinary event recorded in a capturing stream is illegal. Calling EventQuery while capture is underway is also illegal. So when we call `tensor.record_stream` (or `c10::cuda::cudaCachingAllocator::recordStream`) on any tensor that's used or deleted in or around a capture, we often end up with a confusing error thrown from the cudaEventQuery in DeviceCachingAllocator::process_events().
This PR enables hopefully-safe deletion of tensors used across streams in or around capture with a conservative but simple approach: don't record or process end of life events for such tensors until the allocator's sure no captures are underway. You could whiteboard cases where this causes cross-stream-used allocations to be unavailable for reuse longer than absolutely necessary, but cross-stream-used allocations are uncommon, so for practical purposes this approach's impact on the memory footprint of captured sequences should be small.
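The deferral policy can be sketched as follows. All names here are hypothetical (the real logic lives in `DeviceCachingAllocator`); the point is only the shape of the bookkeeping: while any capture is underway, end-of-life events are queued rather than recorded, since cudaEventRecord/cudaEventQuery are illegal during capture:

```python
class CachingAllocatorSketch:
    def __init__(self):
        self.captures_underway = 0
        self.deferred = []    # (block, stream) pairs waiting for capture to end
        self.recorded = []    # events that can now be safely queried

    def record_stream(self, block, stream):
        if self.captures_underway > 0:
            # Recording an event now would be illegal; queue it instead.
            self.deferred.append((block, stream))
        else:
            self.recorded.append((block, stream))

    def notify_capture_ended(self):
        self.captures_underway -= 1
        if self.captures_underway == 0:
            # No capture is underway anymore: safe to record the events.
            self.recorded.extend(self.deferred)
            self.deferred.clear()

alloc = CachingAllocatorSketch()
alloc.captures_underway = 1
alloc.record_stream("blockA", "stream1")   # deferred: capture is underway
alloc.notify_capture_ended()
print(len(alloc.deferred), len(alloc.recorded))  # 0 1
```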
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55860
Reviewed By: ejguan
Differential Revision: D27822557
Pulled By: ezyang
fbshipit-source-id: b2e18a19d83ed05bad67a8157a14a606ed14d04e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55861
APIs such as torch.LongTensor and torch.ByteTensor are deprecated and
the recommended API is torch.tensor(args, dtype=...). Use this API in
distributed_c10d.
ghstack-source-id: 126777875
Test Plan: CI
Reviewed By: pbelevich
Differential Revision: D27726600
fbshipit-source-id: 07eb8168d93697593589002c93c3903ce29431ef
Summary:
This PR allows fusing loops whose bounds are specified as expressions that are equal.
For example:
```
for (int j = 0; j < M + N; j++) {
  A[j] = 10 * j;
}
for (int k = 0; k < M + N; k++) {
  B[k] = 20 * k;
}
```
`fuseLoops(j, k)` is possible since the stop bounds of the two loops are equal though they are different `Expr*` and will result in:
```
for (int j = 0; j < M + N; j++) {
  A[j] = 10 * j;
  B[j] = 20 * j;
}
```
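The key point is that the two stop bounds are distinct `Expr*` objects, so pointer equality fails and fusibility must be decided structurally. A toy sketch, with expressions modeled as nested tuples (not the real NNC IR):

```python
# Structural (not identity) comparison of bound expressions lets the
# fuser treat two independently built "M + N" nodes as the same bound.
def exprs_equal(a, b):
    if isinstance(a, tuple) and isinstance(b, tuple):
        return len(a) == len(b) and all(exprs_equal(x, y) for x, y in zip(a, b))
    return a == b

stop_j = tuple(["add", "M", "N"])   # built for the first loop
stop_k = tuple(["add", "M", "N"])   # independently built for the second loop
print(stop_j is stop_k)             # False: different objects
print(exprs_equal(stop_j, stop_k))  # True: structurally equal, so fusible
```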
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55997
Reviewed By: bertmaher
Differential Revision: D27841270
Pulled By: navahgar
fbshipit-source-id: a64e4503b7f8f28bc0c9823225bc923177bb4c2e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56319
With this change the TorchScript graph can have constant tensors in it
and we still will be able to lower it to TE. The constants are
registered (or bound) within the `TensorExprKernel` object and when the
codegen is called, they are passed along with usual inputs and outputs.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D27838747
Pulled By: ZolotukhinM
fbshipit-source-id: 4a519d66fcc07fe5fa53f5cf9af28d25611f8437
Summary:
This `is_meta` call in `TensorIterator` shows up in profiling as around 4-5% of fast setup time:
49a5f99440/aten/src/ATen/TensorIterator.cpp (L886)
After inlining, `is_meta()` compiles to a single `test` instruction. Saving 20-30 ns per operator call. The functions I'm moving into the header here are all similar, in that they inline away to almost nothing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53905
Reviewed By: gchanan
Differential Revision: D27513232
Pulled By: swolchok
fbshipit-source-id: 33ec9eefecd0ddebc285e1d830edb558818dc391
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56289
While there's no reason to think non-float32 conv2d's *don't* work,
they're only tested in float32 now. Since that's the most important use case,
I'd rather restrict the dtypes than spend time testing all the weird dtype
combinations that could possibly happen.
ghstack-source-id: 126755549
Test Plan: unit tests
Reviewed By: navahgar
Differential Revision: D27828495
fbshipit-source-id: fcf179207f2c9b20e0e86eb2b85687517d87063c
Summary:
After adding new ops to a set of fusible ops, mobilenetv3 slows down to **9000ms from 1200ms** without this fix.
This happens when one of the inputs to a binary op has been expanded and
converted to nchw/nhwc while the second argument is in a blocked format.
In that case, MKLDNN falls back to its reference implementation for the
binary operation that follows these broadcasts, which can be up to ~100x
slower. We use a very simple heuristic: convert the nchw argument to the
blocked format of the other argument.
* MKLDNN_VERBOSE without the issue:
[test_mobilenet_nopool.txt](https://github.com/pytorch/pytorch/files/6319528/test_mobilenet_nopool.txt)
* MKLDNN_VERBOSE with the issue (Note the times for `ref` operations)
[test_mobilenet_pool.txt](https://github.com/pytorch/pytorch/files/6319529/test_mobilenet_pool.txt)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56089
Reviewed By: eellison
Differential Revision: D27796688
Pulled By: Krovatkin
fbshipit-source-id: fc34d76358ce899e3b1f2b69efb9b5c38f5af1ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56294
When matching a pattern to `BinaryOpQuantizeHandler`, we need to make
sure we check for dtype support on the base node, instead of the current
node. This is important in cases such as `add-relu` and `mul-relu`,
when the current node is `relu`, but the base node is `add|mul`.
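A toy sketch of the fix (the node structure and dtype table here are hypothetical, not the actual FX quantization mappings): for a fused pattern like (add, relu), the dtype check must look at the base node (`add`), not the node that terminated the match (`relu`):

```python
# Hypothetical per-op dtype support table for illustration only.
SUPPORTED_DTYPES = {"add": {"int8", "fp32"}, "mul": {"int8", "fp32"}, "relu": {"fp32"}}

def dtype_supported(matched_nodes, dtype):
    base_node = matched_nodes[0]   # first node of the pattern, e.g. "add"
    return dtype in SUPPORTED_DTYPES[base_node]

# Checking the current node ("relu") would wrongly reject int8 here:
print(dtype_supported(["add", "relu"], "int8"))  # True
```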
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
```
There is no good test case to check this in current logic. Created an
add-relu model manually, and verified with pdb that the add node was
being used to match against dtypes.
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27831070
fbshipit-source-id: 3697f1328dff9fec3eb910bae49a73793ef36d63
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56212
The current design doesn't make it easy to use `node.copy()`. Explicitly copy over the node's meta.
Test Plan: Updated `test_subgraph_creation` in `test_fx_experimental`
Reviewed By: jamesr66a
Differential Revision: D27808477
fbshipit-source-id: 7fe7b6428c830307dbd1e395f16fa2774936d3b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54813
Previously we had a cat that takes a list of Tensors with different qparams,
dequantizes them, concatenates them, and requantizes with the output qparams.
This adds unnecessary overhead from dequantizing and requantizing Tensors.
This PR adds an optimization for the cat operator: we make sure the inputs
and output of cat use the same observer/fake_quant, producing a cat that
does not do rescaling.
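A toy numeric illustration (not the FX pass itself) of why sharing qparams pays off: with one scale/zero_point for all inputs and the output, cat can concatenate the raw quantized values directly, skipping the dequantize -> concatenate -> requantize round trip:

```python
def quantize(xs, scale, zp):
    return [round(x / scale) + zp for x in xs]

def dequantize(qs, scale, zp):
    return [(q - zp) * scale for q in qs]

scale, zp = 0.5, 0          # one shared set of qparams for inputs and output
a, b = [0.5, 1.0], [1.5]
fused_cat = quantize(a, scale, zp) + quantize(b, scale, zp)  # plain int concat
print(dequantize(fused_cat, scale, zp))  # [0.5, 1.0, 1.5]
```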
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D27408377
fbshipit-source-id: 6a4bdcfd15e57ea1fe0f7e72d1e1288eb3ece4db
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54480
This PR shouldn't really change the behavior of gradcheck for most ops. However, the changes in test_autograd allow us to run basic checks for both fast and slow (instead of previously just slow). All it should be doing is wrapping the preexisting tests we introduced in prior PRs in a function which takes `fast_mode` as a param. We then call this function twice, once with `fast_mode=True` and once with `fast_mode=False`.
Plan for rollout:
- This PR should only land the code (and runs some basic checks as described above).
- This should help us verify that a) slow is still working as expected b) basic functionality of fast works
- After we land this, but before we run the next PR in the stack, we should land https://github.com/pytorch/pytorch/pull/55182. This is to ensure that there is no gap where the slow tests aren't running.
- The next PR is responsible for enabling the fast_mode=True flag on all tests (where the function has real inputs/outputs), and selectively disabling for the cases the fail.
- Finally in a later PR, we reenable fast-gradcheck for functions w/ complex inputs/outputs
TODOs and open questions (not necessarily blocking this PR):
- ~How do we think about atol/rtol~ (scale atol, keep rtol as-is)
- ~reenable fast-gradcheck for complex numbers~
- ~when inputs are uncoalesced we don't truly test this case because we coalesce the inputs before calling function. Revisit this when https://github.com/pytorch/pytorch/pull/52874/files is landed~
### Developer Experience
Sample output when jacobian mismatch occurs:
```
Traceback (most recent call last):
File "/home/s/local/pytorch4/test/test_autograd.py", line 4220, in test_gradcheck_jacobian_mismatch
check(fast_mode=True)
File "/home/s/local/pytorch4/test/test_autograd.py", line 4196, in check
gradcheck(fn, (x,), fast_mode=fast_mode)
File "/home/s/local/pytorch4/torch/testing/_internal/common_utils.py", line 2067, in gradcheck
return torch.autograd.gradcheck(fn, inputs, **kwargs)
File "/home/s/local/pytorch4/torch/autograd/gradcheck.py", line 1020, in gradcheck
if not fast_gradcheck(fail_test, seeded_func, func_out, tupled_inputs, outputs, eps, rtol,
File "/home/s/local/pytorch4/torch/autograd/gradcheck.py", line 915, in fast_gradcheck
return fail_test(get_notallclose_msg(a, n, i, j, prefix) + jacobians_str)
File "/home/s/local/pytorch4/torch/autograd/gradcheck.py", line 996, in fail_test
raise RuntimeError(msg)
RuntimeError: Jacobian mismatch for output 0 with respect to input 0,
numerical:tensor(0.9195)
analytical:tensor(0.9389)
The above quantities relating the numerical and analytical jacobians are computed
in fast mode. See: https://github.com/pytorch/pytorch/issues/53876 for more background
about fast mode. Below, we recompute numerical and analytical jacobians in slow mode:
Numerical:
tensor([[1.0000, 0.0000, 0.0000, 0.0000],
[0.0000, 1.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 1.0000, 0.0000],
[0.0000, 0.0000, 0.0000, 1.0000]])
Analytical:
tensor([[1.0100, 0.0100, 0.0100, 0.0100],
[0.0100, 1.0100, 0.0100, 0.0100],
[0.0100, 0.0100, 1.0100, 0.0100],
[0.0100, 0.0100, 0.0100, 1.0100]])
The max per-element difference (slow mode) is: 0.010000000000054632.
```
Additionally, if the per-element difference is small, i.e., `allclose(analytical_slow, numerical_slow, rtol, atol) is True`, we follow up with this message:
```
Fast gradcheck failed but element-wise differences are small. This means that the
test might've passed in slow_mode!
If you are adding a new operator, please file an issue and then use one of the
workarounds. The workaround depends on how your test invokes gradcheck/gradgradcheck.
If the test
- manually invokes gradcheck/gradgradcheck, then call gradcheck/gradgradcheck
with `fast_mode=False` as a keyword argument.
- is OpInfo-based (e.g., in test_ops.py), then modify the OpInfo for the test
to have `gradcheck_fast_mode=False`
- is a Module test (e.g., in common_nn.py), then modify the corresponding
module_test entry to have `gradcheck_fast_mode=False`
```
Test Plan: Imported from OSS
Reviewed By: walterddr, ejguan
Differential Revision: D27825160
Pulled By: soulitzer
fbshipit-source-id: 1fe60569d8b697c213b0d262a832622a4e9cf0c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56209
Pull Request resolved: https://github.com/pytorch/kineto/pull/172
In this diff of the stack, we remove the threadId field from ClientTraceActivity as a step towards its deprecation.
Test Plan: sandcastle builds to cover all the dependent targets
Reviewed By: ilia-cher
Differential Revision: D27662747
fbshipit-source-id: 040ba040390680a0fc63ddc8149c6fad940439fc
Summary:
This PR fixes the formatting issues in the new language reference
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56042
Reviewed By: gmagogsfm
Differential Revision: D27830179
Pulled By: nikithamalgifb
fbshipit-source-id: bce3397d4de3f1536a1a8f0a16f10a703e7d4406
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54783
We need to be extra careful with the pattern to legitimately use `unchecked_unwrap_optional` in autodiff.
This would at least allow us to start support `Optional[Tensor]` in autodiff, which is quite common in composite layers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55565
Reviewed By: ejguan
Differential Revision: D27825336
Pulled By: Krovatkin
fbshipit-source-id: a8562eb10ea741effff430d7417d313b1eb53dfe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56214
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56037
The diff introduces new `torch.distributed.elastic_launch` and removes internals of `torch.distributed.launch` keeping backwards compatibility.
Since torchelastic and torch.distributed.launch are not fully compatible due to the `--use_env` arg, the `torch.distributed.launch` deprecation is going to be iterative: as part of PyTorch 1.9 we are going to deprecate it, and in the following releases we will remove `torch.distributed.launch`.
The diff leaves the `torchelastic.distributed.launch` module in place, and follow-up diffs will migrate users from `torchelastic.distributed.launch` to `torch.distributed.elastic_launch`.
Test Plan: buck test mode/dev-nosan //pytorch/elastic/torchelastic/distributed/...
Reviewed By: H-Huang
Differential Revision: D27805799
fbshipit-source-id: 599a4c0592fbc7a1bc1953040626dd6b72bac907
Summary:
There is a build failure in `bench_approx.cpp` due to namespace change for log_out and tanh_out.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56278
Reviewed By: bertmaher, nikithamalgifb
Differential Revision: D27825621
Pulled By: navahgar
fbshipit-source-id: 0bccd324af92a3460610bf475514449f0223de2b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56238
It's already functionally public due to `extern` and `mock`, but
exposing the underlying implementation makes extending PackageExporter
easier.
Changed the underscores, exposed it on `torch.package`, added docs, etc.
Differential Revision: D27817013
Test Plan: Imported from OSS
Reviewed By: Lilyjjo
Pulled By: suo
fbshipit-source-id: e39199e7cb5242a8bfb815777e4bb82462864027
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55647
This adds [breakpad](https://github.com/google/breakpad) which comes with out-of-the-box utilities to register a signal handler that writes out a minidump on an unhandled exception. Right now this is gated behind a flag in `torch.utils`, but in the future it could be on by default. Sizewise this adds aboute 500k to `libtorch_cpu.so` (187275968 B to 187810016 B).
```bash
$ cat <<EOF > test.py
import torch
torch.utils.enable_minidump_collection()
# temporary util that just segfaults
torch._C._crash()
EOF
$ python test.py
Wrote minidump to /tmp/pytorch_crashes/6a829041-50e9-4247-ea992f99-a74cf47a.dmp
fish: “python test.py” terminated by signal SIGSEGV (Address boundary error)
$ minidump-2-core /tmp/pytorch_crashes/6a829041-50e9-4247-ea992f99-a74cf47a.dmp -o core.dmp
$ gdb python core.dmp
... commence debugging ...
```
Right now all exceptions that get passed up to Python don't trigger the signal handler (which by default only
handles [these](https://github.com/google/breakpad/blob/main/src/client/linux/handler/exception_handler.cc#L115)). It would be possible for PyTorch exceptions to explicitly write a minidump when passed up to Python (maybe only when the exception is unhandled or something).
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D27679767
Pulled By: driazati
fbshipit-source-id: 1ab3b5160b6dc405f5097eb25acc644d533358d7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54605
For small sizes we generate a naive 3-layer loopnest, for bigger sizes
we generate an external call.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D27298364
Pulled By: ZolotukhinM
fbshipit-source-id: 2ddf275ff68d6fca16a3befca5ce5c26aef462b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55334
The goal of this PR is to clean up some of the autograd codegen to compare C++ types using `CType` objects instead of raw strings. My last PR in the stack made that string comparison a little more fragile, since the raw C++ strings needed to be namespace-aware.
I confirmed byte-for-byte no codegen changes vs. the last PR (which added namespaces to the codegen) by running `diff -qr ../pytorch-common_test/torch/csrc/autograd/generated/ ../pytorch-callgrind_test_after2/torch/csrc/autograd/generated/` and `diff -qr ../pytorch-common_test/build/aten/src/ATen/ ../pytorch-callgrind_test_after2/build/aten/src/ATen/`
Note that a better end-state for the autograd codegen would be to do all of its type pattern matching directly off of JIT types, instead of off of CType’s (which are really just generated from JIT types, incorporating C++ specific semantics). That looks like it’ll require a pretty substantial change though, so I’m not doing it in this PR.
As part of this change (and after talking with ezyang), I split off the `CType` data class into a separate `NamedCType` class, which holds a name and a `CType`. This way, `CType` only knows about actual C++ types, making it easier to compare CType’s to each other in the codegen when we only care about the type. The core change is in `types.py`, but it required a bunch of downstream changes to update all of the places where we create `CType`s to create `NamedCType`s instead.
The main change in the autograd codegen was that I updated `SavedAttribute` to store a `NamedCType`. The other autograd changes all pretty much came from that change.
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D27708347
Pulled By: bdhirsh
fbshipit-source-id: 3e07c80569c7b229c638f389e76e319bff6315f9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55047
Added namespaces to all of the `CTypes` printed in the codegen. This is pretty much required if we want to use codegen externally, since we can no longer assume that we're inside of the `at::` namespace.
Important changes are in `types.py`.
How do we add the notion of namespaces to C++ types without people having to write "at::Tensor" everywhere? Before this PR, `CType` held a raw string representing the type, i.e. `BaseCType("Tensor", binds)`. This PR introduces a set of singleton base C++ types in `types.py`, that know how to print their namespace. Instead, we'd write `BaseCType(tensorT, binds)`, where printing `tensorT` will properly print out "at::Tensor".
This also means that you can't create arbitrary `CTypes`. If we need a new C++ type in the codegen, we need to add it to the list in `types.py`.
One blip in the design: we don't want to change `RegistrationDeclarations.yaml`, since that'll break external backends that ingest it. I added separate functions to display types without the namespace that are used to create RegistrationDeclarations.yaml`. With an external codegen API though, we can eventually kill it :)
I also didn't realize until this PR that `Declarations.yaml` is still directly in use, by some python/autograd codegen. Rather than keep that yaml byte-for-byte compatible, I just updated the callsites in the autograd codegen to work with namespaces. In the NEXT pr, I try to clean up some of the autograd codegen to stop using raw strings to match against C++ types.
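The singleton-type idea can be sketched like this. The class and method names are modeled on the description above and not guaranteed to match the actual codegen source: each base C++ type knows its namespace, so printing `tensorT` yields "at::Tensor" without anyone writing the namespace by hand, while a namespace-free form remains available for `RegistrationDeclarations.yaml`:

```python
class BaseCppType:
    def __init__(self, ns, name):
        self.ns, self.name = ns, name

    def cpp_type(self):
        # Full, namespace-qualified spelling used in generated code.
        return f"{self.ns}::{self.name}" if self.ns else self.name

    def cpp_type_without_namespace(self):
        # Legacy spelling kept for RegistrationDeclarations.yaml.
        return self.name

# Singleton: the only sanctioned way to refer to the Tensor type.
tensorT = BaseCppType("at", "Tensor")
print(tensorT.cpp_type())                    # at::Tensor
print(tensorT.cpp_type_without_namespace())  # Tensor
```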
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D27708349
Pulled By: bdhirsh
fbshipit-source-id: 56a4f81fc101795bcb9ee1f722121480fb2356ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55046
Updating `returns` in the codegen to return a CType instead of a raw string.
This has benefit of putting all stringifying logic through CType, which is useful in the followup PR when I add namespaces.
I also added new CTypes for other templated C++ types: array, vector and tuple. Mostly because it makes the namespacing logic in the next PR significantly easier. It also seems more natural to me that `BaseCType` shouldn't represent specializations of templated types.
There's a little bit of weirdness with types that are currently *only* used for returns, i.e. `TupleCType`. Returns aren't named, so I opted not to give it a name; we can add one later if we discover that we need it.
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D27708348
Pulled By: bdhirsh
fbshipit-source-id: 230b210c3e53be1bd362105fbea8451055dc59a8
Summary:
When loading optional blobs from a large file into the workspace, for instance: https://fburl.com/diffusion/l0mcnofg, we currently load the file multiple times. https://fburl.com/diffusion/qhbpyq0e
This diff optimizes the load time by loading the large model file only once, using the allow_incomplete arg of LoadOp. The implementation of LoadOp with this arg previously did not delete blobs that were not found; that is also fixed in this diff.
Test Plan:
Existing unit tests:
```
buck test //caffe2/caffe2/fb/distribute/tests:meta_net_def_storage_utils_test
```
Many sandcastle integration tests.
scuba logs: https://fburl.com/scuba/dai_modelstore/txdf3pjt
Reviewed By: TailofJune
Differential Revision: D27575622
fbshipit-source-id: 7c2b25ef603a378e87ebdbe349c94c2f1952493c
Summary:
This PR extends `.jenkins/pytorch/print_sccache_log.py` to filter out a distracting "error" message that walterddr came across while debugging failures in https://github.com/pytorch/pytorch/issues/55176:
```
=================== sccache compilation log ===================
ERROR 2021-04-05T15:44:18Z: sccache::server: Compilation failed: Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp: In function ‘int main()’:\n/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp:2:23: error: expected ‘;’ before ‘}’ token\n int main() { return 0 }\n ^\n" }
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56281
Test Plan: TODO (reviewers: is there an easy way to test this?)
Reviewed By: walterddr
Differential Revision: D27826064
Pulled By: samestep
fbshipit-source-id: 7322a830c1246820a5b2b7bbeaa4697ebd13b617
Summary:
This splits the previous diff into multiple parts. This introduces only the c++ files.
The unittests pass as part of the internal build. Will be put in the OSS in the later PRs
Test Plan:
`buck test mode/opt //caffe2/torch/fb/model_optimization:sparsity_test`
```
Parsing buck files: finished in 2.0 sec
Creating action graph: finished in 16.4 sec
Building: finished in 55.0 sec (100%) 20264/20264 jobs, 16 updated
Total time: 01:13.6 min
More details at https://www.internalfb.com/intern/buck/build/c9c5e69e-ce00-4560-adce-58b68bc43e47
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 1e678a07-0689-45b4-96f3-54d0a3181996
Trace available for this run at /tmp/tpx-20210415-161113.966600/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/3096224795029304
✓ ListingSuccess: caffe2/torch/fb/model_optimization:sparsity_test - main (4.186)
✓ Pass: caffe2/torch/fb/model_optimization:sparsity_test - test_sparse_qlinear (caffe2.torch.fb.model_optimization.test.sparsity.quantized_test.TestQuantizedSparseLayers) (1.752)
✓ Pass: caffe2/torch/fb/model_optimization:sparsity_test - test_sparse_qlinear (caffe2.torch.fb.model_optimization.test.sparsity.quantized_test.TestQuantizedSparseKernels) (1.884)
✓ Pass: caffe2/torch/fb/model_optimization:sparsity_test - test_sparse_qlinear_serdes (caffe2.torch.fb.model_optimization.test.sparsity.quantized_test.TestQuantizedSparseLayers) (2.013)
Summary
Pass: 3
ListingSuccess: 1
```
Reviewed By: raghuramank100
Differential Revision: D27812204
fbshipit-source-id: 6becaba3ab9cd054caf8b9bbae53af6d01347809
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56205
Allows for int8 modules to shadow int8 modules. This is useful when
comparing quantized models with different qconfigs.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_int8_shadows_int8
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27807405
fbshipit-source-id: 10c3bc7ab9bb1e6808aa1af23a34c7cf380465fd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56195
This is outdated, removing (forgot to clean up in a previous PR).
Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27805334
fbshipit-source-id: 3b035945b4928a3c727e96e0f7fe0efe201f42c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56194
Enables the NS graph matcher to also match `call_method` nodes.
These are useful for ops such as `torch.sigmoid`.
Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher.test_methods
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27805333
fbshipit-source-id: 509ae283db6b245671f11e3eb6b7fcb3a5735ef5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55918
Adds coverage for determining I/O dtype for various ops. This will
enable shadowing of these ops.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_op_io_dtype_coverage
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27740661
fbshipit-source-id: c5ce873ec56bffa50ca46d2fe134c70ed677e37e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55859
Adds mappings for ops which can accept either fp32 or int8 input,
such as `F.relu`. A future PR will fill out the op coverage.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_op_with_either_fp32_or_int8_input
```
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D27740659
fbshipit-source-id: cfc3dd58319b7161ca7f1fe05cd22d9a3ff11141
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55858
Moves the mappings of input and output dtypes of various ops
into its own file, and makes the variable names more clear. No logic
change.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D27740662
fbshipit-source-id: d384e7e542d9cc868d9cee9c53c2ac2f74a15a48
Summary:
Launch bounds for HIP were added along the way, but the smaller CUDA devices (like Jetson) also benefit from them.
So here I go over the HIP-specific launch bounds and try to generalize them to cover CUDA, too.
The long term goal is to eventually not need to resort to somewhat ad-hoc adaptations like the reduction of block size discussed in https://github.com/pytorch/pytorch/issues/8103, but have good coverage of our kernels with launch bound annotations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56143
Reviewed By: agolynski
Differential Revision: D27804640
Pulled By: ngimel
fbshipit-source-id: d4c345f9f7503e050a46361bfe2625865d0a42ba
Summary:
Currently, coverage stats are generated for sharded Windows tests. This PR attempts to store the coverage.xml file as an artifact.
I wonder what CircleCI will do when the artifacts don't exist (for nonsharded tests), and whether we could conditionally store artifacts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56179
Reviewed By: samestep
Differential Revision: D27800628
Pulled By: janeyx99
fbshipit-source-id: 919f5696c0d7b4ee0d99969f35797f5be644c364
Summary:
Resolves https://github.com/pytorch/pytorch/issues/55810 by closing some possible security holes due to using [GitHub Actions `${{ <expressions> }}`](https://docs.github.com/en/actions/reference/context-and-expression-syntax-for-github-actions#about-contexts-and-expressions) in `.github/workflows/add_annotations.yml` and also patching a few other possible scenarios that could cause the workflow to fail by a PR passing a malformed artifact.
- [x] flag and remove GitHub Actions expressions in JS scripts
- [x] don't fail the workflow if the artifact doesn't look as expected
- [x] write unit tests for `tools/extract_scripts.py`
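The core mitigation for GitHub Actions expression injection is to stop interpolating `${{ }}` values directly into inline scripts (where attacker-controlled text becomes code) and instead pass them through environment variables, where they remain data. A hedged sketch of the pattern (step name is illustrative, not from this PR):

```yaml
# Unsafe: the expression is substituted into the script before it runs,
# so a malicious PR title can inject shell commands:
#   run: echo "${{ github.event.pull_request.title }}"
# Safer: route the value through an environment variable instead:
- name: Print title safely
  env:
    PR_TITLE: ${{ github.event.pull_request.title }}
  run: echo "$PR_TITLE"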
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56071
Test Plan:
I tested the end-to-end "Lint" and "Add annotations" system in a separate sandbox repo, including the following cases:
- well-formed artifact
- missing artifact
- artifact containing a file named `linter-output.zip` (name clash)
- artifact whose `commit-sha.txt` doesn't contain a 40-digit hex string
- artifact whose `commit-sha.txt` contains a 40-digit hex string that isn't a valid Git hash for the current repo
- in this last case, the workflow does fail, but handling that is the responsibility of [pytorch/add-annotations-github-action](https://github.com/pytorch/add-annotations-github-action), not pytorch/pytorch
To run the new unit tests added in this PR:
```
python tools/test/test_extract_scripts.py
```
Reviewed By: seemethere
Differential Revision: D27807074
Pulled By: samestep
fbshipit-source-id: e2d3cc5437fe80ff03d46237ebba289901bc567c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55550
Add a test for `symbolic_trace` into `torch.nn.functional`
Test against all `functional`s with a `torch.Tensor` argument, plus the `functional`s from `FUNCTIONALS_WITHOUT_ANNOTATION`.
```py
FUNCTIONALS_WITHOUT_ANNOTATION = (
    "adaptive_max_pool1d",
    "adaptive_max_pool2d",
    "adaptive_max_pool3d",
    "fractional_max_pool2d",
    "fractional_max_pool3d",
    "max_pool1d",
    "max_pool2d",
    "max_pool3d",
    "gaussian_nll_loss",
    "upsample",
    "upsample_bilinear",
    "upsample_nearest",
)
```
`UNTRACEABLE_FUNCTIONALS` lists 110 current untraceable `functional`s with expected `Error`.
- `BUILT_IN_FUNC`: built-in functions or built-in methods can not be traced.
- `PROXY_ITERATED`: Proxy object cannot be iterated. This can be attempted when used in a for loop or as a *args or **kwargs function argument
- `LEN_ERROR`: 'len' is not supported in symbolic tracing by default. If you want this call to be recorded, please call torch.fx.wrap('len') at module scope
- `ARG_TYPE_MISMATCH`: `functional()`: argument <name> (position <n>) must be <type>, not Proxy
- `CONTROL_FLOW`: symbolically traced variables cannot be used as inputs to control flow
- `INTERPOLATE_ARGS_CONFLICT`: When tracing the functional by calling `interpolate(input, size, scale_factor, mode="bilinear", align_corners=True)`, `ValueError("only one of size or scale_factor should be defined")` is raised
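The `PROXY_ITERATED` class of failures falls out of how symbolic tracing works: a `Proxy` records operations instead of computing values, so anything that needs a concrete value (iteration, `len`, branching) cannot be traced. A toy sketch of the mechanism (this is an illustrative stand-in, not the real `torch.fx.Proxy`):

```python
class Proxy:
    """Minimal stand-in for a symbolic-tracing proxy: operations build
    new graph nodes instead of computing concrete values."""

    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        # Traceable: record the op as a new node.
        return Proxy(f"add({self.name}, {other})")

    def __iter__(self):
        # Un-traceable: unpacking needs a concrete length/structure.
        raise TypeError("Proxy object cannot be iterated")

x = Proxy("x")
y = x + 1            # fine: produces the node add(x, 1)
try:
    a, b = x         # fails: data-dependent structure
except TypeError as e:
    print(e)
```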
Test Plan: Imported from OSS
Reviewed By: jamesr66a
Differential Revision: D27659367
Pulled By: ejguan
fbshipit-source-id: d0d05e4d94e0b85f47e6c171a31f0d41b1387373
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54812
Needed for quantization since different attributes might refer to the same module instance
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D27408376
fbshipit-source-id: cada85c4a1772d3dd9502c3f6f9a56d690d527e7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56174
evaluate_function:
1. calls the autograd function (call_function)
2. accumulates gradients into buffers
Previously, ThreadLocalStateGuard only covered part of `call_function`.
However, it should cover all Tensor operations in `evaluate_function`,
so this PR moves it to do so.
One alternative would have been to move ThreadLocalStateGuard to here:
71f9e99e29/torch/csrc/autograd/engine.cpp (L394)
Unfortunately that adds 2% additional instructions according to the
instruction count benchmark in the next section. This is because
`evaluate_function` does an early return:
71f9e99e29/torch/csrc/autograd/engine.cpp (L732-L735)
If this is preferred, please let me know.
Test Plan:
- run existing tests. It's hard to actually come up with a test case for
this.
Benchmark plan:
TL;DR: Instruction count decreases by a little after this PR.
```
import torch
from torch.utils.benchmark import Timer
timer = Timer(
    stmt="""\
torch::autograd::grad({y}, {x}, {}, /*retain_grad=*/true);""",
    setup="""\
auto x = torch::ones({}, torch::requires_grad());
auto y = x * 2;""",
    language="cpp")
stats = timer.collect_callgrind()
print(stats)
```
This gave the following:
```
Before:
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7f4b28ce6a90>
torch::autograd::grad({y}, {x}, {}, /*retain_grad=*/true);
setup:
auto x = torch::ones({}, torch::requires_grad());
auto y = x * 2;
All Noisy symbols removed
Instructions: 3514184 3514184
Baseline: 0 0
100 runs per measurement, 1 thread
After:
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.CallgrindStats object at 0x7fdbc9d187d0>
torch::autograd::grad({y}, {x}, {}, /*retain_grad=*/true);
setup:
auto x = torch::ones({}, torch::requires_grad());
auto y = x * 2;
All Noisy symbols removed
Instructions: 3513884 3513884
Baseline: 0 0
100 runs per measurement, 1 thread
```
Reviewed By: albanD
Differential Revision: D27799283
Pulled By: zou3519
fbshipit-source-id: 0a8213824e08c04748d38e66604c73f395285d63
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56190
Previously, if we had some code that did the following:
```
- pattern A, allow_empty=False
- save module B, but throws an exception for whatever reason
- save module that causes match against A
```
Then the resulting behavior would be:
1. exception thrown, which triggers `__close__` on `PackageExporter`
2. `PackageExporter` checks that all patterns are matched against, and sees that A was not matched.
3. Error is raised that we didn't match against pattern A.
This is confusing, since the *real* error that caused packaging to fail
occurred when trying to package module B, but it's being hidden by the
error about module A (even though if packaging module B had succeeded,
there would be no error).
Change it so that the behavior looks like:
1. exception thrown, which triggers `__close__` on `PackageExporter`
2. `PackageExporter` recognizes that an exception is happening and
immediately just returns control flow to the caller to handle the "real"
exception.
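The pattern can be sketched with a plain Python context manager (illustrative, not the actual `PackageExporter` code): close-time validation is skipped when an exception is already propagating, so the caller sees the real error rather than a follow-on one.

```python
class Exporter:
    """Sketch: a context manager that skips its own close-time
    validation when an exception is already in flight."""
    def __init__(self):
        self.unmatched_patterns = {"A"}  # pattern A, allow_empty=False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # An exception is propagating; don't mask it with our own.
            return False
        if self.unmatched_patterns:
            raise RuntimeError(f"unmatched patterns: {self.unmatched_patterns}")
        return False

try:
    with Exporter():
        raise ValueError("failed to save module B")
except ValueError as err:
    print(err)  # the *real* error, not the unmatched-pattern error
```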
Differential Revision: D27803988
Test Plan: Imported from OSS
Reviewed By: guangyuwang
Pulled By: suo
fbshipit-source-id: f67b2e96165a0547c194a8bef1af1c185452173e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56044
We want to be able to drop the full-jit dependency in the auto-generated unit tests for 2 reasons:
1. Running bloaty on the auto-generated unit tests should be somewhat representative of the actual size.
2. The runtime environment of the auto-generated unit tests should be as close to the production environment as possible to ensure that we are running the tests in a production-like runtime.
Due to the dependence on full-jit, we aren't there yet. For the auto-generated tests, we probably won't need to depend on `_export_operator_list()` eventually, but for now we do, since it is used to decide whether the model being run is a Metal GPU model or a CPU model, and it gates whether the test runs that model or not.
Eventually, we can stop doing this in the test and do it in the codegen from PTM-CLI instead (by fetching the operators from that tool, and writing out to the BUCK file which backend(s) this model is targeting). However, that will take some time to land, so in the spirit of expediency, this change is being proposed.
Discussed this offline with iseeyuan
ghstack-source-id: 126656877
Test Plan: Build + BSB.
Reviewed By: iseeyuan
Differential Revision: D27694781
fbshipit-source-id: f31a2dfd40803c02f4fd19c45a3cc6fb9bdf9697
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56069
It's more efficient to capture an MPSImage object than to copy one from outside.
ghstack-source-id: 126552396
Test Plan:
- All operator tests pass
- Sandcastle
- CircleCI
Reviewed By: SS-JIA
Differential Revision: D27694542
fbshipit-source-id: e1bbbffc3f8c109816cb117aebd0aae8576c6c5c
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50699.
The root cause was that some floating-point assertions had a "greater than or **equal to**" condition. The "equal to" part was causing flakiness due to strict equality check (`==`) in `TestCase.assertGreaterEqual()`. This PR introduces a new assertion method called `assertGreaterAlmostEqual()` in `common_utils.py` that mitigates the problem by behaving similar to `TestCase.assertAlmostEqual()`.
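A minimal sketch of such an assertion, written as a standalone function for illustration (the actual helper is a `TestCase` method in `common_utils.py`); it reuses the rounding rule of `unittest`'s `assertAlmostEqual` so that "equal up to floating-point noise" still passes:

```python
def assert_greater_almost_equal(first, second, places=7):
    """Pass if first > second, or if first is almost equal to second
    (same rounding rule as unittest's assertAlmostEqual)."""
    if first > second or round(second - first, places) == 0:
        return
    raise AssertionError(
        f"{first} not greater than or almost equal to {second}")

assert_greater_almost_equal(1.0, 0.5)        # strictly greater: passes
assert_greater_almost_equal(0.3, 0.1 + 0.2)  # 0.3 < 0.30000000000000004,
                                             # but almost equal: passes
try:
    assert_greater_almost_equal(0.1, 0.3)    # genuinely smaller: fails
except AssertionError:
    print("failed as expected")
```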
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56192
Reviewed By: zhaojuanmao
Differential Revision: D27804724
Pulled By: cbalioglu
fbshipit-source-id: bc44a41ca4ce45dfee62fb3769fb47bfd9028831
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55976
- Define a concrete `DebugInfo` to collect Param comms.
- Add a macro to easily log `DebugInfo`
Test Plan:
Tested on `ads:simplified_launcher` with `dyno gputrace`
locally tested in libkinetoObserver that it can collect the debug Infobase
Reviewed By: kingchc, ilia-cher
Differential Revision: D26773447
fbshipit-source-id: a8eeede2d6dbf34d7a1b3614843b4a1baba94448
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55837
Adds a test that checks that all of the relevant op pairs defined in
`quantization_mappings.py` are also defined as related by Numerical
Suite.
Note: this does not cover all the ops, just the ones in
`quantization_mappings.py`. A future PR will fill out the remainder.
Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher.test_op_relationship_mapping
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27719979
fbshipit-source-id: 9e852ef94da5f7a653ea15ba52c68a89c8e30208
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55803
Makes the NS `graph_matcher.get_reversed_fusions` use the fusions
defined in the FX quantization code instead of duplicating them.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27719980
fbshipit-source-id: 12e3183405181bb9001f10e765cfb4d2ffdfdd88
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55507
As titled, extends the test cases for weight extraction from
functionals to cover QAT.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27650408
fbshipit-source-id: 8ce87d56bbc0da7c2330ece71a897d6d8c5110a0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55506
Makes the NS weight extraction tests also test QAT, and fixes
the mappings where necessary to cover all the fusions and make
the tests pass.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_extract_weights_mod_ptq
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_extract_weights_mod_qat
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27650409
fbshipit-source-id: c5bd9268d1bc559afc27d4c5109effd77bf1538a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55505
This is necessary to add support in NS for QAT modules, to avoid
duplicating logic between NSTracer and QuantizationTracer.
The eng work to expose the custom module and class names to
the user will be in a future PR.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27650407
fbshipit-source-id: 431f47c5353b41c11371c5efa79657bfd085459a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55434
Before this PR, there was some hacky logic which determined
the input and output types of nodes based on heuristics such
as inspecting `__module__`, or assuming that an op has an
I/O dtype of `torch.float` when the heuristics did not find
any matches. This is problematic because the heuristics were not exact,
and this could result in non-sensical shadow graphs when the heuristics
would return an incorrect dtype.
This PR switches the dtype determination to an allowlist system,
where we specify exactly what the dtypes are for the nodes or modules
which are in an allowlist, and we add an `UNKNOWN` type for everything
else. The shadow logic is changed to skip inserting shadows on any
function or module where the I/O dtype is unknown.
The current allowlist only contains functions necessary for the
currently existing tests. Filling out the allowlist with all necessary
torch functions is left for a future PR.
As a result of this, we can do the following (also implemented in this PR):
1. enable graph matching on nodes with equal types (for example,
F.linear and F.linear). The restriction against matching nodes of equal
types was in the code as a placeholder; it's better to allow comparisons
of nodes of equal types. One case where this is useful is unshadowed
activations.
2. enable models with user defined modules to be passed to Numeric Suite
APIs without errors.
Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher
python test/test_quantization.py TestFXGraphMatcherModels
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27622418
fbshipit-source-id: 40dcba0222c01154c141467640c1eb89725f33a7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55988
Pull Request resolved: https://github.com/pytorch/kineto/pull/165
as part of the ClientTraceActivity -> GenericTraceActivity migration, move all the metadata fields to a JSON-encoded string
Test Plan:
- `buck build`
- tested with subsequent diffs
Reviewed By: gdankel
Differential Revision: D27340314
fbshipit-source-id: f55b77a779e4bda1fb8667cb4e0f4252b93af5ea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53238
There is a tension in the Vitals design: (1) we want a macro-based logging API for C++ and (2) we want a clean Python API. Furthermore, we want this to work with "print on destruction" semantics.
The unfortunate resolution is that there are (2) ways to define vitals:
(1) Use the macros for local use only within C++ - this keeps the semantics people enjoy
(2) For vitals to be used through either C++ or Python, we use a global VitalsAPI object.
Both these go to the same place for the user: printing to stdout as the globals are destructed.
The long history on this diff shows many different ways to try to avoid having 2 different paths... we tried weak pointers & shared pointers, verbose switch cases, etc. Ultimately each ran into an ugly trade-off, and this cuts the difference better than the alternatives.
Test Plan:
buck test mode/dev caffe2/test:torch -- --regex vital
buck test //caffe2/aten:vitals
Reviewed By: orionr
Differential Revision: D26736443
fbshipit-source-id: ccab464224913edd07c1e8532093f673cdcb789f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56082
The native_functions.yaml changes were done by codemod using the
following script:
```
import ruamel.yaml
from ruamel.yaml.tokens import CommentToken
from ruamel.yaml.error import CommentMark
from tools.codegen.model import * # noqa: F403
with open("aten/src/ATen/native/native_functions.yaml", "r") as f:
    contents = f.read()

yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True
yaml.width = 1000
yaml.boolean_representation = ['False', 'True']
r = yaml.load(contents)

convert = '''\
acos
acosh
asin
asinh
atan
atanh
cos
cosh
digamma
erf
erfc
erfinv
exp
expm1
exp2
lgamma
log
log10
log1p
log2
reciprocal
sigmoid
sin
sinc
sinh
special_entr
sqrt
tan
tanh'''.split()

for e in r:
    f = NativeFunction.from_yaml(e, Location("", 0))
    if f.structured or f.structured_delegate is not None:
        continue
    n = f.func.name.name.base
    if n not in convert:
        continue
    # mutate e to make changes
    if f.func.kind() == SchemaKind.out:
        e.insert(1, 'structured', True)
        e.insert(2, 'structured_inherits', 'TensorIteratorBase')
    else:
        # TODO: The .out overload assumption is not sound in general
        e.insert(1, 'structured_delegate', f'{n}.out')
    e['dispatch'].pop('CPU', None)
    e['dispatch'].pop('CUDA', None)
    e['dispatch'].pop('CPU, CUDA', None)
    e['dispatch'].pop('CompositeExplicitAutograd', None)
    *_, last_k = e.keys()
    needs_fixup = False
    if not e['dispatch']:
        if last_k == 'dispatch':
            needs_fixup = True
        del e['dispatch']
    # Manually fix up newlines at the end, because ruamel
    # made some bad life choices about where to associate trailing
    # whitespace for nested dicts; see
    # https://stackoverflow.com/questions/42172399/modifying-yaml-using-ruamel-yaml-adds-extra-new-lines
    if needs_fixup:
        *_, last_k = e.keys()
        # post_key, pre_key, post_value, pre_value
        e.ca.items[last_k] = [None, None, CommentToken('\n\n', CommentMark(0), None), None]

with open("aten/src/ATen/native/native_functions.yaml.new", "w") as f:
    yaml.dump(r, f)
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D27777769
Pulled By: ezyang
fbshipit-source-id: 1ecbac7cb3e0093167bb61c7d2b1ecb95b8ae17c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56078
This is in preparation for making all unary functions structured.
I don't actually have to make them structured yet as TensorIterator&
casts to TensorIteratorBase&
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D27777768
Pulled By: ezyang
fbshipit-source-id: 05a3a95f200698eef72c5c74fff85fe881e1c4a3
Summary: Add cost inference for MulGradient operator; also whitelist MulGradient in COMPUTE_OP_TYPES in dense_perf_estimation
Test Plan: buck run //caffe2/caffe2/python/operator_test:elementwise_ops_test
Reviewed By: CrazySherman
Differential Revision: D27614003
fbshipit-source-id: 30901e5e2b6ce7e2183c2362d1bf9f895046cf55
Summary:
Per title
I've revamped the size checks a bit to provide a better error message if `self` is of the wrong size, and also added a check that the inplace variant has the correct `self` size
Ref: https://github.com/pytorch/pytorch/issues/55070
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55746
Reviewed By: ezyang
Differential Revision: D27782980
Pulled By: ngimel
fbshipit-source-id: 6ba949b682b8fd1170d0304da0ed348dd1a7b8c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56120
This reverts commit ad17fadbfc786dc1ccb42e822208ff03c2a2b72c (D27786457).
The big annoyance here is that depending on the threading mode you may not be
able to toggle num_threads at will, so the fusion tests won't fail.
I hate this solution, but I'm adding a secondary override for the TE fuser.
Now you need to both turn on fusion (_jit_override_can_fuse_on_cpu), and you're
OK if you're running with 1 thread, or you can add
`_jit_set_texpr_parallel_cpu_enabled` to enable it anyways.
This is (a) mainly for tests, since a real user probably won't fiddle aimlessly
with the thread count, and (b) will go away once NNC's threading support is
fully baked.
Test Plan: Imported from OSS
Reviewed By: Krovatkin
Differential Revision: D27788199
Pulled By: bertmaher
fbshipit-source-id: 070d04474f15e9689dbdf8cc1fde43050c6506b1
Summary:
Revert "Revert D27449031 (2a7df657fe): [pytorch][PR] [ROCm] use hiprtc precompiled header". Reland PR https://github.com/pytorch/pytorch/issues/54350.
This reverts commit 204ac21bf1457022caab197001788239720b96d6.
The original PR was reverted under suspicion that it was causing CI instability, but it was instead due to a hardware failure.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55965
Reviewed By: jbschlosser
Differential Revision: D27755907
Pulled By: malfet
fbshipit-source-id: 75bf0b9d888df3dee62f00a366b1123757e0474e
Summary:
This PR introduces a basic timer type that periodically calls a specified function. Its main use in the upcoming `DynamicRendezvousHandler` implementation will be to send periodic keep-alive updates in a background thread.
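A periodic timer of this kind can be sketched in a few lines with a background thread and an `Event` used as an interruptible sleep (names here are illustrative, not the actual `torch.distributed.elastic` API):

```python
import threading
import time

class PeriodicTimer:
    """Call `function` every `interval` seconds in a background
    thread until cancelled."""
    def __init__(self, interval, function):
        self._interval = interval
        self._function = function
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait doubles as an interruptible sleep: it returns True
        # (ending the loop) as soon as cancel() sets the event.
        while not self._stop.wait(self._interval):
            self._function()

    def start(self):
        self._thread.start()

    def cancel(self):
        self._stop.set()
        self._thread.join()

calls = []
timer = PeriodicTimer(0.01, lambda: calls.append(time.monotonic()))
timer.start()
time.sleep(0.1)          # let a few "keep-alive" ticks fire
timer.cancel()
assert len(calls) >= 1   # fired at least once before cancellation
```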
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55919
Reviewed By: tierex
Differential Revision: D27740823
Pulled By: cbalioglu
fbshipit-source-id: e46fc848ab033995946a38a29c01d67d387a4cf5
Summary:
The name `should_check_autodiff` became `should_autodiff_node` but documentation did not change. The identifier is used in `test/test_jit.py`. It seems the file is too big for github to link to the line, but it is the return value from `normalize_check_ad`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56013
Reviewed By: agolynski
Differential Revision: D27800008
Pulled By: Lilyjjo
fbshipit-source-id: 88a43c14c0f48fb3f94792e3fd6de2bd6a59a1a2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55935
Add a new `DB::SetOptions()` method to allow passing options to the DB as part
of Save operations. This can be used for passing in options to control the
serialization behavior, such as rate limits or other parameters. The
serialization options are passed as an opaque string, so that different DB
implementations may choose their own options and options format.
This also adds a new `db_options` parameter to the `Save` operator.
This allows users to pass in the DB options when saving data.
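The "opaque string" design means each DB implementation is free to define its own options format. A hypothetical illustration (the format, class, and method names below are invented for the sketch, not the actual Caffe2 API):

```python
def parse_options(opts: str) -> dict:
    """Parse one possible options format: 'key1=val1;key2=val2'."""
    return dict(kv.split("=", 1) for kv in opts.split(";") if kv)

class RateLimitedDB:
    """Toy DB implementation that understands a rate-limit option and
    ignores anything it doesn't recognize."""
    def __init__(self):
        self.rate_limit = None

    def set_options(self, opts: str):
        parsed = parse_options(opts)
        if "rate_limit_mbps" in parsed:
            self.rate_limit = int(parsed["rate_limit_mbps"])

db = RateLimitedDB()
db.set_options("rate_limit_mbps=100")
print(db.rate_limit)  # 100
```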
ghstack-source-id: 126589771
Test Plan:
I don't have any tests in this diff since no DB implements options yet. The
next diff in the stack includes an options implementation, along with unit
tests that verify the options are passed in correctly.
Differential Revision: D27729461
fbshipit-source-id: 4d03250c389c66a049cdee1d05e082f5649ac0f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56119
There are apparently still more issues with fp16 on LLVM so let's just
nuke it from orbit while we develop a robust workaround.
ghstack-source-id: 126619411
Test Plan: compile
Reviewed By: ZolotukhinM
Differential Revision: D27787080
fbshipit-source-id: 9e771211fe48266f50fca1de8d40295922da5bca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56117
I was debugging an issue during instruction selection and wanted to
see the input bitcode. This way we always print it before going into the asm
generation pass.
ghstack-source-id: 126592596
Test Plan: Run with `PYTORCH_JIT_LOG_LEVEL=">>llvm_codegen"`
Reviewed By: huiguoo
Differential Revision: D27781683
fbshipit-source-id: 84635d0ca2a1318ae7a9a73cc1d2df450d8b6a08
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55857
Since OpenCV supports more than just the JPEG file format.
ghstack-source-id: 126528422
Test Plan: Build
Reviewed By: JacobSzwejbka
Differential Revision: D27722865
fbshipit-source-id: 6cf83bf187bb1fb3a28e3aa2a011959ef8925449
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55778
The RPC suite takes very long to run, and most of it is CPU-only. As long as we run the CPU-only part on some CPU worker on CircleCI, we can skip it on the GPU workers (which are expensive, and we shouldn't waste their time).
ghstack-source-id: 126270873
Test Plan: Exported to CircleCI and checked that the CPU-only part still runs on the CPU workers but doesn't on the GPU workers.
Reviewed By: mrshenli
Differential Revision: D27705941
fbshipit-source-id: a0a509d6e72cf69e417f4b48336df534b070a66d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56048
This reverts commit c411017a41988e9c5184279c1ec7dd7ef4e1a6fe.
This implementation broke CI in pytorch/vision and it's not handling
tags properly. So I want to revert it first to unblock vision CI and
send out a proper fix later.
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D27771701
Pulled By: ailzhang
fbshipit-source-id: 932f9be72a1ae1816f4032643b3c2dde0cb7ae4c
Summary:
This PR includes the auxiliary types used by the upcoming implementation of the `DynamicRendezvousHandler`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55932
Test Plan: Run the existing and newly-introduced unit/integration tests.
Reviewed By: tierex
Differential Revision: D27742329
Pulled By: cbalioglu
fbshipit-source-id: cf2e0d88042909739e7c37c25b4b90192c26e198
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56037
The diff introduces new `torch.distributed.elastic_launch` and removes internals of `torch.distributed.launch` keeping backwards compatibility.
Since torchelastic and torch.distributed.launch are not fully compatible due to the `--use_env` arg, the `torch.distributed.launch` deprecation is going to be iterative: as part of PyTorch 1.9 we are going to deprecate it, and in the following releases we will remove `torch.distributed.launch`.
The diff leaves the `torchelastic.distributed.launch` module, and the follow-up diffs will migrate users from `torchelastic.distributed.launch` to `torch.distributed.elastic_launch`.
Test Plan: buck test mode/dev-nosan //pytorch/elastic/torchelastic/distributed/...
Reviewed By: cbalioglu
Differential Revision: D27753803
fbshipit-source-id: 5f24bcfdcb70356f0787b11f6cb9479f3515fb47
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55659
As per https://github.com/pytorch/pytorch/issues/55583, this is the most expensive distributed test.
Instead of waiting for process 0 in this test to be taken down by
nccl_async_error_handling, just remove the barrier and let the process exit
when the backend is NCCL.
A slight downside here is that the test no longer verifies that the process
would be brought down by nccl_async_error_handling, but
nccl_async_error_handling is already well tested in other tests. If we feel we
need to ensure this for this test, then we can pass in a process group with a
smaller timeout as an alternative solution.
The test now runs in 4-6s as opposed to 70. Ran the test 1000 times to verify
no flakiness
ghstack-source-id: 126590904
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D27672161
fbshipit-source-id: 38fb518606daac9b0390ca4c3ce1a72dc2da36fc
Summary:
This way, the user would just have to run the `regenerate_cancel_redundant_workflow.py` script to fix the inconsistency (instead of manual stuff).
Lots of the indentation changes were caused by regenerating the file, which I don't think is terrible, and ruamel.yaml did great at preserving comments and order and such!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56092
Reviewed By: samestep
Differential Revision: D27780877
Pulled By: janeyx99
fbshipit-source-id: dd2996a88cd70a83d8daac33ba6659f93add8b92
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55075
Constructs a mapping from parameter index to parameter name and passes it to Reducer, so that error messages about unused parameters/not all parameters getting gradient can name the offending parameters.
Use case:
1) User runs DDP forward + bwd, and it has some unused parameters that will result in ddp error in next iteration
2) Next forward pass calls `Reducer::ensure_prior_reduction_finished()` where we check all params got gradient from the previous bwd pass. DDP would throw here in this case.
3) Reducer maintains mapping and tracks used parameters, and computes which parameters did not get gradient and logs this as part of the error.
Implementation details:
0) The following is only enabled for debug modes of INFO or DETAIL.
1) To save memory, we don't map param -> param name so that we don't have to copy the entire tensor, instead we map param_index -> param_name and use the existing concept of variable_index in Reducer to look up parameter names.
2) DDP constructs param index -> param name mapping. The name is the fully qualified name: f"{module_name}:{param_name}" and passes it into Reducer
3) Reducer maintains per-iteration std::set<int> of variable indices that have had `mark_variable_ready` called.
4) When some params go unused, we take a set difference to detect the unused params.
5) Unittests to test the logged unused params, as well as for nested modules, are added
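The set-difference bookkeeping in steps 2-4 can be sketched in a few lines (the names and data below are illustrative, not the actual Reducer implementation):

```python
# Step 2: DDP builds param_index -> fully qualified parameter name.
param_index_to_name = {
    0: "net.layer1:weight",
    1: "net.layer1:bias",
    2: "net.layer2:weight",
}

# Step 3: indices seen by mark_variable_ready this iteration.
marked_ready = {0, 2}

# Step 4: the unused parameters are the set difference.
unused_indices = set(param_index_to_name) - marked_ready
unused_names = sorted(param_index_to_name[i] for i in unused_indices)
print(unused_names)  # ['net.layer1:bias']
```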
ghstack-source-id: 126581051
Test Plan: CI, UT
Reviewed By: zhaojuanmao
Differential Revision: D27356394
fbshipit-source-id: 89f436af4e74145b0a8eda92b3c4e2af8e747332
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56016
Missed these because I don't build on CUDA
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D27765124
Pulled By: ezyang
fbshipit-source-id: aa202f594659d53c903b88c9d4a4cbb0e1c0b40a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55799
I'm going to change the implementation of cdata soon so I need to
abstract over cdata access with a function. Additionally, many
users are casting manually casting to THPVariable to access
the member so I can remove these unsafe casts in the client code
(the implementation, of course, is still doing an unsafe cast.)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D27712130
Pulled By: ezyang
fbshipit-source-id: 95fcc013bf3913d67f2c634068eb5b3aab144cb3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55798
I'm going to change how cdata is implemented internally, so I want to
make all callsites call through THPVariable_Unpack even if they
actually have a THPVariable in hand
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D27712131
Pulled By: ezyang
fbshipit-source-id: bd2eb1e43c52c6b7a776ff3a45350a23934e643c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55797
In all of these cases, the inside of the function didn't make use
of the fact that the tensor was a mutable reference
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D27712132
Pulled By: ezyang
fbshipit-source-id: 99e0bb1d783f63d2d42ab53d3d406b2064405ef4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54085
Fixes https://github.com/pytorch/pytorch/issues/50121.
This fixes two similar issues pointed out with the dtype that `torch.pow` performs its computation. Thanks ngimel for spotting the issues originally (comments [here](https://github.com/pytorch/pytorch/pull/53669#discussion_r594624355) and [here](https://github.com/pytorch/pytorch/pull/53669#discussion_r594719704))!
Before:
```
>>> torch.pow(2, torch.tensor([17], dtype=torch.uint8), out=torch.tensor([0]))
tensor([0])
>>> torch.pow(2, torch.tensor(17, dtype=torch.uint8), out=torch.tensor(0))
tensor(131072)
>>> torch.pow(2, torch.tensor([17], dtype=torch.uint8, device='cuda'), out=torch.tensor([0], device='cuda'))
tensor([131072], device='cuda:0')
>>> torch.pow(2, torch.tensor(17, dtype=torch.uint8, device='cuda'), out=torch.tensor(0, device='cuda'))
tensor(131072, device='cuda:0')
```
After:
```
>>> torch.pow(2, torch.tensor([17], dtype=torch.uint8), out=torch.tensor([0]))
tensor([0])
>>> torch.pow(2, torch.tensor(17, dtype=torch.uint8), out=torch.tensor(0))
tensor(0)
>>> torch.pow(2, torch.tensor([17], dtype=torch.uint8, device='cuda'), out=torch.tensor([0], device='cuda'))
tensor([0], device='cuda:0')
>>> torch.pow(2, torch.tensor(17, dtype=torch.uint8, device='cuda'), out=torch.tensor(0, device='cuda'))
tensor(0, device='cuda:0')
```
In all four cases above, `tensor(0, ...)` is the correct value because the computed "common dtype" among the inputs is expected to be `uint8`. Computing `2 ** 17` in uint8 will then overflow to zero. Finally, we cast the computed output to the output tensor's dtype, which is `int32`.
There were two separate issues fixed in this PR: one for cpu and one for cuda:
* For CPU, The `pow(Scalar, Tensor)` overload wasn't calling `set_wrapped_number(true)` after wrapping the scalar in a Tensor, which caused the "promoted" scalar to incorrectly participate in type promotion (see the documented behavior [here](aa8714dfed/c10/core/TensorImpl.h (L590)))
* For CUDA, the cuda kernels defined in `PowKernel.cu` were using the output's dtype to run the computation, instead of the common dtype.
As an aside: The CPU and CUDA kernels actually both use `iter.dtype()` instead of `iter.common_dtype()` to run the computation, which I fixed. The reason that only manifested here for CUDA is because TensorIterator has cpu-specific logic to create temporary outputs with the intermediate dtype (shown [here](aa8714dfed/aten/src/ATen/TensorIterator.cpp (L349))). I'm not sure what the end state is there- I can imagine that being something we're more okay doing for cpu than for cuda, but it also leads to hard-to-track-down inconsistencies between the two like in this case.
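The uint8 overflow that explains the corrected outputs can be checked with plain integer arithmetic (a torch-free sketch; `uint8_pow` is a made-up helper, not a torch API):

```python
def uint8_pow(base, exp):
    """Exponentiation with uint8 wrap-around (modulo 2**8) semantics."""
    result = 1
    for _ in range(exp):
        result = (result * base) % 256
    return result

print(uint8_pow(2, 17))  # 0: matches tensor(0) in the fixed outputs above
print(uint8_pow(2, 7))   # 128: still representable in uint8, no wrap-around
```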
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D27096330
Pulled By: bdhirsh
fbshipit-source-id: a7e2909243851625cb3056d1e7abb2383bfe95f2
Summary:
This is a reflection of recent failures in https://github.com/pytorch/pytorch/issues/55753 and https://github.com/pytorch/pytorch/issues/55522.
We are lacking a test to safeguard these test env var.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55931
Test Plan:
1. CI
2. Run locally using `python test/test_testing.py -k test_filtering_env_var -v`
- gives failure on 2ca45cb9e8 and d0cd16899f
- passes on 159e1100bf and current master
Reviewed By: jbschlosser
Differential Revision: D27747537
Pulled By: walterddr
fbshipit-source-id: c88e1c818199c7838866037d702d4012cacf510e
Summary:
Reland of https://github.com/pytorch/pytorch/pull/49098
See original issue for details.
The only difference with previous PR is the fix of the _embedding_bag_dense_backward formula to stop declaring a backward formula for an argument that does not exists.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56083
Reviewed By: samestep
Differential Revision: D27778221
Pulled By: albanD
fbshipit-source-id: 159ef91ca931ef2ccfbc3d1c46c7880c32919dc9
Summary:
- This change is required to handle the case when hipcc is
  updated to the latest version using update-alternatives.
- Update-alternatives support for a few ROCm binaries is available
  from ROCm 4.1 onwards.
- This change does not affect any previous versions of ROCm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55968
Reviewed By: mruberry
Differential Revision: D27785123
Pulled By: ezyang
fbshipit-source-id: 8467e468d8d51277fab9b0c8cbd57e80bbcfc7f7
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345
Changes:
* Add `i0e`
* Move some kernels from `UnaryOpsKernel.cu` to `UnarySpecialOpsKernel.cu` to decrease compilation time per file.
Time taken by i0e_vs_scipy tests: around 6.33 s
<details>
<summary>Test Run Log</summary>
```
(pytorch-cuda-dev) kshiteej@qgpu1:~/Pytorch/pytorch_module_special$ pytest test/test_unary_ufuncs.py -k _i0e_vs
======================================================================= test session starts ========================================================================
platform linux -- Python 3.8.6, pytest-6.1.2, py-1.9.0, pluggy-0.13.1
rootdir: /home/kshiteej/Pytorch/pytorch_module_special, configfile: pytest.ini
plugins: hypothesis-5.38.1
collected 8843 items / 8833 deselected / 10 selected
test/test_unary_ufuncs.py ...sss.... [100%]
========================================================================= warnings summary =========================================================================
../../.conda/envs/pytorch-cuda-dev/lib/python3.8/site-packages/torch/backends/cudnn/__init__.py:73
test/test_unary_ufuncs.py::TestUnaryUfuncsCUDA::test_special_i0e_vs_scipy_cuda_bfloat16
/home/kshiteej/.conda/envs/pytorch-cuda-dev/lib/python3.8/site-packages/torch/backends/cudnn/__init__.py:73: UserWarning: PyTorch was compiled without cuDNN/MIOpen support. To use cuDNN/MIOpen, rebuild PyTorch making sure the library is visible to the build system.
warnings.warn(
-- Docs: https://docs.pytest.org/en/stable/warnings.html
===================================================================== short test summary info ======================================================================
SKIPPED [3] test/test_unary_ufuncs.py:1182: not implemented: Could not run 'aten::_copy_from' with arguments from the 'Meta' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_copy_from' is only available for these backends: [BackendSelect, Named, InplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, UNKNOWN_TENSOR_TYPE_ID, AutogradMLC, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
BackendSelect: fallthrough registered at ../aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
InplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:56 [backend fallback]
AutogradOther: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
AutogradCPU: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
AutogradCUDA: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
AutogradXLA: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
AutogradMLC: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
AutogradNestedTensor: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
AutogradPrivateUse1: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
AutogradPrivateUse2: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
AutogradPrivateUse3: registered at ../torch/csrc/autograd/generated/VariableType_4.cpp:8761 [autograd kernel]
Tracer: registered at ../torch/csrc/autograd/generated/TraceType_4.cpp:9348 [kernel]
Autocast: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:250 [backend fallback]
Batched: registered at ../aten/src/ATen/BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
==================================================== 7 passed, 3 skipped, 8833 deselected, 2 warnings in 6.33s =====================================================
```
</details>
TODO:
* [x] Check rendered docs (https://11743402-65600975-gh.circle-artifacts.com/0/docs/special.html)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54409
Reviewed By: jbschlosser
Differential Revision: D27760472
Pulled By: mruberry
fbshipit-source-id: bdfbcaa798b00c51dc9513c34626246c8fc10548
Summary:
This PR adds the functionality to use cusolver potrs as the backend of cholesky_inverse for batch_size == 1 on CUDA.
Cusolver `potri` is **not** used, because
- it only returns the upper or lower triangular matrix as a result. Although the other half is zero, we may still need extra kernels to get the full Hermitian matrix
- it's no faster than cusolver potrs in most cases
- it doesn't have a batched version or 64-bit version
`cholesky_inverse` dispatch heuristics:
- If magma is not installed, or batch_size is 1, dispatch to `cusolverDnXpotrs` (64 bit) and `cusolverDn<T>potrs` (legacy).
- Otherwise, use magma.
See also https://github.com/pytorch/pytorch/issues/42666#47953
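The dispatch heuristic above can be sketched as a simple predicate (an illustration only; `choose_cholesky_inverse_backend` is a made-up name, and the real code dispatches to `cusolverDnXpotrs`/`cusolverDn<T>potrs` or MAGMA kernels rather than returning strings):

```python
def choose_cholesky_inverse_backend(batch_size, has_magma):
    """Pick a backend for cholesky_inverse on CUDA per the heuristic above:
    use cusolver potrs when MAGMA is unavailable or batch_size is 1,
    otherwise use MAGMA."""
    if not has_magma or batch_size == 1:
        return "cusolver_potrs"  # 64-bit or legacy potrs, depending on build
    return "magma"

assert choose_cholesky_inverse_backend(1, True) == "cusolver_potrs"
assert choose_cholesky_inverse_backend(8, False) == "cusolver_potrs"
assert choose_cholesky_inverse_backend(8, True) == "magma"
```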
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54676
Reviewed By: ngimel
Differential Revision: D27723805
Pulled By: mruberry
fbshipit-source-id: f65122812c9e56a781aabe4d87ed28b309abf93f
Summary:
MAGMA_HOME was previously set for the ubuntu-rocm/Dockerfile. However, this missed centos builds as well as any builds that do not use the CI image environments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54511
Reviewed By: jbschlosser
Differential Revision: D27755983
Pulled By: malfet
fbshipit-source-id: 1ffd2cd100f4221c2bb64e6915fa3372ee1f6247
Summary:
Many model pipelines/workflows don't use MAGMA even though it is included in the build by default. Leaving MAGMA kernels out of the build can save 60+MB of GPU memory when loading `libtorch_cuda.so` (tested on V100, current upstream master).
A current sharp corner of this flag is that toggling it when rebuilding requires `torch/include/THC/THCGeneral.h` to be *manually* deleted by the user, as even running `make clean` or `setup.py` with `--cmake` does not properly regenerate it with the appropriate substitution for `#cmakedefine USE_MAGMA`. Is there a way to force the regeneration of the header during a rebuild?
CC malfet ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55994
Reviewed By: mruberry
Differential Revision: D27766287
Pulled By: malfet
fbshipit-source-id: 93deca57befa0febb9c5b7875ecf0015c547d421
Summary:
Also, add a `-Werror` flag to prevent such regressions from happening in
the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56095
Reviewed By: walterddr
Differential Revision: D27781603
Pulled By: malfet
fbshipit-source-id: 2a404788a965c380ff9feb72d0b2d967b131371f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55886
We've imported llvm's MathExtras header, but now that we want to also
include LLVM (which includes its own MathExtras), we need to guard the c10
version appropriately (or intertwine LLVM more deeply with our build than just
the CPU fuser, which I'm not super excited about doing just yet).
ghstack-source-id: 126375067
Test Plan: build
Reviewed By: ZolotukhinM
Differential Revision: D27731038
fbshipit-source-id: 7c136341d6b433b3876ee983820016df75c14dec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56094
Now FunctionCalls are merged with Loads and vectorization for
intermediate values automatically started to work.
Fixes #53553.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D27781519
Pulled By: ZolotukhinM
fbshipit-source-id: 1ed68ca2399e9bd4598639bd6dd8f369365f0ef0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55687
The diff makes sure that users can transfer the following parameters:
* master_addr
* master_port
* node_rank
* use_env
The diff implements StaticTCPRendezvous, which creates a store with a listener on agent rank #0.
The diff modifies caffe2/rendezvous: if the worker process is launched with the torchelastic agent, the worker processes will create a PrefixStore("worker/") from a TCPStore without a listener.
The diff adds macros functionality to torch/distributed/elastic/utils that helps to resolve the local_rank parameter.
Test Plan: buck test mode/dev-nosan //pytorch/elastic/torchelastic/distributed/test:launch_test
Reviewed By: cbalioglu, wilson100hong
Differential Revision: D27643206
fbshipit-source-id: 540fb26feac322cc3ec0a989fe53324755ccc4ea
Summary:
This PR adds a `--mode` flag and a script to collect microbenchmarks in a single JSON file. I also added a version check since benchmarks are expected to evolve; this also turned up a determinism bug in `init_from_variants` (`set` is not ordered, unlike `dict`).
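The determinism bug mentioned here comes from Python's iteration-order guarantees: `dict` preserves insertion order (since Python 3.7), while `set` iteration order depends on hashing and is not a stable, specified order across runs. A minimal illustration:

```python
items = ["relu", "gelu", "tanh", "relu"]

# dict.fromkeys deduplicates while preserving first-seen order,
# so iteration over it is deterministic.
ordered = dict.fromkeys(items)
assert list(ordered) == ["relu", "gelu", "tanh"]

# set makes no ordering promise; sorting (or using a dict) restores
# determinism when a reproducible order matters.
deterministic = sorted(set(items))
assert deterministic == ["gelu", "relu", "tanh"]
```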
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55428
Test Plan:
Run in CI
CC: ngimel wconstab ezyang bhosmer
Reviewed By: mruberry
Differential Revision: D27775284
Pulled By: robieta
fbshipit-source-id: c8c338fedbfb2860df207fe204212a0121ecb006
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55957
This diff adds an execution graph observer that tracks all operators (dispatcher autograd, jit, user defined, etc.) and their inputs and outputs. The results are written to a temp JSON file which can be used for further analysis. This supports various use cases, such as dependency analysis, performance optimizations, etc.
Some minor refactoring of existing code for clarity and completeness.
Test Plan:
Example output:
{F603167736}
```
=> buck build caffe2/torch/fb/observers:execution_graph_observer_runner --show-output
=> buck-out/gen/caffe2/torch/fb/observers/execution_graph_observer_runner --pytorch_enable_execution_graph_observer=true --pytorch_execution_graph_observer_iter_label="## START ##" --pytorch_execution_graph_observer_iter_target=3
I0414 01:26:55.834039 1038798 ExecutionGraphObserver.cpp:408] Enabled PyTorch execution graph observer
I0414 01:26:55.834717 1038798 ExecutionGraphObserver.cpp:411] Matching iteration start label: "## START ##"
I0414 01:26:55.834940 1038798 ExecutionGraphObserver.cpp:423] Target iteration: 3
I0414 01:26:55.835962 1038798 ExecutionGraphObserverRunner.cpp:50] Running test execution graph observer runner.
I0414 01:26:55.836180 1038798 ExecutionGraphObserverRunner.cpp:51] iterations: 10
I0414 01:26:55.836419 1038798 ExecutionGraphObserverRunner.cpp:52] output file name: /tmp/pytorch_execution_graph_1618388815_1038798_3.json
I0414 01:26:56.246432 1038798 ExecutionGraphObserver.cpp:137] Writing PyTorch execution graph to: /tmp/pytorch_execution_graph_1618388815_1038798_3.json
I0414 01:26:56.278715 1038798 ExecutionGraphObserver.cpp:314] PyTorch execution graph is written to file: /tmp/pytorch_execution_graph_1618388815_1038798_3.json
```
see `/tmp/pytorch_execution_graph_[timestamp]_[process_id]_[iter_target].json`
Reviewed By: albanD
Differential Revision: D27238906
fbshipit-source-id: 3eb717d7d512e2d51d3162e9995b1ccd18e5a725
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55996
**Summary**
This commit modifies `PackageExporter.save_module` so that the `module`
argument can be either a string (`str`) or a module
(`types.ModuleType`).
**Test Plan**
This commit adds a unit test similar to `TestSaveLoad.test_save_module`
that tests that calling `save_module` with a module object works.
**Fixes**
This commit fixes #55939.
Test Plan: Imported from OSS
Reviewed By: jamesr66a, huiguoo
Differential Revision: D27771781
Pulled By: SplitInfinity
fbshipit-source-id: 57c8cf45575bb8dcfca711759fadfff72efb35e7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55811
- Added manage_graph_output_memory flag to opts (default false)
- Added checking for flag dependency between enable_out_variant and optimize_graph_output_memory and optimize_memory
- Minor refactoring for readability
Test Plan: buck test mode/dev //caffe2/caffe2/fb/predictor:pytorch_predictor_test -- --exact 'caffe2/caffe2/fb/predictor:pytorch_predictor_test - PyTorchPredictor.StaticRuntime
Reviewed By: hlu1
Differential Revision: D27573780
fbshipit-source-id: 28698657f686f27b8ad60e1276cdf17402d2cf91
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56059
The lint doesn't do very much, mostly it enforces that indentation
is consistent. The real point of the lint is to just make sure
that we can still do surgery on codemod with tools like ruamel,
by reusing the configuration in this script.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: walterddr
Differential Revision: D27774590
Pulled By: ezyang
fbshipit-source-id: c26bc6c95a478bd9b86387b18de7e906e7d13193
Summary:
Clang-Tidy showed that it's possible to make some methods in the Context class static and const, so I did.
It also shows that some unused headers from standard libraries are included, which I will fix in a next PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55942
Reviewed By: mruberry
Differential Revision: D27766213
Pulled By: bdhirsh
fbshipit-source-id: 4bd9b92c0b8e5c540ac94fbd2bdace64949946e3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56057
Just to make sure we don't add anything there that'd break python 2 users from receiving the correct error message
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: walterddr
Differential Revision: D27774120
Pulled By: seemethere
fbshipit-source-id: e40a1a2672a69eed3b6e834b1acbb7a04c0adec1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54264
**Changes**
- Creates a new listener thread on each client to run the callback
- Creates a new class from which the listener thread and master thread derive; this class handles shutdown and cleanup of the thread on Windows and Linux
- Adds a watchKey method and updates any functions that change the key value.
**Background**
This PR adds functionality to TCPStore to allow users to watch a key and execute a callback on key change.
It introduces a new watchKey() API:
`TCPStore::watchKey(const std::string& key, std::function<void(std::string, std::string)> callback)`, which has parameters `key` and `callback(old_key, new_key)` to run on key change. Since current methods are blocking (for example, in `TCPStore::get()` a worker will send a "get key" request to the master -> wait for a response back -> then exit the function and return the value to the user), we need a non-blocking, asynchronous way to execute the callback whenever a key changes. This is done by creating a new listener thread on each client which the master can communicate with.
Right now, the API is C++ only and only for TCPStore, the internal use case is for elastic RPC. We will have an internal key such as `_NumNodes` and all nodes in the elastic RPC group will watch this key. When a node leaves, this key will be updated and each node will execute a callback to clean up Autograd context and RRef context.
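The watch/callback contract can be illustrated with a small pure-Python store (the real API is C++-only and asynchronous via a listener thread; `WatchableStore` and its method names are made up here, and callbacks fire synchronously just to show the `callback(old_value, new_value)` semantics):

```python
class WatchableStore:
    """Toy key/value store that invokes registered callbacks on key change."""

    def __init__(self):
        self._data = {}
        self._watchers = {}  # key -> list of callbacks

    def watch_key(self, key, callback):
        """Register callback(old_value, new_value) to run when key changes."""
        self._watchers.setdefault(key, []).append(callback)

    def set(self, key, value):
        old = self._data.get(key)
        self._data[key] = value
        for cb in self._watchers.get(key, []):
            cb(old, value)

# E.g. every node watching an internal membership key, as described above.
events = []
store = WatchableStore()
store.watch_key("_NumNodes", lambda old, new: events.append((old, new)))
store.set("_NumNodes", 4)
store.set("_NumNodes", 3)   # a node left; watchers can now run cleanup
assert events == [(None, 4), (4, 3)]
```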
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D27709912
Pulled By: H-Huang
fbshipit-source-id: 619aa3b2a8eb23f4be5f5736efdcca6c175aadf3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56039
Python will try to eagerly resolve the name references even if
the import failed. Quote them so that it doesn't.
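This is the standard failure mode with `TYPE_CHECKING`-guarded imports: an unquoted annotation is evaluated when the function is defined, so it raises `NameError` when the import was skipped at runtime; quoting defers evaluation to type checkers. A generic illustration (`some_optional_dep` and `FancyType` are made-up names, not the actual pytorch module):

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only imported by static type checkers, never at runtime.
    from some_optional_dep import FancyType

def process(x: "FancyType") -> None:
    # The quoted annotation is stored as a string and never resolved
    # at runtime, so defining and calling this function works even
    # though some_optional_dep was never imported.
    pass

process(object())
```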
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: janeyx99
Differential Revision: D27770536
Pulled By: ezyang
fbshipit-source-id: b111739289498f9bab856fb9424f3080efee4ee0
Summary:
The Python traceback on a cmake invocation is meaningless to most developers, so this PR wraps it in a `try..catch` so we can ignore it and save scrolling through the 20-or-so lines.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55986
Pulled By: driazati
Reviewed By: wanchaol
Differential Revision: D27769304
fbshipit-source-id: 5889eea03db098d10576290abeeb4600029fb3f2
Summary:
Related to https://github.com/pytorch/pytorch/issues/52256
Use autosummary instead of autofunction to create subpages for optim and cuda functions/classes.
Also fix some minor formatting issues in optim.LBFGS and cuda.stream docstings
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55673
Reviewed By: jbschlosser
Differential Revision: D27747741
Pulled By: zou3519
fbshipit-source-id: 070681f840cdf4433a44af75be3483f16e5acf7d
Summary:
Related to https://github.com/pytorch/pytorch/issues/52256
Use autosummary instead of autofunction to create subpages for autograd functions. I left the autoclass parts intact but manually laid out their members.
Also fixed the LaTeX formatting of the special page, which emitted a warning (solved by adding `\begin{align}...\end{align}`), and the alignment of equations (by using `&=` instead of `=`).
zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55672
Reviewed By: jbschlosser
Differential Revision: D27736855
Pulled By: zou3519
fbshipit-source-id: addb56f4f81c82d8537884e0ff243c1e34969a6e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55990
Reland of https://github.com/pytorch/pytorch/pull/55197, which fails windows test that was only run on master.
Disabled these tests for Windows, similar to how they are disabled on macOS. The reason for disabling is that they use the libuv transport, which does not have as robust error handling as TCP on Linux. The result is that non-zero ranks that were healthy don't throw immediately (like they do on Linux) but throw on timeout. The error handling still occurs as expected on rank 0 for all platforms.
ghstack-source-id: 126478371
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D27758424
fbshipit-source-id: d30841c8dda77f51b09a58161e638657ef758e63
Summary:
Up until this PR, the top-level `total_seconds` stat we've been uploading to S3 has only included suites longer than one second. This PR corrects that issue, and also clarifies the script's textual output for "longest tests of entire run".
(Note that the `total_time` local variable is passed as the `total_seconds` parameter in the call to `assemble_s3_object`.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56040
Test Plan:
Create a simple test file (call it `test_quick_maths.py`) with these contents:
```py
from torch.testing._internal.common_utils import TestCase, run_tests
class TestQuickMaths(TestCase):
def test_two_plus_two(self):
self.assertEqual(2 + 2, 4)
if __name__ == '__main__':
run_tests()
```
Run it and save the test results:
```sh
rm -r /tmp/reports ; python3 test_quick_maths.py --save-xml=/tmp/reports
```
Then display them using the script:
```sh
tools/print_test_stats.py /tmp/reports
```
- Before this PR:
```
No scribe access token provided, skip sending report!
Total runtime is 0:00:00
0 longest tests of entire run:
```
- With this PR:
```
No scribe access token provided, skip sending report!
Total runtime is 0:00:00.108000
0 longest tests of entire run (ignoring suites totaling less than 1.0 seconds):
```
If you were to upload this to S3 (see https://github.com/pytorch/pytorch/issues/49190 for an example of how to do this manually), the top-level `total_seconds` field should also change from `0` to `0.108`.
Reviewed By: janeyx99
Differential Revision: D27770666
Pulled By: samestep
fbshipit-source-id: 8255a4726ab3a692bbeff4c48974fbb3c6375142
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55970
LLVM's support for float16 is not great, and we were seeing assertion
failures trying to generate code for vectorized uses. I note that clang
doesn't even try to vectorize operations involving half:
https://gcc.godbolt.org/z/86MW4xr17, so that's a good sign we shouldn't either.
Fixes#55905
ghstack-source-id: 126511474
Test Plan: pytest test_jit_fuser_te.py -k test_isnan
Reviewed By: asuhan
Differential Revision: D27752279
Pulled By: bertmaher
fbshipit-source-id: ac115080bf2a4a73d52b396d64a5bce0cf13abfe
Summary:
Fixes https://github.com/pytorch/pytorch/issues/25100#43112
EDIT: pardon my inexperience, as this is my first PR here; I did not realize that the doc should not have any trailing whitespace, nor about `[E712] comparison to False should be 'if cond is False:' or 'if not cond:'`. Both are now fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55285
Reviewed By: mruberry
Differential Revision: D27765694
Pulled By: jbschlosser
fbshipit-source-id: c34774fa065d67c0ac130de20a54e66e608bdbf4
Summary:
This way, the user gets more useful actionable results from the GHA.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55961
Test Plan: CI
Reviewed By: samestep
Differential Revision: D27749013
Pulled By: janeyx99
fbshipit-source-id: bb0edbcdab29ba8ef99005f6fcf52de6782b468d
Summary:
This PR adds a `padding_idx` parameter to `nn.EmbeddingBag` and `nn.functional.embedding_bag`. As with `nn.Embedding`'s `padding_idx` argument, if an embedding's index is equal to `padding_idx` it is ignored, so it is not included in the reduction.
This PR does not add support for `padding_idx` for quantized or ONNX `EmbeddingBag` for opset10/11 (opset9 is supported). In these cases, an error is thrown if `padding_idx` is provided.
Fixes https://github.com/pytorch/pytorch/issues/3194
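The reduction semantics can be sketched without torch: for a "mean" bag, entries equal to `padding_idx` drop out of both the sum and the count. A simplified pure-Python illustration over a toy embedding table (the zeros-for-an-all-padding-bag behavior is an assumption of this sketch, not quoted from the PR):

```python
def embedding_bag_mean(table, indices, padding_idx=None):
    """Mean-reduce embedding rows for one bag, skipping padding_idx entries."""
    kept = [table[i] for i in indices if i != padding_idx]
    if not kept:
        # Assumed convention: an all-padding bag reduces to a zero vector.
        return [0.0] * len(table[0])
    dim = len(kept[0])
    return [sum(row[d] for row in kept) / len(kept) for d in range(dim)]

table = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]  # row 2 plays the padding role

# With padding_idx=2, only rows 0 and 1 participate in the mean.
assert embedding_bag_mean(table, [0, 1, 2], padding_idx=2) == [2.0, 3.0]

# Without padding_idx, the padding row is (incorrectly) counted.
assert embedding_bag_mean(table, [0, 1, 2]) != [2.0, 3.0]
```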
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49237
Reviewed By: walterddr, VitalyFedyunin
Differential Revision: D26948258
Pulled By: jbschlosser
fbshipit-source-id: 3ca672f7e768941f3261ab405fc7597c97ce3dfc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55621
Fuser support for thread-level parallelism is a work in progress, so
only fuse when the program is running single-threaded.
ghstack-source-id: 126069259
Test Plan: observe fusion groups formed when torch.get_num_threads==1 vs not
Reviewed By: ZolotukhinM
Differential Revision: D27652485
fbshipit-source-id: 182580cf758d99dd499cc4591eb9d080884aa7ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55835
Now that https://github.com/pytorch/pytorch/pull/55238 has been landed for a
week with no complaints, it seems safe to say FEATURE_TORCH_MOBILE is
always true and we can do some cleanup.
Test Plan: Imported from OSS
Reviewed By: ezyang, walterddr
Differential Revision: D27721284
Pulled By: ailzhang
fbshipit-source-id: 4896bc5f736373d0922cfbe8eed0d16df62f0fa1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55431
Fixes a bug in the test cases: returning early resulted
in some tests not being run. Adds logic for `nni.LinearReLU`,
which was unmasked by making the tests run.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_extract_weights_mod
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D27622415
fbshipit-source-id: 79d9e3125e5d881d9d13645abbe4bd007a5e1d44
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55311
Before this PR, `F.conv1d` was matched by FX graph mode quant patterns
but the prepacking was happening inline. There was also a bug with
argument type mismatch.
This PR fixes both issues and adds a test. Thanks jerryzh168 for the
code tip.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_functional_not_reference
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27575422
fbshipit-source-id: 42301e23cb101a9e64e46800813bc771317e233e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55287
Adds support for extracting weights from F.conv2d and F.conv3d.
F.conv1d and the fused variants are saved for future PRs.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_extract_weights_conv_fun
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D27575424
fbshipit-source-id: e945912d7d0ab320f47cab30d00d60ddb7497158
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55154
Adds functionality to NS to allow matching nodes which have the
same signature across dtypes. For now, only the skeleton is added,
we can fill out the rest of the ops later. This is to unblock
the work to change `cat` to have the same signature for fp32 and int8,
and keep the testing we have for `cat` in NS.
For context, the main reason we are not matching nodes with equal types,
for now, is user defined types for which we do not know the signature.
For now, the design is strictly allowlist of everything. In the future,
we may adjust the design to safely match user defined types.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_ops_with_same_fp32_and_int8_signature
python test/test_quantization.py TestFXGraphMatcher.test_nodes_with_equal_types_do_not_get_matched
```
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D27504624
fbshipit-source-id: 4f8eb4f3258caf6f99aa373ca7ba516ebbcf4779
Summary:
This PR includes:
- A formatting change to make katex installation instructions more visible for Facebook employees.
- A short tip about how to start a lightweight HTTP server on a remote machine to browse the doc build artifacts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56018
Reviewed By: H-Huang
Differential Revision: D27765157
Pulled By: cbalioglu
fbshipit-source-id: 67663de0ba7b742e0deb5358d1e45eea9edd840f
Summary:
Fixes https://github.com/pytorch/pytorch/issues/35901
This change is designed to prevent fragmentation in the Caching Allocator. Permissive block splitting in the allocator allows very large blocks to be split into many pieces. Once split too finely, it is unlikely all pieces will be 'free' at the same time, so the original allocation can never be returned. Anecdotally, we've seen a model run out of memory failing to alloc a 50 MB block on a 32 GB card while the caching allocator is holding 13 GB of 'split free blocks'.
Approach:
- Large blocks above a certain size are designated "oversize". This limit is currently set one decade above "large", at 200 MB
- Oversize blocks can not be split
- Oversize blocks must closely match the requested size (e.g. a 200 MB request will match an existing 205 MB block, but not a 300 MB block)
- In lieu of splitting oversize blocks there is a mechanism to quickly free a single oversize block (to the system allocator) to allow an appropriate size block to be allocated. This will be activated under memory pressure and will prevent _release_cached_blocks()_ from triggering
Initial performance tests show this is similar or quicker than the original strategy. Additional tests are ongoing.
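The block-matching rules above can be sketched as a simple predicate (sizes in bytes; the 200 MB threshold comes from the description, while `block_can_satisfy` and the "closely match" tolerance are illustrative stand-ins for the real allocator's tuning):

```python
OVERSIZE_THRESHOLD = 200 * 1024 * 1024  # blocks above this are "oversize"
OVERSIZE_TOLERANCE = 20 * 1024 * 1024   # assumed slack for "closely match"

def block_can_satisfy(request, block):
    """Decide whether a cached free block may serve a request."""
    if block < request:
        return False
    if block <= OVERSIZE_THRESHOLD:
        return True  # regular blocks may be split as before
    # Oversize blocks are never split, so they must closely match the
    # requested size to be reused.
    return block - request <= OVERSIZE_TOLERANCE

mb = 1024 * 1024
assert block_can_satisfy(200 * mb, 205 * mb)      # close enough: reuse
assert not block_can_satisfy(200 * mb, 300 * mb)  # too big: don't split it
```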
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44742
Reviewed By: ngimel
Differential Revision: D23752058
Pulled By: ezyang
fbshipit-source-id: ccb7c13e3cf8ef2707706726ac9aaac3a5e3d5c8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55167
**Summary**
This commit adds a function that uses `sys.setprofile` to trace the
execution of a callable in order to determine which modules it really
uses. The result of this trace can inform packaging decisions.
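A minimal version of that tracing idea (not the actual torch.package implementation; `trace_modules` is a made-up helper) can be written with `sys.setprofile`, recording the module of every Python function call made while the callable runs:

```python
import sys

def trace_modules(fn, *args, **kwargs):
    """Run fn and return the set of module names whose code executed."""
    used = set()

    def profiler(frame, event, arg):
        if event == "call":
            mod = frame.f_globals.get("__name__")
            if mod is not None:
                used.add(mod)

    sys.setprofile(profiler)
    try:
        fn(*args, **kwargs)
    finally:
        sys.setprofile(None)  # always restore, even if fn raises
    return used

import json
mods = trace_modules(json.dumps, {"a": 1})
assert any(m.startswith("json") for m in mods)  # json really was used
```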
**Test Plan**
This commit adds a unit test to `test_analyze.py` that tests this
feature.
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D27730805
Pulled By: SplitInfinity
fbshipit-source-id: 11802625564513da9a0144904be0d34dbae0f601
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55727
the number of dequantize ops for the fp16 reference pattern was incorrect before; this
PR fixes the problem
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D27713390
fbshipit-source-id: 72b8d4cda0bdcea74abe27a76f918d1b47819b01
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55979
Fix name used for this test
ghstack-source-id: 126465107
Test Plan: CI
Reviewed By: pbelevich, H-Huang
Differential Revision: D27755320
fbshipit-source-id: fead989041d703d473b6847ee0cee1deebe12571
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55265
Logs API usage of monitored barrier for better tracking and use case
understanding.
ghstack-source-id: 126413087
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D27548433
fbshipit-source-id: 7520ad0948b8dc9d44fa3118d5ea953d52f9f1c5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55197
From initial user feedback, one unexpected difference between monitored_barrier impl and barrier is the "all or nothing" semantics.
In barrier, all ranks pass or they all fail. With monitored barrier however, if rank 1 is healthy, it will respond to both send and recv from rank 0, but rank 0 can later fail because rank 2 is stuck. In this case, rank 1 will move forward out of the barrier.
This change makes it so that if a rank fails in monitored barrier, all other ranks in monitored barrier will also fail. It does so by the following process, similar to acknowledgements:
- Nonzero ranks call send()
- Nonzero ranks call recv()
- Rank 0 calls recv(); if this succeeds, rank 0 has acknowledged rank N as healthy
- Once all ranks are acknowledged as healthy, rank 0 calls send() to all nonzero ranks to unblock them
Modified unittests to ensure the all or nothing failure behavior
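The acknowledgement flow above can be sketched with queues standing in for send/recv. This is a toy single-process simulation, not the actual ProcessGroup code.

```python
import queue
import threading

def run_monitored_barrier(world_size):
    """Toy simulation of the all-or-nothing acknowledgement flow."""
    to_rank0 = queue.Queue()                                 # nonzero ranks -> rank 0
    from_rank0 = [queue.Queue() for _ in range(world_size)]  # rank 0 -> rank r
    results = {}

    def nonzero_rank(rank):
        to_rank0.put(rank)                       # send(): announce liveness to rank 0
        results[rank] = from_rank0[rank].get()   # recv(): block until rank 0 releases us

    def rank0():
        # recv() from every nonzero rank; only after *all* ranks are
        # acknowledged as healthy does rank 0 send() to unblock them.
        for _ in range(world_size - 1):
            to_rank0.get()
        for r in range(1, world_size):
            from_rank0[r].put("go")
        results[0] = "go"

    threads = [threading.Thread(target=nonzero_rank, args=(r,))
               for r in range(1, world_size)]
    threads.append(threading.Thread(target=rank0))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because no rank is released until rank 0 has heard from all of them, a stuck rank leaves everyone blocked, mirroring the all-or-nothing semantics.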
ghstack-source-id: 126413088
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D27523060
fbshipit-source-id: fa05e4f8ad8ae97fd6cb20da5c3a7ef76fd31de6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55684
Upcoming changes to `MaybeOwned<T>` will require that T is
one of these two types and will have custom code for both.
This diff updates the tests to continue to build under these new
requirements; it is being sent separately to demonstrate that the
tests continue to work on the current implementation.
ghstack-source-id: 126405918
Test Plan: CI will run the rewritten tests.
Reviewed By: bhosmer
Differential Revision: D27630289
fbshipit-source-id: e38097d9ca04f3337cfa543ebcc8fb5d6916fcf3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55685
This diff introduces a traits class that tells `MaybeOwned` how
to borrow a specific type. While it is still capable of handling a
generic `T` by storing `const T*` (how to do so is shown in a
comment), that path is not committed in live code because it is not needed.
Instead, we have specific traits implementations for
`c10::intrusive_ptr<T>` and `Tensor` that implement the borrowed state
as just a plain old `c10::intrusive_ptr<T>` or `Tensor` (respectively)
that we manipulate to avoid reference counting operations. We do this
entirely with public API to `c10::intrusive_ptr<T>` and could do
likewise with `Tensor`, but (as comments in the code explain) adding a
private constructor to `MaybeOwnedTraits<Tensor>` allowed additional
performance optimization.
This representation of `MaybeOwned` seems to be more efficient than
the generic `T-or-pointer-to-const-T` representation. Intuitively, we
avoid a double indirection at minimal cost vs the previous
implementation. It *also* seems to be more efficient than the pointer
tagging representation I sent out as #55555; apparently, having the
extra word for a flag is much cheaper than the masking operations for
pointer tagging and the same double indirection as the generic
representation.
In particular, this seems to have the same *effect* as the
`TensorHandle` idea we've discussed internally (a hypothetical class
like `Tensor` that wraps a raw `TensorImpl*` and shares the generated
methods of `Tensor` so that everything still works), but you have to
be explicit about borrowing and use pointer syntax to get the
effect. Unlike `TensorHandle`, you can use it as internal state in a
class and "upgrade" from a borrow to an owned `Tensor` derived from
your original borrow if necessary.
Note that this is just a representational change and it still has the
same semantics: you need to keep the T you borrowed from around!
ghstack-source-id: 126405920
Test Plan:
Previous diff changes the MaybeOwned tests to cover
both `intrusive_ptr` and `Tensor`, which we need in order to ensure
that our trait implementations are correct.
Further diffs in this stack will use this type to hold operand tensors
in `TensorIteratorBase` to allow borrowing at relatively small cost
(very roughly, a 6% win in the successful borrowing case for our
add-in-place benchmark at the cost of a 2.5% regression in the
legacy non-borrowing case, and we know that we will be able to borrow
in structured kernels and probably most unstructured operands as
well).
Reviewed By: ezyang
Differential Revision: D27679723
fbshipit-source-id: 57104f4edabc545ff83657233fde9eb40b969826
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55444
Changes ~ProcessGroupNCCL so that we join the work cleanup thread before aborting nccl communicators. This is because if we abort nccl communicators first on destruction, outstanding work objects in workMetaList can have exceptions set on them. Right now this doesn't trigger errors in nccl async error handling due to the terminated check, but it seems a bit cleaner to just join this thread first.
The main motivation is also to reduce log spam since we added some logging when an exception is set on WorkNCCL, but this unexpectedly resulted in a lot of false-positive errors being logged even after pg shutdown. An example is below:
I0406 18:30:27.361981 1567104 ProcessGroupNCCL.cpp:527] [Rank 1] NCCL watchdog thread terminated normally
I0406 18:30:27.364675 1567105 ProcessGroupNCCL.cpp:265] [Rank 1] found async exception when checking for NCCL errors: NCCL error: unhandled system error, NCCL version 2.7.3
With this change, we no longer see these false positive logs.
ghstack-source-id: 126145284
Test Plan: CI
Reviewed By: osalpekar
Differential Revision: D27613035
fbshipit-source-id: abf924630128b50e7f66ae41ac83403e7a0aac96
Summary:
Fixes https://github.com/pytorch/pytorch/issues/25100 and https://github.com/pytorch/pytorch/issues/43112
EDIT: Pardon my inexperience, since this is my first PR here: I did not realize that the doc should not have any trailing white space, or that `[E712] comparison to False should be 'if cond is False:' or 'if not cond:'`; both are now fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55285
Reviewed By: ngimel
Differential Revision: D27710107
Pulled By: jbschlosser
fbshipit-source-id: c4363a4604548c0d84628c4997dd23d6b3afb4d9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55814
I don't really know if the original issue is resolved, but let's check
whether this passes CI so that we can potentially get some speedup on
our builds.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: walterddr
Differential Revision: D27715734
Pulled By: seemethere
fbshipit-source-id: a8f90774dfd25b0abf8e57283fe3591a8d8f3c4b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55955
We were experiencing build failures related to disk size issues; let's bump
to 150 to see if that resolves them.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D27747958
Pulled By: seemethere
fbshipit-source-id: 9222475d2298cf942479650567616489387bf552
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54977
Move part of the code in prepare_for_backward into helper functions, so that those functions can be reused for static graph training and delayed allreduce later on.
ghstack-source-id: 126366714
Test Plan: unit tests
Reviewed By: rohan-varma
Differential Revision: D27439195
fbshipit-source-id: 8899eda621260232d774cb145f9c6d683c47e188
Summary:
#### Reason for relanding
Line 1607 of `torch/testing/_internal/common_methods_invocations.py` of https://github.com/pytorch/pytorch/issues/50999 had `dtype` instead of `dtype=torch.bool`, so 4 of the 9 sample inputs for `bool` had incorrect dtype. This bug was caught by https://github.com/pytorch/pytorch/issues/54949.
1. Added support for pow() on CPU for `float16` (`Half`) and `bfloat16` types.
Both `pow(Tensor, Scalar)` and `pow(Tensor, Tensor)` are now supported for the aforementioned types.
However autograd isn't supported for `Float16` on CPU yet, as `log_vml_cpu` can't be enabled for it.
2. heitorschueroff added `pow_tensor_scalar_optimized_kernel` to refactor & simplify `PowKernel.cpp`.
It provides a common path for all the complex types & floating point types (except Float16, due to lack of complete AVX2 vectorization support for it). It replaced code that had previously been duplicated for (float, double) and complex types,
so PowKernel.cpp looks a lot cleaner now.
3. Enabled (unskipped) some tests for `erf`, `erfc`,`erfinv`, `tan` and `linalg.vector.norm` which were being skipped earlier due to `pow()` not having been implemented for `float16` & `bfloat16`.
4. Added an OpInfo for `pow()` & enabled some test cases for `pow()`.
5. Extended the coverage of existing tests for `pow` in `test_binary_ufuncs.py` in order to enable comparison with `numpy`, even with discontiguous tensors, and added a test to ensure that a runtime error is raised for `pow`'s inplace variant if resizing the base tensor is required during its invocation.
6. Added `float16` & `bfloat16` to `square`'s dtype lists in its `UnaryUfuncInfo`.
7. Removed redundant `dtypesIfCPU` and `dtypesIfCUDA` from `OpInfo`s where they are equal to `dtypes`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55280
Reviewed By: jbschlosser
Differential Revision: D27591772
Pulled By: heitorschueroff
fbshipit-source-id: c7420811b32595bb3353149a61e54a73f2eb352b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55553
If this case isn't likely, user code would have been better off with a regular T.
ghstack-source-id: 126369326
Test Plan: Existing CI
Reviewed By: ezyang
Differential Revision: D27630287
fbshipit-source-id: b074af3a65c61dfe9e0246df046cc8c49e8efb03
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55419
Turns out it's useful to have these. I chose to implement them in the straightforward safe way, rather than always borrowing.
ghstack-source-id: 126369328
Test Plan: Added more automated tests.
Reviewed By: hlu1
Differential Revision: D27545805
fbshipit-source-id: 84bb4458b86672ad340cc1f0aa18b80ca7ee13f1
Summary:
`unshapedType` can be very slow on a graph with many modules and recursively contained classes, because you have to recursively descend and map over each Type. Speed it up with a type cache.
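The idea can be sketched on a toy type representation (hypothetical; the real pass operates on `c10::Type` objects, not tuples):

```python
# Toy types: ("Tensor", shape) leaves, or ("Class", (member types...)).
def unshaped(ty, cache):
    if ty in cache:                      # reuse previously erased types
        return cache[ty]
    tag, payload = ty
    if tag == "Tensor":
        result = ("Tensor", None)        # drop shape information
    else:
        result = (tag, tuple(unshaped(m, cache) for m in payload))
    cache[ty] = result
    return result
```

With the cache, a class type that appears in many modules is descended only once instead of once per occurrence.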
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55828
Reviewed By: ngimel
Differential Revision: D27717995
Pulled By: eellison
fbshipit-source-id: f1d502bef0356e78100c27bf00f6caf08a75d68c
Summary:
To fix warning:
```
xplat\\caffe2\\torch\\csrc\\jit\\runtime\\instruction.cpp(59,20): warning: ISO C++11 does not allow conversion from string literal to 'char *const' [-Wwritable-strings]
```
which can be seen in Windows CI logs.
Test Plan: Eyes; did not run it.
Reviewed By: iseeyuan
Differential Revision: D27717057
fbshipit-source-id: f365405663b5adfbc0c87dc26a9921b6d03f1f5a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55825
The mask has never been used (in vectorization we generate an explicit
`IfThenElse` construct when we need to mask out some elements). The PR
removes it and cleans up all its traces from tests.
Differential Revision: D27717776
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: 41d1feeea4322da75b3999d661801c2a7f82b9db
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55824
Seemingly some of my last changes (namely, removing dep-tracker) broke
the TE benchmarks. This PR fixes it.
Differential Revision: D27717778
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: 48584bc0cfd4879a3e44cb45ee1f0d5c91b5afbc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55324
With this change `rfactor` only affects the passed loop and its body,
never touching anything outside (that was the root cause of a bug in the
previous implementation). Also, we no longer have an `insertion_point`
parameter - its meaning was vague, and its effect should be achievable
with other transformations anyway.
The new `rfactor` semantics is as follows:
```
Requirements:
* S is the reduction store
* S is the only statement in the innermost loop
* There are at least two reduction arguments in S
* OUTER_REDUCTION_FOR loop corresponds to the outermost reduction variable
used in the store and all other reduction variables are index variables of
children loops of OUTER_REDUCTION_FOR
* OUTER_REDUCTION_FOR is a perfect loop nest, i.e. it has only loops
corresponding to the other reduction variables and the store, nested into
each other
What it does:
* Introduce a new buffer with an extra dimension of a size equal to the
span of the loop OUTER_REDUCTION_FOR (the new buffer is returned via
RFAC_BUF_PTR)
* Insert an initialization store for the new buffer in
OUTER_REDUCTION_FOR before its nested loop
* Replace the reduction store to the original buffer with the reduction
store to the temp buffer, removing the index var of OUTER_REDUCTION_FOR
from reduction arguments
* Insert a final reduction store over the extra dimension of the new
buffer to the original buffer
* Returns TRUE if the transformation succeeded and FALSE otherwise
Example:
Original IR:
S1: for i # normal axis
S2: X[i] = 0
S3: for j # reduction axis
S4: for k # reduction axis
S5: X[i] = ReduceOp(X[i] + Y[i,j,k], reduce_axis={j,k})
After RFACTOR(S5, S3)
S1: for i # normal axis
S2: X[i] = 0
S3: for j # reduction axis for X, normal axis for X_rfac
X_rfac[i,j] = 0
S4: for k # reduction axis
X_rfac[i,j] = ReduceOp(X_rfac[i,j] + Y[i,j,k], reduce_axis={k})
X[i] = ReduceOp(X[i] + X_rfac[i,j], reduce_axis={j})
```
Differential Revision: D27694960
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: 076fa6a1df2c23f5948302aa6b43e82cb222901c
Summary:
Related to https://github.com/pytorch/pytorch/issues/52256
Use autosummary instead of autofunction to create subpages for `torch.fft` and `torch.linalg` functions.
zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55748
Reviewed By: jbschlosser
Differential Revision: D27739282
Pulled By: heitorschueroff
fbshipit-source-id: 37aa06cb8959721894ffadc15ae8c3b83481a319
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54991
The actual proposed fix is in
https://github.com/pytorch/pytorch/pull/53934; in the meantime, it would be useful
to include this LOG when barrier does not know which devices to use, and to suggest
the workaround of passing device_ids into barrier().
ghstack-source-id: 126351889
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D27444917
fbshipit-source-id: 0f269c5a7732e5be6e51adfca7ef70d04ffd71d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55725
We were previously checking m_last_error on the miniz struct directly,
which fails to preserve internal invariants and can leave the reader
broken in specific situations (reading a non-existent file).
Using the provided error-checking API fixes this.
Differential Revision: D27693105
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Pulled By: suo
fbshipit-source-id: 20c520bb1d590fb75751bca1e970df4f2b7eb043
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54465
It is reported that there is a data race issue when the test runs with tsan. The root cause is the 'model.frc1.double()' call. This is not because DistributedDataParallel() works together with 'model.frc1.double()': if we remove DistributedDataParallel() and just call 'model.frc1.double(); model.frc2.double();', tsan complains about the same data race.
I'm not sure how to do the data type cast in this test without tsan complaining, so I am removing this line of code and the mixed data type logging check.
Please kindly let me know if you have a better suggestion on how to do the data type cast correctly
Test Plan: unit test
Reviewed By: SciPioneer
Differential Revision: D27249821
fbshipit-source-id: 0368157e11cbe7d15828dccca78271d89d502ec4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55414
Closes #55384
backward_compute_comm_overlap_time may not be larger than 1; we should instead check that backward_compute_time and backward_comm_time are larger than 1.
ghstack-source-id: 126360517
Test Plan: unit tests
Reviewed By: H-Huang, SciPioneer
Differential Revision: D27606132
fbshipit-source-id: 418fe9f958287779d637856e355cc36cab384c68
Summary:
The PR enables additional dtypes in common_method_invocations for ROCM.
This enables around 4k new tests for ROCM.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55808
Reviewed By: jbschlosser
Differential Revision: D27729885
Pulled By: ngimel
fbshipit-source-id: 061b88901bbe7128d51e49803f64295037b09b8d
Summary: `networkx 2.4+` renamed the `node` attribute to `nodes` on graph objects. This caused failures in `caffe2`'s `topological_sort_traversal_longest_path` function, which uses the networkx library for topological sorting.
Differential Revision: D27718857
fbshipit-source-id: 812fbb613946565d089cc84a20f3cdf7df046e19
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54837
`hsum_sq` has an overflow issue when the input image size is large, such as (H, W, D) = (224, 224, 160). `hsum_sq` is used in the quantized instance_norm/layer_norm/group_norm.
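The overflow is easy to see with back-of-the-envelope arithmetic for a worst-case uint8 input of that size:

```python
# Worst-case arithmetic for a uint8 input of the size mentioned above.
n = 224 * 224 * 160                # elements in the (H, W, D) = (224, 224, 160) case
worst_case_sum_sq = n * 255 * 255  # largest possible sum of squares
INT32_MAX = 2**31 - 1
overflows = worst_case_sum_sq > INT32_MAX  # does not fit in a 32-bit accumulator
wrapped = worst_case_sum_sq % 2**32        # what a wrapping accumulator would report
```

The true sum of squares is over two hundred times larger than what a 32-bit accumulator can hold, so the wrapped value is garbage.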
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54872
Reviewed By: dskhudia
Differential Revision: D27690767
Pulled By: vkuzo
fbshipit-source-id: 9b9ac3e76220d42a3b48f8bf4e20823f775789a2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54479
This change is similar to #54049 in that it helps us factor out some code that can be used in both fast and slow versions of gradcheck.
- `compute_gradient` and `compute_numerical_jacobian_cols` have fewer responsibilities:
- compute_numerical_jacobian_cols essentially only handles the complexity of complex derivatives
- compute_gradient handles only finite differencing (and doesn't worry about different layouts and indexing into the input tensor)
- we have two stages again where we first compute the columns separately, then combine them
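At its core, the finite differencing that `compute_gradient` isolates is a central difference. This is a simplified scalar sketch; the real code also handles tensor layouts, indexing, and complex types.

```python
def central_difference(f, x, eps=1e-6):
    # df/dx is approximated as (f(x + eps) - f(x - eps)) / (2 * eps)
    return (f(x + eps) - f(x - eps)) / (2 * eps)
```

For example, applying it to `f(x) = x**2` at `x = 3.0` yields a value very close to the analytic derivative 6.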
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D27728727
Pulled By: soulitzer
fbshipit-source-id: fad3d5c1a91882621039beae3d0ecf633c19c28c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54378
### For release notes
`torch.autograd.gradcheck.get_numerical_jacobian` (not part of the public api) is being deprecated.
In the future, user code relying on this function will break because, among other changes, `get_numerical_jacobian` now returns `List[Tuple[torch.Tensor]]` instead of `List[torch.Tensor]`.
(more details if necessary)
For a `fn` that takes in M inputs and N outputs we now return a list of M N-tuples of jacobians, where `output[i][j]` represents the numerical jacobian w.r.t. the ith input and the jth output. Previously `get_numerical_jacobian` returned a list of tensors where each tensor represents the jacobian w.r.t. each of the M inputs and a specific output. Finally, the function passed in as the parameter `fn` should expect to handle individual parameters, whereas previously `fn` was required to expect its parameters wrapped in a tuple.
--- end --
This PR addresses the comment here https://github.com/pytorch/pytorch/pull/53857#discussion_r595429639, to reduce the run-time of old gradcheck's get numerical jacobian by a factor of num_outputs. However, because very few ops actually return multiple outputs, there is not too much real speed up here.
The main benefit of doing this change as part of the refactor is that it helps us isolate the possible bugs that are specific to switching `get numerical jacobian` to run in a per output way vs all outputs at once. Much of the logic implemented here will be the same for the fast gradcheck case, so knowing for certain that everything should pass after this stage will make the next step much simpler.
The get_numerical_jacobian API is also used in common_nn, so we update the call site there as well.
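The new return layout can be illustrated with placeholder entries (the strings below are illustrative stand-ins for jacobian tensors):

```python
M, N = 3, 2  # a fn with M inputs and N outputs

# New layout: a list of M N-tuples, indexed as output[i][j] for the
# numerical jacobian w.r.t. input i and output j.
new_style = [tuple("dJ[in%d,out%d]" % (i, j) for j in range(N))
             for i in range(M)]
```

The old layout, by contrast, was a flat list of per-input tensors tied to a single output.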
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D27728720
Pulled By: soulitzer
fbshipit-source-id: ee0f90b4f26ddc5fdbe949c4965eaa91c9ed0bb8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55817
**Summary**
This commit makes minor edits to the docstrings of `PackageExporter` so
that they render properly in the `torch.package` API reference.
**Test Plan**
Continuous integration (especially the docs tests).
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D27726817
Pulled By: SplitInfinity
fbshipit-source-id: b81276d7278f586fceded83d23cb4d0532f7c629
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55812
**Summary**
This commit creates a barebones API reference doc for `torch.package`.
The content is sourced from the docstrings in the source for
`torch.package`.
**Test Plan**
Continuous integration (specifically the docs tests).
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D27726816
Pulled By: SplitInfinity
fbshipit-source-id: 5e9194536f80507e337b81c5ec3b5635d7121818
Summary:
Generally wildcard imports are bad for the reasons described here: https://www.flake8rules.com/rules/F403.html
This PR replaces wildcard imports with an explicit list of imported items where possible, and adds a `# noqa: F403` comment in the other cases (mostly re-exports in `__init__.py` files).
This is a prerequisite for https://github.com/pytorch/pytorch/issues/55816, because currently [`tools/codegen/dest/register_dispatch_key.py` simply fails if you sort its imports](https://github.com/pytorch/pytorch/actions/runs/742505908).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55838
Test Plan: CI. You can also run `flake8` locally.
Reviewed By: jbschlosser
Differential Revision: D27724232
Pulled By: samestep
fbshipit-source-id: 269fb09cb4168f8a51fd65bfaacc6cda7fb87c34
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).
New submodule commit: e5e974b6cd
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55881
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: lw
Differential Revision: D27730207
fbshipit-source-id: 7d2901e676645f3da6e5ca8f9d8c1b55d63cc1c7
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54136
tl;dr: depthwise conv requires that the number of output channels be 1.
The code here only handles this case, and previously all but the first output channel contained uninitialized memory. The NaNs from the issue were random because the torch.empty() allocation sometimes returned non-NaN memory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55794
Reviewed By: ngimel
Differential Revision: D27711717
Pulled By: albanD
fbshipit-source-id: 00eac3fd59db1d09fe7bab89427b105a019e7a5d
Summary: ATT, to ensure the output has the same dim type as the input. We need to find a more generic way though...
Test Plan: unit test
Reviewed By: ipiszy, khabinov
Differential Revision: D27690748
fbshipit-source-id: e53832c67b8ac86973c288d2d6b76ef8e5db14b9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55738
Per title, and use 0 as the default value.
It turns out that setting this epsilon to 0 can accelerate convergence and improve accuracy for some use cases.
Test Plan:
unit tests
f264687105
f264675194
Reviewed By: shuyingsunshine21
Differential Revision: D27694971
fbshipit-source-id: b61528c6c817127974acdc4635bccf607532287f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55637
This diff introduces the `EtcdRendezvousBackend` type that will serve as an experimental alternative to the existing `EtcdRendezvousHandler`.
The major advantage of `EtcdRendezvousBackend` is that it delegates the bulk of the rendezvous handling logic to `DynamicRendezvousHandler` which is shared with `C10dRendezvousBackend` (see D27654492) and any other potential future rendezvous backend (e.g. Amazon S3).
ghstack-source-id: 126312209
Test Plan: Run the existing and newly-introduced unit/integration tests.
Reviewed By: tierex
Differential Revision: D27654498
fbshipit-source-id: f3259adfc9068b7e323b947a7d8d52fcd0b8ada1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55636
This diff introduces:
- The `C10dRendezvousBackend` type to support C10d stores as rendezvous backends.
- A fix to the `TCPStore.compare_set()` function to support non-existent keys.
- A placeholder `c10d-experimental` registry to instantiate C10d-backed rendezvous backends via `get_rendezvous_handler()`.
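The `compare_set` semantics, including the non-existent-key case the fix covers, can be sketched against a plain dict (a toy stand-in for `TCPStore`):

```python
def compare_set(store, key, expected, desired):
    # Set key to desired only if its current value (or "" when the key
    # does not exist) equals expected; return the value now stored.
    current = store.get(key, "")
    if current == expected:
        store[key] = desired
        return desired
    return current
```

Treating a missing key as the empty value is what lets the very first writer win the compare-and-set instead of failing on a non-existent key.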
ghstack-source-id: 126312162
Test Plan: Run the existing and newly-introduced unit/integration tests.
Reviewed By: tierex
Differential Revision: D27654492
fbshipit-source-id: 09f498138b35186de4b0e174adb33fb5b5aa4b52
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55635
This diff introduces the `DynamicRendezvousHandler` type as a stub implementation and its accompanying `RendezvousBackend` interface.
`DynamicRendezvousHandler` is intended to be a backend-agnostic type that will contain the core (bulk) logic of rendezvous handling. Any backend specific operation will be delegated to a concrete subclass of `RendezvousBackend` (e.g. `C10dRendezvousBackend` - see D27654492) that is passed as a constructor argument to `DynamicRendezvousHandler`.
ghstack-source-id: 126304697
Test Plan: Run the existing and newly-introduced unit/integration tests.
Reviewed By: tierex
Differential Revision: D27654478
fbshipit-source-id: 9fc89a6e4cb308971c65b29a7c5af7ae191f70c5
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52690
This PR adds the following APIs:
```
static bool areLoopsPerfectlyNested(const std::vector<For*>& loops);
static std::vector<For*> reorder(
const std::vector<For*>& loops,
const std::vector<size_t>& permutation);
```
The first API checks if the given list of loops are perfectly nested. The second API reorders the given list of loops according to the permutation specified.
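The first check can be sketched on a toy loop IR (a hypothetical dict-based representation; the real API operates on `For*` nodes):

```python
def are_loops_perfectly_nested(loops):
    # A loop is {"var": name, "body": [statements]}.  The nest is perfect
    # when each loop's body consists solely of the next loop in the list.
    for outer, inner in zip(loops, loops[1:]):
        if outer["body"] != [inner]:
            return False
    return True
```

Any extra statement between two loops of the nest breaks the property, which is exactly what `reorder` must rule out before permuting.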
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55568
Reviewed By: albanD
Differential Revision: D27689734
Pulled By: navahgar
fbshipit-source-id: dc1bffdbee068c3f401188035772b41847cbc7c6
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54211
This was a little more annoying than expected, because the `exclude = ` key in `mypy.ini` is weird. I'll file an upstream issue about that.
I ignored one file, `torch/distributed/elastic/agent/server/api.py` that had ~8 errors that were hard to figure out. This can be done in a follow-up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55712
Reviewed By: walterddr
Differential Revision: D27694976
Pulled By: malfet
fbshipit-source-id: 228d8be6af040343ce46595dabaca212e69ccc68
Summary:
Currently common_device_type generates device-specific tests based on vague rules; see https://github.com/pytorch/pytorch/issues/55707.
This should fix https://github.com/pytorch/pytorch/issues/55707
# Changes included
This PR changes the rule:
1. First, user-provided args (`except_for` and `only_for`) are processed to filter out the desired device types from an ALL_AVAILABLE_LIST
2. Then, environment variables are processed in exactly the same way.
Tests are generated based on the final filtered list.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55753
Test Plan: CI
Reviewed By: seemethere, ngimel
Differential Revision: D27709192
Pulled By: walterddr
fbshipit-source-id: 1d5378ef013b22a7fb9fdae24b486730b2e67401
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55458
Previously the conv-add-relu pass blindly turned all in-place add and relu ops into non-mutating versions; when those ops are not part of the fusion pattern, this can actually hurt performance, as shown in densenet on some platforms.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D27620415
fbshipit-source-id: 8302c0c85f3a064dfd8ac994e92416dde927e348
Summary:
There are a few autograd tests checking for tensors leaked by reference cycles. This changes them to use `_WeakTensorRef` over `weakref`. `_WeakTensorRef`, added in https://github.com/pytorch/pytorch/issues/52874, accesses the C++-level `TensorImpl` reference count, whereas `weakref` accesses Python refcounts and so can only tell whether the Python wrapper object gets deallocated. Not only is this less code, it also more accurately detects that the Tensor itself is deallocated.
I didn't touch `weakref` usage in [test_anomaly_assign_parent_cleanup](fc349cbcde/test/test_autograd.py (L3733)) and [test_nested_anomaly_printstack_cleanup](fc349cbcde/test/test_autograd.py (L3772)) because these are intentionally testing for python object cleanup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55726
Reviewed By: ngimel
Differential Revision: D27718526
Pulled By: albanD
fbshipit-source-id: 37a4914360e35dd4ae8db06b29525cebec4d4b84
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55819
Test Plan:
On devserver: `buck run //xplat/langtech/tuna/cli:tuclix -- --model-dir ~/workspace/portal_en_US/ --audio-file ~/fbsource/fbcode/shortwave/test/data/audio_unittest.wav.to.raw` on top of Rittzz's D27691649
On device:
Reviewed By: Rittzz
Differential Revision: D27716745
fbshipit-source-id: 1921f18ee6b06990f71b86b9c4b3e1f3ce531001
Summary: Change error msg to include the min max values when failing.
Test Plan:
Existing unit tests:
```
buck test //caffe2/caffe2/python/operator_test:self_binning_histogram_test
```
Failing wf with error msg:
f264505545
Reviewed By: TailofJune
Differential Revision: D27630820
fbshipit-source-id: c490ce8c8c40414403634979c9beaf9c08569a96
Summary:
I noticed that https://github.com/pytorch/pytorch/issues/53296 added these two lines to the `files` list in `mypy-strict.ini`:
```
benchmarks/instruction_counts/*.py,
benchmarks/instruction_counts/*/*.py,
```
I opened https://github.com/pytorch/pytorch/issues/55700 to simplify them into one line, but I was also curious whether `tools/mypy_wrapper.py` correctly handles those patterns, so I added the `test_glob_wildcards_dont_expand_or_collapse` case shown in this PR. Turns out, it doesn't!
I believe this is because [`mypy` uses `glob`](https://github.com/python/mypy/blob/v0.770/mypy/config_parser.py#L45-L63) to parse these patterns, and for some reason, [`fnmatch`](https://docs.python.org/3/library/fnmatch.html) and [`glob`](https://docs.python.org/3/library/glob.html) don't agree with each other on what `*` means:
- according to `fnmatch`, `*` seems to mean `.*`
- according to `glob`, `*` seems to mean `[^/]*`
[This SO answer](https://stackoverflow.com/a/60174071) suggests using the [`glob.globmatch` function from the `wcmatch` library](https://facelessuser.github.io/wcmatch/glob/#globmatch) to solve the issue, but [we didn't want to add another external dependency](https://github.com/pytorch/pytorch/pull/55702#discussion_r610868623), so instead I simply modified our matching function to just directly call `mypy`'s own internal function that does the globbing (linked above).
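The disagreement is easy to reproduce with the standard library alone (`PurePath.match` uses glob-style per-component matching):

```python
import fnmatch
from pathlib import PurePath

pattern = "benchmarks/instruction_counts/*.py"
nested = "benchmarks/instruction_counts/sub/mod.py"

# fnmatch: '*' behaves like regex '.*' and happily crosses '/'
fnmatch_hit = fnmatch.fnmatch(nested, pattern)  # True
# glob-style matching: '*' does not cross '/'
glob_hit = PurePath(nested).match(pattern)      # False
```

This is why a pattern list written for `mypy`'s glob-based parsing cannot be checked with `fnmatch` directly.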
One possible downside of this approach is that now the tests in `tools/test/test_mypy_wrapper.py` could break if the directory structure of PyTorch is changed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55702
Test Plan:
```
python tools/test/test_mypy_wrapper.py
```
Reviewed By: malfet, seemethere
Differential Revision: D27684499
Pulled By: samestep
fbshipit-source-id: d99387a579c21eee73d1714e3e815ab7155f9646
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).
New submodule commit: fddc3aa75b
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55137
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: beauby
Differential Revision: D27499763
fbshipit-source-id: d96538009be7824f2ef600e9816239188ddd991a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55343
Add support for N-dimensional batches of 2D embedding bags to qembeddingbag_byte_prepack and qembeddingbag_byte_unpack.
This is currently supported in C2 via caffe2::Fused8BitRowwiseQuantizedToFloat and caffe2::FloatToFused8BitRowwiseQuantized, but is being supported in PyTorch operators via this change.
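The fused 8-bit rowwise format referenced here stores a per-row scale and bias next to the quantized bytes. A minimal Python sketch of the idea follows (illustrative only; not the exact C2/PyTorch kernel layout or dtype packing):

```python
def quantize_row_8bit(row):
    """Per-row affine 8-bit quantization: store a scale and a bias (the row
    minimum) alongside the uint8 values, as fused 8-bit rowwise formats do."""
    lo, hi = min(row), max(row)
    scale = (hi - lo) / 255.0 or 1.0  # avoid division by zero for constant rows
    q = [round((v - lo) / scale) for v in row]
    return q, scale, lo

def dequantize_row_8bit(q, scale, bias):
    return [x * scale + bias for x in q]

row = [0.0, 0.5, 1.0, 2.0]
q, scale, bias = quantize_row_8bit(row)
restored = dequantize_row_8bit(q, scale, bias)
```

The maximum reconstruction error is bounded by the per-row scale, which is why narrower rows quantize more accurately.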
Test Plan: buck test //caffe2/test:quantization -- test_embedding_bag_byte
Reviewed By: radkris-git
Differential Revision: D27480917
fbshipit-source-id: 9878751c6cee8a55909fe58a3e8c222ea31c20bb
Summary:
The argmax docstring previously said that it returns the indices of the first 'minimal' value; fixed the typo in that line to 'maximal'.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55239
Reviewed By: albanD
Differential Revision: D27641562
Pulled By: mrshenli
fbshipit-source-id: f8b5c579400088b5210c83a05da6c4c106fbf95d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55581
Multiple outputs are now OK, as long as they're all Tensors. Ported
fractional_max_pool2d to make sure the whole shindig works.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D27641267
Pulled By: ezyang
fbshipit-source-id: f88bfcd2b11e9ae90b023c9310c033d12637a53e
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53651
I did not put much effort in improving the docs, as I will go over all these docs in future PRs
cc anjali411
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55085
Reviewed By: nikithamalgifb
Differential Revision: D27493604
Pulled By: anjali411
fbshipit-source-id: 413363013e188bc869c404b2d54ce1f87eef4425
Summary:
Given that the minimal required Python version for using PyTorch is 3.6, the development tools should also be able to handle it. `./tools/nightly.py` currently uses the parameters `capture_output` and `text` of `subprocess.run` that were only added for [Python 3.7](https://docs.python.org/3/library/subprocess.html#subprocess.run).
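For reference, the Python 3.6-compatible spelling of those two parameters looks like this (an illustrative sketch, not the exact change made to `tools/nightly.py`):

```python
import subprocess
import sys

# Python 3.7+ shorthand:
#   subprocess.run(cmd, capture_output=True, text=True)
# Python 3.6-compatible equivalent:
result = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    stdout=subprocess.PIPE,   # capture_output=True sets both of these
    stderr=subprocess.PIPE,
    universal_newlines=True,  # same effect as text=True
)
print(result.stdout.strip())  # hello
```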
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55776
Reviewed By: ngimel
Differential Revision: D27709124
Pulled By: ezyang
fbshipit-source-id: aeea15a891ba792f3cd5fa602f0d7b746007e30c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55439
This adds [breakpad](https://github.com/google/breakpad) to the build in CI (just on one image for now). I attempted in #54739 to build it from source as a normal third_party submodule but it uses autotools and has some weird build steps that made it hacky to integrate. We really only need it for release builds anyways since its use is moot if built with anything but `RELEASE=1`.
Test Plan: Imported from OSS
Reviewed By: nikithamalgifb
Differential Revision: D27679766
Pulled By: driazati
fbshipit-source-id: 8211444df49b219c722137b9243d16d649a1f1ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55682
Fixes #55648
For now it downloads and writes the relevant files to the system's temp dir and marks it as valid for 3 hours.
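A mtime-based TTL check is one simple way to implement the "valid for 3 hours" behavior. The sketch below uses hypothetical helper names and is not the actual code from the PR:

```python
import os
import tempfile
import time

CACHE_TTL_SECONDS = 3 * 60 * 60  # 3 hours

def cache_path(name):
    return os.path.join(tempfile.gettempdir(), name)

def is_cache_valid(path, ttl=CACHE_TTL_SECONDS):
    """A cached file is valid if it exists and is younger than the TTL."""
    try:
        return (time.time() - os.path.getmtime(path)) < ttl
    except OSError:
        return False

def get_or_download(name, download):
    """Return cached contents, re-running `download()` only when stale."""
    path = cache_path(name)
    if not is_cache_valid(path):
        with open(path, "w") as f:
            f.write(download())
    with open(path) as f:
        return f.read()
```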
Test Plan: Imported from OSS
Reviewed By: malfet, nikithamalgifb
Differential Revision: D27685616
Pulled By: driazati
fbshipit-source-id: 27469b85fe4b6b4addde6b22bf795bca3d4990ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55710
In the current code, there is an edge case which leads to an error
after the prepare step:
1. have a pattern like this:
```
user_func_unmatched_to_qhandler -> node_matched_to_copy_node_qhandler
```
2. the user function returns a type which is not observable (i.e. not a
Tensor)
3. if this is run through `prepare_fx`, calibrating it with data leads
to a runtime error, because observers cannot observe non-tensor types.
This PR fixes the issue. If a node matched to `CopyNodeQuantizeHandler`
is after an unmatched node, we delete the observer.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_no_obs_between_unmatched_node_and_copy_node
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27686811
fbshipit-source-id: 320be41b1f383c6352ff89fb39a9f480822a3bb2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55695
In order to be able to run CUDA tests on their own (e.g., to avoid running CPU tests on GPU machines).
Done by moving test methods to a separate class (and sometimes introducing a "common" base class for utils), and then providing new entry points inside a `cuda/` subdirectory.
Test Plan: Checked they are run on Sandcastle.
Reviewed By: mrshenli
Differential Revision: D27618198
fbshipit-source-id: 8f671657f79c8ae115748ab7752fe0066705893b
Summary:
The retrieval of the profile node is much easier prior to inserting the guard node.
Test cases updated to reflect the patch on previously failing cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55701
Reviewed By: pbelevich
Differential Revision: D27701216
Pulled By: Krovatkin
fbshipit-source-id: e2e6b64b682377e622b75c762e85ff7967e45118
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55660
Noticed this doc was missing clarification on nccl env vars that
init_process_group docs have. Also, specify default behavior when backend=None
is passed in.
ghstack-source-id: 126251116
Test Plan: Ci
Reviewed By: SciPioneer
Differential Revision: D27672208
fbshipit-source-id: 2e79d297174e135173bceb059450ea267367bde4
Summary:
Following up on https://github.com/pytorch/pytorch/pull/54895#discussion_r606402656.
A race-condition wouldn't arise because `leak_corrupted_threadpool` can be set to true only after fork via the `pthread_atfork` handler, when a (child) process would be single-threaded. It's set to false also when the process is still single-threaded (`pthreadpool` is called during an invocation to `set_num_threads`, prior to which a child process would remain single-threaded). All threads (if & when multiple threads would be created) would always see `leak_corrupted_threadpool` as false if it would be accessed concurrently.
Since no reader threads can exist while a writer thread changes its value (false->true and true->false), `leak_corrupted_threadpool` might as well be a non-atomic bool.
### Pros
1. No thread-synchronization is required for `leak_corrupted_threadpool`, as it's a non-atomic bool.
2. The call to `compare_exchange_strong` has been removed.
cc: malfet VitalyFedyunin ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55341
Reviewed By: albanD
Differential Revision: D27669442
Pulled By: ezyang
fbshipit-source-id: 926cb5c1b0a537c1c2ab164b0d51d37c1f1b67f0
Summary:
This PR optimizes the way tensors are constructed from external data. It avoids allocating an empty tensor beforehand and directly constructs the target tensor by passing the newly-initialized `DataPtr`. Running some Facebook-internal benchmarks showed that combined with https://github.com/pytorch/pytorch/issues/54530 this PR achieves performance parity with Caffe2 tensor construction. (Overall ~2x speed improvement over the original `at::from_blob()` implementation.)
Testing is done with the existing unit and integration tests as there is no user-observable API change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55705
Reviewed By: ezyang
Differential Revision: D27686043
Pulled By: cbalioglu
fbshipit-source-id: b365c614476bcf0567797dfaf2add1b76fb6c272
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54781
Right now the functions have divergent names, with one postfixed `_equal` and the other `_allclose`. I've opted to use `_(equal|close)` over `_all(equal|close)` since I think it is a reasonable assumption that all values need to be equal or close for this to pass, even without explicitly naming the function this way.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D27438957
Pulled By: mruberry
fbshipit-source-id: 2951dac06d1430e15119ae94eafa234f3eb02f09
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54780
- In #53152 we opted to use `tb=native`. Thus, regardless of whether we use `pytest` to run the tests, `__tracebackhide__` is not honored, and additional layers of helper functions make the traceback harder to parse. To overcome this, we change the internal helpers to return `ok: bool, msg: Optional[str]` and only raise the error in the top-level function. We do that already in the current implementation that we are trying to replace:
36ce673f16/torch/testing/__init__.py (L92-L93)
36ce673f16/torch/testing/__init__.py (L112)
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D27438849
Pulled By: mruberry
fbshipit-source-id: 3e7a33dabb45463c29e8b9736fad09efb523f18d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55316
No need for heap allocations in the common case here.
ghstack-source-id: 126170054
Test Plan: Existing CI
Reviewed By: hlu1
Differential Revision: D27571942
fbshipit-source-id: 11fbf077c583c80ea63e024d2b9e1599785fff71
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55247
When x is known to be in-bounds, sizes() is faster.
ghstack-source-id: 126170048
Test Plan: CI
Reviewed By: hlu1
Differential Revision: D27523681
fbshipit-source-id: 021c82a8a6b770802f4cd51cf6ff77046d71c938
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55246
c10::MaybeOwned<Tensor> and no more unary tuples.
ghstack-source-id: 126170051
Test Plan: Existing CI
Reviewed By: ngimel
Differential Revision: D27523682
fbshipit-source-id: 2590993cfc62136e65fd9a791e4ab68b2c366556
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55245
Like `expand_inplace`, `expand_outplace` now returns
`MaybeOwned<Tensor>` in most cases. I wasn't confident around the
ownership semantics of the `TensorList` -> `std::vector<Tensor>` case, so I
left that one alone.
ghstack-source-id: 126170052
Test Plan: Existing CI.
Reviewed By: ezyang
Differential Revision: D27522811
fbshipit-source-id: 28c5a626b65681e361f4006a0aaa7dc23ba9612a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55244
Add the ability to move from the underlying object in a `MaybeOwned`.
FWIW, `MaybeOwned` is new territory for me personally and this move-and-dereference operation is even more so, but I think it makes sense and the tests pass.
ghstack-source-id: 126170046
Test Plan: Added automated tests.
Reviewed By: bhosmer
Differential Revision: D27522809
fbshipit-source-id: 82b180031e93d725209b6328f656315c232e5237
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55180
Even if we're expanding a Tensor's dimensions, DimVector's size is still a good guess at the rank of a Tensor in general. None of these sites actually seem to need a std::vector.
ghstack-source-id: 126170045
Test Plan: Existing CI
Reviewed By: ezyang
Differential Revision: D27520127
fbshipit-source-id: 4064764fad1b3782b379f04627b48331c3ee011f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55065
expand_inplace may give you the same Tensor(s) back, and it unnecessarily wrapped single-Tensor results in a tuple. Further diffs will deprecate and replace the rest of the similar APIs in ExpandUtils.
ghstack-source-id: 126170049
Test Plan: beyonce_test
Reviewed By: ezyang
Differential Revision: D27469297
fbshipit-source-id: 56cf14bc5603355f399fef2e5b02b97afa504428
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55666
{F590513307}
Some code is not properly displayed due to an extra whitespace ahead of `(num_rows + num_cols)`.
ghstack-source-id: 126148569
Test Plan: Locally viewed
Reviewed By: rohan-varma
Differential Revision: D27673663
fbshipit-source-id: 603ae4ddbe86ceaefc311885b82b0f6b48b57b27
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55353
Remove all the code branches that will only be executed when `device_ids > 1`.
Some helper functions are also removed:
1. `_verify_replicas_within_process` and `verify_replicas_within_process`
2. `_replicate_modules_within_process`
3. `parallel_apply`
The next step is deprecating `_module_copies` field.
ghstack-source-id: 126201121
Test Plan: waitforbuildbot
Reviewed By: rohan-varma
Differential Revision: D27552201
fbshipit-source-id: 128d0216a202f5b1ba4279517d68c3badba92a6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55574
nn.EmbeddingBag backward is non-deterministic on GPU when reducing_mode = Max; reducing modes Mean and Sum should be deterministic.
Test Plan: NA
Reviewed By: ngimel
Differential Revision: D27633832
fbshipit-source-id: 50786ed8522f1aae27442f5f244a65eab8000b06
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55572
We used to have the VariableVersion default constructor
`VariableVersion(uint32_t version=0)`. But sometimes
we override the version_counter right after it's constructed,
e.g. in SavedVariable/TensorImpl.
Thus we should make DISABLED the default constructor and use the
explicit `VariableVersion(uint32_t)` constructor elsewhere.
Note this PR effectively changes the SavedVariable constructor (which overrides
version_counter_ inside) to use the DISABLED constructor, and we
can see the gains in reduced instruction counts.
```
// benchmark code
timer = Timer(
"y = x * x",
"""
x = torch.rand((3, 3)).requires_grad_()
""",
language=Language.PYTHON,
)
λ ~ python compare.py
No CUDA runtime is found, using CUDA_HOME='/public/apps/cuda/10.2'
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts
object at 0x7f06c48b3a50>
7236 lookdict_unicode_nodummy
2600 torch::autograd::VariableType::(...)
100 0x0000000017751750
-5 unlink_chunk.isra.0
-100 0x000000001773e750
-402 _int_malloc
-1600 operator delete(...)
-1600 c10::intrusive_ptr_target::release_resources()
-2400 c10::VariableVersion::VersionCounter::~VersionCounter()
-3600 torch::autograd::SavedVariable::operator=(...)
-4800 operator new(...)
-6400 torch::autograd::SavedVariable::SavedVariable(...)
-7200 torch::autograd::SavedVariable::SavedVariable()
-8400 free
-16800 malloc
-24400 _int_free
Total: -67771
```
Note that for other callsites (esp. view related) we just keep the behavior
unchanged by explicitly calling `VariableVersion(uint32_t)`, but we should be
able to optimize those in followup PRs.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D27669074
Pulled By: ailzhang
fbshipit-source-id: a4deb297cc89142ae8bd683284516c881ddf3c87
Summary:
These two lines were added in https://github.com/pytorch/pytorch/issues/53296, but they are needlessly complicated; this PR consolidates them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55700
Test Plan:
Run this command, and verify that the same number of files is given both before and after this PR:
```
mypy --config=mypy-strict.ini
```
Reviewed By: robieta
Differential Revision: D27684278
Pulled By: samestep
fbshipit-source-id: a34968cdff29cb8ad83813b277114224b5e37569
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54403
A few important points about InferenceMode behavior:
1. All tensors created in InferenceMode are inference tensors, except for the outputs of view ops.
- view ops produce output with the same is_inference_tensor property as their input.
Namely, a view of a normal tensor inside InferenceMode produces a normal tensor, which is
exactly the same as creating a view inside NoGradMode. And a view of an
inference tensor outside InferenceMode produces an inference tensor as output.
2. All ops are allowed inside InferenceMode, and they run faster than in normal mode.
3. Inference tensors cannot be saved for backward.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D27316483
Pulled By: ailzhang
fbshipit-source-id: e03248a66d42e2d43cfe7ccb61e49cc4afb2923b
Summary:
This PR
- adds a `tools/translate_annotations.py` script that
- parses annotations into JSON using the regexes that we were previously passing to [`pytorch/add-annotations-github-action`](https://github.com/pytorch/add-annotations-github-action) and
- uses `git diff-index` to translate the line numbers for those annotations from the PR `merge` onto the PR `head`, since (as of https://github.com/pytorch/pytorch/issues/54967) we now run CI on the former instead of the latter;
- modifies the `flake8-py3` and `clang-tidy` jobs to use that script and thus upload JSON in their artifacts instead of raw text; and
- modifies the "Add annotations" workflow to specify `mode: json` to allow it to use those preprocessed annotations.
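The line-number translation step can be sketched by walking the unified-diff hunk headers produced by `git diff-index` (a simplified illustration; the real `tools/translate_annotations.py` handles more cases):

```python
import re

HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")

def translate_line(diff_text, old_line):
    """Map a line number in the old file to the new file using unified-diff
    hunk headers; return None if the diff touched that line itself."""
    shift = 0
    for line in diff_text.splitlines():
        m = HUNK_RE.match(line)
        if not m:
            continue
        old_start, old_len = int(m.group(1)), int(m.group(2) or "1")
        new_start, new_len = int(m.group(3)), int(m.group(4) or "1")
        if old_line < old_start:
            break  # hunks are ordered; later hunks cannot affect this line
        if old_line < old_start + old_len:
            return None  # the line was changed or removed by the diff
        shift = (new_start + new_len) - (old_start + old_len)
    return old_line + shift
```

For example, a hunk `@@ -5,3 +5,4 @@` (one line inserted inside lines 5-7) shifts every later line down by one.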
Depends on https://github.com/pytorch/add-annotations-github-action/pull/18.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55569
Test Plan:
You can run the unit tests with this command:
```
python tools/test/test_translate_annotations.py
```
I also tested the entire system together in my personal sandbox repo.
Reviewed By: malfet
Differential Revision: D27662161
Pulled By: samestep
fbshipit-source-id: ecca51b79b9cf00c90fd89f0d41d0c7b89d69c63
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55405
Pull Request resolved: https://github.com/pytorch/glow/pull/5516
Allows FXIRImport to import quantized model.
This diff doesn't include the supports for per-channel weights, linear and conv. Will address them in the next diff.
Test Plan: buck test glow/fb/fx/nnpi_importer:test_importer
Reviewed By: jackm321, jfix71
Differential Revision: D27313543
fbshipit-source-id: bf5c96ef5f2ff1835c09db981e0ceefaec56dd5b
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/55670
Reference: https://github.com/pytorch/pytorch/pull/55522
**Cant Run tests locally without setting the ENV variables**
<details>
```
(pytorch-cuda-dev) kshiteej@qgpu1:~/Pytorch/pytorch_opinfo$ pytest test/test_ops.py
======================================================================= test session starts ========================================================================
platform linux -- Python 3.8.6, pytest-6.1.2, py-1.9.0, pluggy-0.13.1
rootdir: /home/kshiteej/Pytorch/pytorch_opinfo, configfile: pytest.ini
plugins: hypothesis-5.38.1
collected 0 items
========================================================================= warnings summary =========================================================================
../../.conda/envs/pytorch-cuda-dev/lib/python3.8/site-packages/torch/backends/cudnn/__init__.py:73
/home/kshiteej/.conda/envs/pytorch-cuda-dev/lib/python3.8/site-packages/torch/backends/cudnn/__init__.py:73: UserWarning: PyTorch was compiled without cuDNN/MIOpen support. To use cuDNN/MIOpen, rebuild PyTorch making sure the library is visible to the build system.
warnings.warn(
../../.conda/envs/pytorch-cuda-dev/lib/python3.8/site-packages/torch/testing/_internal/common_nn.py:1195
/home/kshiteej/.conda/envs/pytorch-cuda-dev/lib/python3.8/site-packages/torch/testing/_internal/common_nn.py:1195: UserWarning: Legacy tensor constructor is deprecated. Use: torch.tensor(...) for creating tensors from tensor-like objects; or torch.empty(...) for creating an uninitialized tensor with specific sizes. (Triggered internally at ../torch/csrc/utils/tensor_new.cpp:474.)
random_samples = torch.DoubleTensor(1, 3, 2).uniform_()
-- Docs: https://docs.pytest.org/en/stable/warnings.html
======================================================================= 2 warnings in 2.85s ========================================================================
```
</details>
c7312f5271/torch/testing/_internal/common_device_type.py (L479-L486)
(When running locally where the environment variable is not set)
In the case when the env variable is not present, `os.getenv` returns `''`, which is split into `['']` for `only_for` and `except_for`.
c7312f5271/torch/testing/_internal/common_device_type.py (L496-L497)
At this point, we take the branch and skip all the tests.
```python
>>> if [''] and 'cuda' not in ['']:
... print("TRUE")
...
TRUE
```
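One way to sidestep the truthy-`['']` pitfall is to filter out empty entries when parsing the variable (a sketch of the idea, not necessarily the exact fix in the PR):

```python
import os

# "".split(",") yields [''], which is truthy -- so the "only run on these
# device types" branch fires even when the variable is unset.
# Filtering out empty entries avoids that:
def parse_device_list(env_var):
    raw = os.environ.get(env_var, "")
    return [d for d in raw.split(",") if d]

os.environ.pop("DEMO_ONLY_FOR", None)
print(parse_device_list("DEMO_ONLY_FOR"))  # []
os.environ["DEMO_ONLY_FOR"] = "cpu,cuda"
print(parse_device_list("DEMO_ONLY_FOR"))  # ['cpu', 'cuda']
```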
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55664
Reviewed By: albanD
Differential Revision: D27677752
Pulled By: malfet
fbshipit-source-id: 071486e3b6b5113c56f0f956b8d99a5ab24068fe
Summary:
Switched to short forms of `splitWithTail` / `splitWithMask` for all tests in `test/cpp/tensorexpr/test_*.cpp` (except test_loopnest.cpp)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55542
Reviewed By: mrshenli
Differential Revision: D27632033
Pulled By: jbschlosser
fbshipit-source-id: dc2ba134f99bff8951ae61e564cd1daea92c41df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55640
Mean is broken for complex types; since #53218 it's now allocating the result
as a real tensor which discards the imaginary component. This wasn't picked up
in testing because `_test_dim_ops` tests are defined as closures inside of
`_test_dim_ops` instead of as methods on the test class. The result is, they
never get run.
For best results, view diff with "Hide whitespace changes".
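The "closures never run" pitfall is easy to reproduce with plain unittest (a standalone illustration, not the actual test file):

```python
import unittest

class ExampleTests(unittest.TestCase):
    def _generate_tests(self):
        # Defined as a closure: unittest's loader never sees this,
        # so it silently never runs.
        def test_hidden(self):
            self.fail("this would fail -- if it ever ran")

    def test_visible(self):
        pass

names = unittest.TestLoader().getTestCaseNames(ExampleTests)
print(names)  # ['test_visible'] -- test_hidden is never collected
```

The loader only discovers `test*` methods bound to the class, which is why the closure-defined tests were never executed.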
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D27671127
Pulled By: mruberry
fbshipit-source-id: 4a1f6fea1048919fda7339c867ee78e88f2d7bd2
Summary:
This PR adds the functionality to use channels_last_3d, aka NDHWC, in Conv3d. It's only enabled when the cuDNN version is greater than or equal to 8.0.5.
Todo:
- [x] add memory_format test
- [x] add random shapes functionality test
Close https://github.com/pytorch/pytorch/pull/52547
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48430
Reviewed By: mrshenli
Differential Revision: D27641452
Pulled By: ezyang
fbshipit-source-id: 0e98957cf30c50c3390903d307dd43bdafd28880
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55529
x.shape outputs a non-Tensor; add this to the all_node_args_have_no_tensors function
to avoid inserting an observer for the getattr "shape" node.
Test Plan: Imported from OSS
Reviewed By: wat3rBro
Differential Revision: D27628145
fbshipit-source-id: 4729294ab80c0a1e72440396d31e7e82257b1092
Summary:
Fixes https://github.com/pytorch/pytorch/issues/55088.
Unfortunately, this test wouldn't have caught index_add_ breakage (because index_add_ breakage would appear only in a particular type promotion situation).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55527
Reviewed By: mruberry
Differential Revision: D27671138
Pulled By: ngimel
fbshipit-source-id: b52411f5a6d81098b706dfda4d0c9a16716414d7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55429
Previously we special-cased the copy operator in the normal insert-observer code; this PR splits the
special-case logic into a separate function and keeps the rest of the code clean.
Test Plan:
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D27609972
fbshipit-source-id: 378f6aa70f18c0b477b62b6efe236648748aae7e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55213
Adds the integration of conv2d with the TE fuser. A few things of interest:
- I'm *super* selective of what convs get lowered. Only 3x3 depthwise, because
I've benchmarked those to death and I'm pretty sure it's a good change.
- I'm allowing single-node "fusion" groups for supported convs. (Maybe this is
a sign that conv2d codegen should go through a different path entirely, but
it seems to basically work).
I'll share full benchmark results once I clean them up a little. To
summarize, I tested the following torchvision models containing depthwise
convolutions. Results are single-core on a skylake-avx512:
mobilenet_v2: 8% improvement
mobilenet_v3: 9% improvement
mnasnet: 10% improvement
shufflenet: 18% improvement
Note these are comparing against a baseline with a fast-but-buggy grouped
convolution implementation in MKLDNN. So perf results will be better if
compared on master, but I'm going to assume the MKLDNN bug will be fixed and
re-enabled.
Perf results are more complicated when comparing to freezing plus conversion to
mkldnn layout; mobilenet v2/v3 are still faster, but mnasnet and shufflenet are
not. Landing this doesn't prevent MKLDNN freezing from kicking in though, so
there's no harm (although landing mkldnn freezing will regress mobilenet, but
c'est la vie).
ghstack-source-id: 126076112
Test Plan: New unit test, plus torchvision
Reviewed By: ZolotukhinM
Differential Revision: D27530272
fbshipit-source-id: 92153fad234bc9f1eaa4f7624c543168d1294a87
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55025
Something needs to be fixed about the names of these functions,
because they are confusing.
The profiling infrastructure calls `isSupported` to see if it should insert
profiling nodes.
The fuser calls `isSupported` but also `typesAreSupported` to determine if it
can actually fuse the node.
At profiling time, we don't know device types yet, so we can't use device type
checks in `isSupported` or else we'll never profile the node. So we want to
move those checks into `typesAreSupported`, where we actually have profiling
info available.
ghstack-source-id: 126076111
Test Plan: sandcastle
Reviewed By: ngimel
Differential Revision: D27454968
fbshipit-source-id: 4ffb142ea7a0086842a034c9e202f9cb1065fc95
Summary:
malfet found a couple of these in https://github.com/pytorch/pytorch/issues/55346; this PR removes the rest and adds a lint that prevents them from being accidentally added again in the future. It also removes the `-o` flag added in https://github.com/pytorch/pytorch/issues/53733 (which was unnecessarily hiding context without reducing the number of lines of output), and updates the lint error messages to reflect that the individual line numbers are shown in the logs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55465
Test Plan:
The "Lint / quick-checks" job in GitHub Actions should succeed on this PR. To verify that the lint does correctly find and error on non-breaking spaces, checkout ece075195d49c25213c96b9d53fcf7077215f44a and run it locally:
```sh
(! git --no-pager grep -In $'\u00a0' -- . || (echo "The above lines have non-breaking spaces (U+00A0); please convert them to spaces (U+0020)"; false))
```
It should print over a hundred lines of output and exit with status 1.
Reviewed By: janeyx99
Differential Revision: D27622136
Pulled By: samestep
fbshipit-source-id: e7ffd5a9519093e7a0ffdf55e9291f63e21ce841
Summary:
Provide explanation for why we have (and use) the BUILD_SPLIT_CUDA option as a result of PR https://github.com/pytorch/pytorch/pull/49050.
This should hopefully clarify why there is both TORCH_CUDA_CU_API and TORCH_CUDA_CPP_API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55641
Reviewed By: samestep
Differential Revision: D27661729
Pulled By: janeyx99
fbshipit-source-id: a68b44df2b45ce10590b9b0229558a1ad40ce485
Summary:
1. move module related stuff to test_module_container
2. created test_types for types and annotation
3. created test_misc for the rest
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55560
Reviewed By: VitalyFedyunin
Differential Revision: D27650911
Pulled By: walterddr
fbshipit-source-id: d895a7da9e9c3d25a662a37faf4daabc276b9c1a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55367
During compilation, nvcc emits several warnings about unused variables and static functions:
```
caffe2/aten/src/ATen/native/cuda/SpectralOps.cu(231): warning: function "at::native::_run_cufft" was declared but never referenced
caffe2/aten/src/ATen/native/sparse/cuda/SparseMatMul.cu(60): warning: function "at::native::<unnamed>::confirm_mult_size" was declared but never referenced
caffe2/aten/src/ATen/native/cuda/UnaryFractionKernels.cu(112): warning: function "at::native::nearbyint_wrapper(c10::complex<double>)" was declared but never referenced
caffe2/aten/src/ATen/native/cuda/TensorFactories.cu(106): warning: variable "d_temp_storage" was declared but never referenced
caffe2/torch/fb/sparsenn/sparsenn_operators_gpu.cu(2325): warning: variable "kMaxThreads" was declared but never referenced
```
To reproduce, run the following build command on remote/master:
```
buck build mode/dev-nosan caffe2/torch/fb/sparsenn:sparsenn_operators_gpu
```
Warnings about unused variables are fixed by removing the variable declaration. However, I don't want to remove the unused static functions. They were probably used before some other part of the code was refactored, and they might be useful again in the future. So I added #pragma directives to disable warnings for such functions.
Test Plan: Compilation does not produce warnings any more.
Reviewed By: r-barnes
Differential Revision: D27577342
fbshipit-source-id: e6a6e5ec513996337d904985dd27c60601c74803
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54175
Building on top of previous PR. This PR adds cuda support for 1D max reduction.
Next steps:
- Add support for other major reduction types (e.g. min, sum) for 1D tensor
- Documentation for the op
- Perf optimizations and benchmark util
- Backward support (not high priority)
- Support for multi dimensional tensors (on data and lengths) (not high priority)
- Support for 'indices' (not high priority)
Test Plan: Added unit test
Reviewed By: ngimel
Differential Revision: D27121170
fbshipit-source-id: 1c2565f42e2903e6fc089d56983ce8857efbfa3c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55121
This is done by allowing -1 as a stream ID, meaning "don't change
the stream", in SwitchToDevice.
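The sentinel idea can be shown with a toy Python sketch (hypothetical names; not the actual C++ device-guard API):

```python
KEEP_CURRENT_STREAM = -1

class ToyDeviceGuard:
    """Toy guard: switching device normally resets the stream, but a
    stream id of -1 means 'switch device, keep whatever stream is set'."""
    def __init__(self):
        self.device = 0
        self.stream = 0

    def switch_to_device(self, device, stream=KEEP_CURRENT_STREAM):
        self.device = device
        if stream != KEEP_CURRENT_STREAM:
            self.stream = stream

g = ToyDeviceGuard()
g.switch_to_device(1, stream=3)
g.switch_to_device(2)  # stream stays 3
```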
Fixes #54830
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D27527544
Pulled By: ezyang
fbshipit-source-id: c54983d6fc79a8fa1c65a71559a57425e40ba717
Summary:
Add option to add //NOLINTNEXTLINE for every detected violation
A series of huge automated diffs will follow this one to make large chunks of code clang-tidy clean
PR generated by new option: https://github.com/pytorch/pytorch/pull/55628
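A rough sketch of what such an option does, given clang-tidy's usual warning format (hypothetical helper; not the actual wrapper script):

```python
import re

# Matches lines like: file.cpp:12:7: warning: message [check-name]
WARNING_RE = re.compile(
    r"^(?P<file>[^:]+):(?P<line>\d+):\d+: warning: .* \[(?P<check>[\w.-]+)\]$"
)

def add_nolint_comments(source_lines, clang_tidy_output):
    """Insert '// NOLINTNEXTLINE(check)' above every flagged line,
    preserving the flagged line's indentation."""
    suppressions = {}  # line number -> check name
    for line in clang_tidy_output.splitlines():
        m = WARNING_RE.match(line)
        if m:
            suppressions[int(m.group("line"))] = m.group("check")
    out = []
    for i, src in enumerate(source_lines, start=1):
        if i in suppressions:
            indent = src[: len(src) - len(src.lstrip())]
            out.append(f"{indent}// NOLINTNEXTLINE({suppressions[i]})")
        out.append(src)
    return out
```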
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55612
Reviewed By: samestep
Differential Revision: D27649473
Pulled By: malfet
fbshipit-source-id: 251a68fcc50bf0fd69c6566293d4a516c0ab24c8
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/55203
Fixes issues (1) and (2) in the following tests:
tests in test/cpp/tensorexpr/test_loopnest.cpp from the beginning to LoopNestReorderLongStringFull (including)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55512
Reviewed By: mrshenli
Differential Revision: D27630679
Pulled By: soulitzer
fbshipit-source-id: b581aaea4f5f54b3285f0348aa76e99779418f80
Summary:
There was an error when removing a parametrization with `leave_parametrized=True`. It had escaped the previous tests. This PR should fix that.
**Edit.**
I also took this chance to fix a few mistakes that the documentation had, and to also write the `set_original_` in a more compact way.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55456
Reviewed By: mrshenli
Differential Revision: D27620481
Pulled By: albanD
fbshipit-source-id: f1298ddbcf24566ef48850c62a1eb4d8a3576152
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55435
We've seen issues from the macOS skylight app where PyTorch is super slow due to the lack of cap support in pthreadpool. For mac builds, we set the thread count to `#threads/2`.
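The cap itself is simple (a sketch with a hypothetical helper name, not the actual pthreadpool change):

```python
import multiprocessing

def default_thread_count(is_mac_build):
    """Cap worker threads on mac builds at half the logical core count,
    with a floor of one thread; use every core elsewhere."""
    cores = multiprocessing.cpu_count()
    return max(1, cores // 2) if is_mac_build else cores
```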
ghstack-source-id: 125900852
Test Plan:
- Sandcastle CI
- CircleCI
Reviewed By: kimishpatel
Differential Revision: D27578871
fbshipit-source-id: 7b947bc5d6cf289378abf5f479575e112325d02b
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54040
`prim::RequiresGradCheck` guarantees that the requires_grad properties
of input tensors match the profiled ones; otherwise a fallback path
is triggered. This allows us to prune gradients in the backward
graph for inputs that don't need them. We transfer the requires_grad
properties from the inputs of the `prim::DifferentiableGraph` onto the inputs of the
differentiable graph. Autodiff will inspect these properties and prune
off gradients that aren't required.
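An eager-mode illustration of the property being profiled here: only inputs with `requires_grad=True` need gradients, so gradient computation for the other inputs can be pruned from the backward graph.

```python
import torch

# Only x requires grad; the gradient path through y is prunable.
def f(x, y):
    return (x * y).sum()

x = torch.randn(3, requires_grad=True)
y = torch.randn(3)            # requires_grad=False -> no gradient needed
f(x, y).backward()
print(x.grad is not None, y.grad)  # True None
```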
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54374
Reviewed By: H-Huang
Differential Revision: D27369251
Pulled By: Krovatkin
fbshipit-source-id: 2bce7a2d7f2ec091db9bf4c4b91d8b29edd5be11
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50345
Changes:
* Alias for sigmoid and logit
* Adds out variant for C++ API
* Updates docs to link back to `special` documentation
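The aliases added here can be exercised directly; `torch.special.expit` (the sigmoid alias) and `torch.special.logit` round-trip each other:

```python
import torch

# expit is the SciPy-style name for sigmoid; logit is its inverse.
x = torch.tensor([0.25, 0.5, 0.75])
roundtrip = torch.special.expit(torch.special.logit(x))
assert torch.allclose(roundtrip, x)
```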
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54759
Reviewed By: mrshenli
Differential Revision: D27615208
Pulled By: mruberry
fbshipit-source-id: 8bba908d1bea246e4aa9dbadb6951339af353556
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54769
Follow-up to #53820. This
- makes the `asserts.py` module private as per suggestion from rgommers in https://github.com/pytorch/pytorch/pull/53820#issuecomment-802661387. With this the functions should only be accessible through `torch.testing`, giving us the option the change the underlying structure later.
- moves the code from `torch/testing/__init__.py` to `torch/testing/_core.py` (happy to accept other name suggestions). Otherwise we can't import the new `_asserts.py` in `torch/testing/__init__.py` due to circular imports.
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D27438451
Pulled By: mruberry
fbshipit-source-id: c7292b4d5709185b42b4aac8016648562688040e
Summary:
After MAGMA has been enabled, around 5k new tests are running now.
Out of these, 5 tests (each having 4 datatypes) are failing on the latest ROCm
CI with ROCm 4.1. Disabling these tests for now so the ROCm CI does not fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55534
Reviewed By: ZolotukhinM
Differential Revision: D27630085
Pulled By: malfet
fbshipit-source-id: c48d124e6a2b4a4f3c6c4b6ac2bdf6c214f325c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55466
Improve the implementation and the unit test coverage of `RendezvousHandlerRegistry`.
### Note
See the original diff (D27442325 (df299dbd7d)) that had to be reverted due to an unexpected Python version incompatibility between the internal and external PyTorch CI tests.
Test Plan: Run the existing and newly-introduced unit tests.
Reviewed By: tierex
Differential Revision: D27623215
fbshipit-source-id: 51538d0f154f64e04f685a95d40d805b478c93f9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55333
Reapplying without using enum class in a bitfield. See new
comments about gcc bug.
ghstack-source-id: 125776904
Test Plan: Carefully review OSS test failure logs this time
Reviewed By: kimishpatel, bhosmer
Differential Revision: D27576623
fbshipit-source-id: 68fb00e5ff5215e56c8b9bc02717e1e7b2fedf9b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55227
This seems to increase the number of typechecked files.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: janeyx99
Differential Revision: D27535373
Pulled By: ezyang
fbshipit-source-id: b36f6f8ce52c76848ed600ca9dd6b0c1de5813ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55551
Simple typo, it should be `OrderedImporter`
Test Plan: ci
Differential Revision: D27629463
fbshipit-source-id: 745527a8339f03a8fd38d0a4491811b3c9ca9b1e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54964
Previously, the following would error out with a strange error message:
```
import torch
x=torch.randn(2)
torch.rsub(x, 1, alpha=2j)
Traceback (most recent call last)
<ipython-input-2-caf2a1c03d0b> in <module>
1 import torch
2 x=torch.randn(2)
----> 3 torch.rsub(x, 1, alpha=2j)
RuntimeError: value cannot be converted to type float without overflow: (-0,-2)
```
The reason why this is happening is that the alpha check doesn't handle the case where `x` is not complex but `alpha` is complex.
The error gets thrown further along in the implementation of torch.sub,
when it coerces `alpha` to be the same dtype as the input tensor:
https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cpu/BinaryOpsKernel.cpp#L53
This PR fixes the bad error message by adding a new check to the alpha check.
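A hypothetical Python sketch of the added check (the real check lives in the C++ alpha-check shared by add/sub/rsub); `check_alpha` is illustrative, not the actual function name:

```python
import torch

def check_alpha(tensor, alpha):
    # A complex alpha is only valid when the input tensor is complex.
    if isinstance(alpha, complex) and not tensor.is_complex():
        raise RuntimeError(
            "For non-complex input tensors, alpha must not be a complex number"
        )

x = torch.randn(2)
check_alpha(x, 2.0)   # fine
try:
    check_alpha(x, 2j)  # complex alpha on a float tensor -> clear error
except RuntimeError as e:
    print(e)
```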
Test Plan:
- pytest test/test_binary_ufuncs.py
- NB: add, sub, and rsub all share the same alpha check. The test only tests it for torch.add, but that should be sufficient.
Reviewed By: gchanan
Differential Revision: D27504017
Pulled By: zou3519
fbshipit-source-id: 70b9aa75a7a4faaaa93f6ba235cae85998a91697
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55497
Migrating some of the NNC API's used in testing, from this issue: https://github.com/pytorch/pytorch/issues/55203
I covered the second half of `test_loopnest.cpp`, and migrated (1) and (2) in the above issue: `LoopNest::getLoopStmtsFor`, `splitWithTail`, and `splitWithMask`
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D27628625
Pulled By: bdhirsh
fbshipit-source-id: ec15efba45fae0bbb442ac3577fb9ca2f8023c2d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55181
There can be a dramatic model size delta between saving a model after calling generate_bundled_inputs_for_* and saving before. This is due to the caching of the inflated tensor.
This increases latency when asking for the bundled inputs multiple times. I don't think this matters, but it might for something like benchmarking.
ghstack-source-id: 125746773
Test Plan: unit tests.
Reviewed By: dreiss
Differential Revision: D27519487
fbshipit-source-id: 6ba22bff9c4e3a8d86c04627b7cbf47ca2d141b9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55412
The diff resolves bug where worker processes could exit before torchelastic process would read the return values. This is a rare event, but still can happen, e.g. https://fb.workplace.com/groups/319878845696681/permalink/512409069776990/
When users want to return torch.Tensor object from worker process, the torchelastic multiprocessing will fail. Currently worker process finishes its job after it writes output to the IPC queue without receiver process confirmation. When this happens, the underlying channel between worker and torchelastic process could be closed (in case of mp.SimpleQueue it is file descriptors, that is why we see FileNotFoundException: since worker process finished execution, the file descriptor just got deleted, and torchelastic process cannot find it).
Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed/elastic/agent/server/test:local_agent_test
User workflow: f263531643
Reviewed By: cbalioglu
Differential Revision: D27602838
fbshipit-source-id: 29871178232e3af4ad3dec406c234aba9c5faba1
Summary:
It seems that the std::copysign code introduced in https://github.com/pytorch/pytorch/issues/51706 is too much for gcc 7.5 / 8 when compiled on arm64 (e.g. on a Jetson with the latest JetPack) and causes it to produce an internal compiler error with a segfault during compilation. This avoids the compiler bug by not using std::copysign.
A very kind person sent a Jetson Xavier NX 🎁 thank you ❤️.
After https://github.com/pytorch/pytorch/issues/51900 fixed this for CPU-only arm64 (e.g. Raspberry Pi), this fixes it for CUDA-using arm64 (e.g. Jetson). CUDA device lambdas must also be present as host functions for technical reasons, but they are never used, so we just assert in the CPU variant instead of actually doing the operation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51834
Reviewed By: mrshenli
Differential Revision: D27622277
Pulled By: malfet
fbshipit-source-id: a1dc4c3a67f925019782e24b796919e17339749f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55226
Fixes a bug caused by using different clocks in legacy events, also fixes two
small issues with not using relative time in memory events and discrepancy
between start and stop profile events CUDA-wise
Test Plan: CI
Reviewed By: xuzhao9
Differential Revision: D27534920
fbshipit-source-id: 7a877367b3031660516c9c4fdda1bf47e77bcb3e
Summary:
Related to https://github.com/pytorch/pytorch/issues/52256
Splits torch.nn.functional into a table-of-contents page and many sub-pages, one for each function
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55038
Reviewed By: gchanan
Differential Revision: D27502677
Pulled By: zou3519
fbshipit-source-id: 38e450a0fee41c901eb56f94aee8a32f4eefc807
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55424
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55238
I tried to avoid creating new TLS, but InferenceMode::is_enabled()
is in perf critical path (TensorImpl constructor) so it seems
worth adding one for it.
This PR reduces one sources of instruction count increased by
https://github.com/pytorch/pytorch/pull/55008.
```
λ ~ python compare.py
<torch.utils.benchmark.utils.valgrind_wrapper.timer_interface.FunctionCounts object at 0x7f59097ef310>
100 0x0000000004854750
-100 0x0000000004854760
-4400 c10::impl::tls_is_dispatch_key_included(...)
```
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D27539230
Pulled By: ailzhang
fbshipit-source-id: e040877faef966dca3c2c3d5f9e9a80496c81415
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55251
Based on a discussion with dreiss and JacobSzwejbka, we decided to implement a flexible operator for decoding a JPEG bundled image that allows getting the image in BGR format with scaling, and offsets applied for the MaskRCNN operators without calling `conv2d()` and pulling in a ton of additional operators and kernel functions. Please see the previous diff in the stack for the new operators that the change w/o this diff would have pulled in since Inflatable Arg string is non-trivial.
This change implements that operator. Please see the comments in the code for detail regarding what the operator does.
ghstack-source-id: 125641068
Test Plan:
I re-implemented the existing operator in terms of the new operator and used the existing unit test to ensure that the same (or comparable) tensor is produced.
```
cd fbsource/fbcode/
buck test caffe2/test:test_bundled_images
```
Ran this bento notebook https://www.internalfb.com/intern/anp/view/?id=476100 with the new operator `fb::jpeg_decode_to_NCHW` and saw that it is able to generate proposals.
Ran the generated hand tracking model with tracer and observed just the 2 new operators and 0 new dtypes copy kernel function, which to me seems like an acceptable set of new ops to pull in since they are relatively simple operators: {P383858691}
Reviewed By: dreiss
Differential Revision: D27531423
fbshipit-source-id: 2dc6c41029236bb71922e51cbfd14a46c5651149
Summary:
ATT, so that the shape inference works for a model with only distributed parts.
Previously, we relied on a full_predictor net to do shape inference. For very large models, the full_predictor net won't be generated, so we have to do shape inference based on the distributed parts. Surprisingly, the PredictorCall op performs tensor name mapping, so it needs a shape inference function as well.
Test Plan: Added unittests.
Reviewed By: khabinov
Differential Revision: D27250956
fbshipit-source-id: 3ebd36ba1eb020bb5d00358cffb8f038a6a996e8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55321
We have some operators that previously allowed you to pass in an undefined tensor to the out argument,
and then would go on to allocate that for you. This behavior is broken and doesn't work in JIT when things
are converted to/from IValues. Because of this, it blocks backend fallbacks because they force going
through IValue.
This PR is one in a series to remove that behavior and forces out arguments to be defined tensors.
It only looks at at::_linalg_solve_out_helper_cuda(), but there's more PRs for other ops.
ghstack-source-id: 125886984
(Note: this ignores all push blocking failures!)
Test Plan: waitforsandcastle
Reviewed By: ngimel
Differential Revision: D27572759
fbshipit-source-id: 5bca60b39c513b8d85fe282ebd4d66607d54774f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53624
Previously, the boxing logic didn't correctly forward arguments to the stack but called copy constructors.
This PR fixes that.
ghstack-source-id: 125886983
(Note: this ignores all push blocking failures!)
Test Plan: waitforsandcastle
Reviewed By: bhosmer
Differential Revision: D26852856
fbshipit-source-id: d2463eeca2f3fce1bbe117611be200fda59c880b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53556
When packing a `Tensor&` (mutable lvalue reference) into an IValue, we accidentally didn't increase the refcount.
This wasn't triggered anywhere, until I tried to enable backend fallbacks. Backend fallbacks for ops that
have out arguments (i.e. ops that take `Tensor&` arguments and return `Tensor&` arguments) pack those returns
into an IValue stack (and accidentally don't increase the refcount), then later that stack gets destructed
(which decreases the refcount and possibly destroys the Tensor), and the `Tensor&` passed in as an out argument
is suddenly freed memory.
This PR fixes that by forwarding instead of moving when wrapping Tensors into IValues.
ghstack-source-id: 125886986
(Note: this ignores all push blocking failures!)
Test Plan: waitforsandcastle
Reviewed By: swolchok
Differential Revision: D26896507
fbshipit-source-id: 62102fa89e522699b5174c33279a2b1a775066a4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53640
We have some operators that previously allowed you to pass in an undefined tensor to the out argument,
and then would go on to allocate that for you. This behavior is broken and doesn't work in JIT when things
are converted to/from IValues. Because of this, it blocks backend fallbacks because they force going
through IValue.
This PR is one in a series to remove that behavior and forces out arguments to be defined tensors.
It only looks at at::kron_out(), but there's more PRs for other ops.
BC Breaking: This breaks BC since those ops previously allowed calling with undefined tensors and that isn't allowed anymore.
ghstack-source-id: 125886981
(Note: this ignores all push blocking failures!)
Test Plan: waitforsandcastle
Reviewed By: bhosmer, ngimel
Differential Revision: D26921165
fbshipit-source-id: e61411226c12d33cb196a1e010ff733fe9fa6b7b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53218
We have some operators that previously allowed you to pass in an undefined tensor to the out argument,
and then would go on to allocate that for you. This behavior is broken and doesn't work in JIT when things
are converted to/from IValues. Because of this, it blocks backend fallbacks because they force going
through IValue.
This PR removes that behavior and forces out arguments to be defined tensors.
It only looks at reduction ops for now, there's likely more PRs coming for other ops.
BC Breaking: This breaks BC since those ops previously allowed calling with undefined tensors and that isn't allowed anymore.
ghstack-source-id: 125886980
(Note: this ignores all push blocking failures!)
Test Plan: waitforsandcastle
Reviewed By: ezyang
Differential Revision: D26795461
fbshipit-source-id: 158465260fe59deb7d4b2081e810a7434cfba722
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53229
Scalar formatting was assuming that everything non-float was integral. This would output bools as ints, and even worse, it would crash for complex.
This PR fixes that.
ghstack-source-id: 125886979
(Note: this ignores all push blocking failures!)
Test Plan: waitforsandcastle
Reviewed By: ezyang
Differential Revision: D26800345
fbshipit-source-id: 1a9efd085276b40d6fb399d255a6bbd7d5f3619f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53228
Previously, if a Scalar value contained a bool and was put into and then out of an IValue, it would magically transform to an int.
This PR fixes that and preserves the bool-ness.
ghstack-source-id: 125886985
(Note: this ignores all push blocking failures!)
Test Plan: unit tests
Reviewed By: ezyang
Differential Revision: D26800346
fbshipit-source-id: f170a5b8419bde9d3155042f9126e377714ec3ba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55328
**Summary**
The `clang-format` reference hashes committed in #54737 have newlines at
the end but the locally computed ones do not. This commit removes these
newlines so that the `clang-format` binary verification step doesn't
fail.
**Test Plan**
`./tools/clang_format_all.py`, ran successfully.
**Fixes**
This commit fixes #54790.
Test Plan: Imported from OSS
Reviewed By: nikithamalgifb
Differential Revision: D27577398
Pulled By: SplitInfinity
fbshipit-source-id: e30bee58c2eb5ea96ed0a503480dea4f67b86aca
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54631
I removed the phrase "When `onesided` is the default value `True`". It's not always the default and it's also confusing because it doesn't seem to relate to the bullet points it's introducing. It makes more sense in the sentence before, i.e. these frequencies are included "when the output is onesided". So, I've rewritten it as that meaning and included the correct formula for frequencies.
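As a quick check of the onesided convention discussed here (an onesided output keeps the non-negative frequencies 0 through n_fft//2, Nyquist included, i.e. n_fft//2 + 1 bins):

```python
import torch

# An onesided STFT of n_fft=64 should have 64 // 2 + 1 = 33 frequency bins.
x = torch.randn(400)
spec = torch.stft(x, n_fft=64, onesided=True, return_complex=True)
print(spec.shape[0])  # 33
```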
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54877
Reviewed By: ngimel
Differential Revision: D27562785
Pulled By: mruberry
fbshipit-source-id: d7f36382611e8e176e3370393d1b371d577d46bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55339
Use DimVector. Avoid calling size()/stride() when we know argument is in bounds.
ghstack-source-id: 125839415
Test Plan: Existing CI
Reviewed By: hlu1
Differential Revision: D27577647
fbshipit-source-id: b33057c383037dd0865de3a944ebf225ad8d9169
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55336
The compiler cannot optimize this away because it does not know that size() has no side effects and doesn't get changed by anything else that goes on in the function.
ghstack-source-id: 125775704
Test Plan: Spot-check assembly to verify assertion I made in the summary
Reviewed By: ngimel
Differential Revision: D27577299
fbshipit-source-id: 7b7ce1044c4c0b437d95103a5d149acb5d86c1bd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55388
temporarily revert D27314678 (c57541ce06), it appears to cause a perf regression that makes quantization of some models take too long to complete tests.
Reviewed By: houseroad
Differential Revision: D27583809
fbshipit-source-id: e9c088ccbfd3bfb3a1d4c7eafee3eca29ee7717b
Summary:
This PR adds `torch.linalg.eig`, and `torch.linalg.eigvals` for NumPy compatibility.
MAGMA uses a hybrid CPU-GPU algorithm and doesn't have a GPU interface for the non-symmetric eigendecomposition. This forces us to transfer inputs living in GPU memory to the CPU before calling MAGMA, and then transfer the results back to the GPU. That is rather slow for smaller matrices, and MAGMA is faster than the CPU path only for matrices larger than 3000x3000.
Unfortunately, there is no cuSOLVER function for this operation.
Autograd support for `torch.linalg.eig` will be added in a follow-up PR.
Ref https://github.com/pytorch/pytorch/issues/42666
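Basic usage of the APIs added here; like `numpy.linalg.eig`, the results are complex even for real input:

```python
import torch

# A 90-degree rotation matrix has purely imaginary eigenvalues ±1j.
A = torch.tensor([[0., -1.], [1., 0.]])
w = torch.linalg.eigvals(A)
assert w.is_complex()
print(torch.sort(w.imag).values)  # tensor([-1., 1.])
```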
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52491
Reviewed By: anjali411
Differential Revision: D27563616
Pulled By: mruberry
fbshipit-source-id: b42bb98afcd2ed7625d30bdd71cfc74a7ea57bb5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54701
We need NNAPI models to support inputs (and, by extension, intermediate
values and outputs) whose shape is only determined at load time. For
example, a vision model's input shape might depend on the aspect
ratio of the device camera. While NNAPI has full support for variable
shapes (by setting components of the operand shape to 0), the guidance
we have received is that vendor-provided drivers for real hardware are
not able to support this efficiently. Therefore, we take a hybrid
approach where shapes are calculated at model load time to
semi-dynamically construct our NNAPI model. While this doesn't let us
have truly dynamic input shapes, it does allow us to ensure that the
vendor driver only sees fixed shapes, so we get maximum performance.
In this initial commit, only PReLU supports dynamic shapes. Additional
operators will be converted in separate diffs.
- In order to convert a flexible-shape model, the user supplies inputs
with shapes containing dimensions of size 0 for the flexible
dimensions.
- During conversion, we generate code to compute the shapes of all
intermediates and outputs as a function of the input shapes.
- We no longer run the input model to produce the output templates.
Instead, we generate code to return properly-sized templates, given
the input shapes.
- All of this generated code goes into a "ShapeComputeModule" that is
used by the NnapiModule during initialization.
- The ShapeComputeModule mutates the serialized model to fill in the
computed sizes for each operand. This requires us to change the dtype
for the serialized model to int32, but this should be fine because
everything in it is already 4-byte aligned.
- NnapiInitWrapper no longer exists. Instead, initialization is
performed on the first run, based on the real arguments. We plan to
provide an API for doing eager initialization.
- Unit test updated to allow separate arguments to be given for trace,
conversion, and inference. A flexible-shape test case was added for
PReLU.
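A small illustration of the input convention described above (this is not the converter itself, only the shape-marking scheme): a size-0 dimension in the example input marks that dimension as flexible, to be resolved at load time.

```python
import torch

fixed_input = torch.zeros(1, 3, 224, 224)   # fully fixed shape
flex_template = torch.zeros(1, 3, 0, 0)     # H and W flexible
flexible_dims = [i for i, d in enumerate(flex_template.shape) if d == 0]
print(flexible_dims)  # [2, 3]
```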
Test Plan: Unit test
Reviewed By: axitkhurana
Differential Revision: D27536796
Pulled By: dreiss
fbshipit-source-id: 105585f247987b1e6ec6946a6fe44401237cb0a0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54700
This is an internal method just to make it more clear what
len(self.operands) is doing.
Test Plan: Unit test
Reviewed By: axitkhurana
Differential Revision: D27536794
Pulled By: dreiss
fbshipit-source-id: 678cee8a47df6757dd2e6feabf2560fd82d32e26
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54699
We'll soon be adding support for flexible-size tensors to the NNAPI
converter, but it won't be added to all ops at once. Create
get_tensor_operand_by_jitval_fixed_size as a wrapper for
get_tensor_operand_by_jitval that verifies that the argument has a fixed
shape. Update all call sites. As flexible size support is added to
each op, the call sites can be converted back and proper size checks
added.
Test Plan: Unit test
Reviewed By: axitkhurana
Differential Revision: D27536791
Pulled By: dreiss
fbshipit-source-id: 6fb1fea814d767b6ff263fd8b88240a51be74777
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54698
"mf" was short for memory format, but the concept that this variable
represents was renamed to "dim_order", so rename the variable.
Test Plan: Unit test
Reviewed By: axitkhurana
Differential Revision: D27536793
Pulled By: dreiss
fbshipit-source-id: 2b31c70da1ff221a7833e67486690fa606f01dea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54697
Previously, models being converted to NNAPI were expected to take inputs
as separate arguments, but the generated NNAPI model could only take
multiple inputs as a list. Now the generated model always takes inputs
(single or multiple) as separate tensor arguments.
Previously, models being converted to NNAPI were expected to return
outputs as a single tensor or tuple of tensors, but the generated NNAPI
model would return multiple outputs as a list. Now the generated model
returns a tuple as well (or single tensor).
Internally, we decide what output format to use (single tensor or tuple)
based on the conversion process, rather than by running the model.
Test Plan: Unit test
Reviewed By: axitkhurana
Differential Revision: D27536790
Pulled By: dreiss
fbshipit-source-id: c0f93c85d450757e568985947cc2f32043795859
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54696
This was originally developed for a Python version where array was not
available.
Test Plan: Unit test
Reviewed By: axitkhurana
Differential Revision: D27536792
Pulled By: dreiss
fbshipit-source-id: 39e5507e37d4f91871113439fe752a4d5373eaba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48925
The internal build has different header visibility than CMake.
Test Plan: Ran unit tests on dev server.
Reviewed By: axitkhurana
Differential Revision: D25365246
Pulled By: dreiss
fbshipit-source-id: 6b66f972b75874596b5b0e7fef34475950d8f611
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48812
This came up in a squeeze-and-excitation model. Starting with an NHWC
tensor T, we perform a mean operation across H and W, giving an NxC
tensor, which (after some fully connected layers) is reshaped to
NxCx1x1, then multiplied with T. To handle this, we detect the specific
case of a binary op with one NHWC input and one contiguous input with
H,W == 1,1 and allow the op to be applied (after transposing the
contiguous input).
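An eager-mode sketch of the squeeze-and-excitation pattern this change handles in the converter: a channels-last tensor T is reduced over H and W, and the resulting contiguous NxCx1x1 tensor is multiplied back against T.

```python
import torch

N, C, H, W = 2, 8, 4, 4
T = torch.randn(N, C, H, W).contiguous(memory_format=torch.channels_last)
scale = T.mean(dim=(2, 3)).reshape(N, C, 1, 1)  # contiguous, H,W == 1,1
out = T * scale                                  # NHWC input x contiguous input
print(out.shape)  # torch.Size([2, 8, 4, 4])
```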
Test Plan: Unit test.
Reviewed By: axitkhurana
Differential Revision: D25317939
Pulled By: dreiss
fbshipit-source-id: b4c17ab3b874d1a7defa04664010ba82115f1c20
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47521
This mostly goes op-by-op. We construct a simple model containing the
op (in various configurations for complex ops) and verify that it can be
converted to NNAPI. Additionally, if libneuralnetworks is available, we
also run both the eager model and NNAPI model and ensure that their
outputs are equal (allowing for some slight numerical differences).
serializer.py has 94% coverage. And most of the uncovered lines are
error cases, defensive code, or dead code that I might want to use
later. prepare.py has 56% coverage, but probably closer to 75-80% if we
could collect coverage from TorchScript.
Test Plan:
Ran tests with NNAPI available. Made various tweaks to the codebase to
make sure tests properly detected bugs.
Reviewed By: axitkhurana
Differential Revision: D25317940
Pulled By: dreiss
fbshipit-source-id: 709125af820440bfa7a73bab3304395f115f717f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54695
Previously, torch.nn.Linear was calling aten::addmm internally. Now
it's calling aten::linear, so add support for that.
Test Plan: Unit test
Reviewed By: axitkhurana
Differential Revision: D27536795
Pulled By: dreiss
fbshipit-source-id: 42c8d2a80b20ac12ed9bba599c5e0e874256bb13
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47518
This was left over from an old version of the code. The idea was that
instead of indexing into separate tensors for each weight, you could
bundle them all into a single file and use different offsets into that
file. With the current design, this is nontrivial to support, so drop
the code for now.
Test Plan: CI
Reviewed By: axitkhurana
Differential Revision: D25317935
Pulled By: dreiss
fbshipit-source-id: e26ab3a8d437cb1bbb50319209fa56d9c571ce61
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47517
While we're unlikely to see this in practice, it comes up in unit tests.
This type annotation is necessary for `torch.jit.script` to figure out
the type of the list if it is empty.
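A minimal example of why the annotation matters: `torch.jit.script` cannot infer the element type of a list that starts out empty, so it must be spelled out.

```python
import torch
from typing import List

@torch.jit.script
def empty_ints() -> List[int]:
    out: List[int] = []   # without the annotation, TorchScript can't infer int
    return out

print(empty_ints())  # []
```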
Test Plan: Unit tests in a later diff.
Reviewed By: axitkhurana
Differential Revision: D25317937
Pulled By: dreiss
fbshipit-source-id: de8b6665c6fcd3cd2b39e3c696a39336c064e4c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47407
Previously, the code for bundling contiguous single-valued tensors (like
torch.zeros) wasn't working for quantized tensors because it was calling
the `torch.tensor` constructor without passing in the quantizer.
Instead, skip the constructor entirely, which makes this use case work
and also simplifies the code. (Originally, I forgot that
`arg.flatten()[0]` would return a tensor, not a scalar.)
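The pitfall mentioned above is easy to demonstrate: indexing a flattened tensor yields a 0-dim tensor, not a Python scalar, so rebuilding it via the `torch.tensor` constructor would discard quantization parameters.

```python
import torch

z = torch.zeros(2, 3)
elem = z.flatten()[0]
# elem is a 0-dim Tensor, not a float
print(type(elem).__name__, elem.dim())  # Tensor 0
```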
Test Plan: Bundled a quantized zero input and saw it run properly.
Reviewed By: dhruvbird
Differential Revision: D24752890
Pulled By: dreiss
fbshipit-source-id: 26bc4873a71dd44660cc0fcb74c227b754e31663
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54619
Minor refactor to conv batchnorm folding to work on other functions besides forward
ghstack-source-id: 125767010
Test Plan: unit test and {P339453712}
Reviewed By: kimishpatel
Differential Revision: D27301452
fbshipit-source-id: 4e0cc544a171a970583979a496b2908935124497
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54772
conv3d-add-relu fusion does not work on some platforms when TF32 is enabled, so set allow_tf32 to false.
Test Plan:
```
python test/test_jit.py -k test_freeze_conv_relu_fusion
```
Imported from OSS
Reviewed By: bertmaher
Differential Revision: D27435560
fbshipit-source-id: e35e2297dce85acfbe988deea97c3f5e68f1e1c7
Summary:
The diff resolves bug where worker processes could exit before torchelastic process would read the return values. This is a rare event, but still can happen, e.g. https://fb.workplace.com/groups/319878845696681/permalink/512409069776990/
When users want to return torch.Tensor object from worker process, the torchelastic multiprocessing will fail. Currently worker process finishes its job after it writes output to the IPC queue without receiver process confirmation. When this happens, the underlying channel between worker and torchelastic process could be closed (in case of mp.SimpleQueue it is file descriptors, that is why we see FileNotFoundException: since worker process finished execution, the file descriptor just got deleted, and torchelastic process cannot find it).
Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed/elastic/agent/server/test:local_agent_test
User workflow: f263531643
Reviewed By: cbalioglu, wilson100hong
Differential Revision: D27572158
fbshipit-source-id: 9a360468acc98d85d587ebf223e7e96d4b43fe4b
Summary:
Install Monkey Type as part of our testing on Linux
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55305
Reviewed By: ailzhang
Differential Revision: D27592679
Pulled By: nikithamalgifb
fbshipit-source-id: c92b786e45fc16288d658228a5f96aca53a3da6b
Summary:
Partially solves https://github.com/pytorch/pytorch/issues/54061
This PR solves many of the "easy to solve" problems with `out=` not notifying when it resizes a tensor. It also reports the cause of some failures of the `out=` operation in the tests. Hopefully this way we will be able to catch some errors that do not come simply from not using `resize_output`.
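The behavior being standardized, in a nutshell: when an `out=` tensor has the wrong shape, it gets resized and a `UserWarning` is emitted instead of the resize happening silently.

```python
import warnings

import torch

out = torch.empty(2)                      # wrong shape for the result
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    torch.add(torch.ones(3), torch.ones(3), out=out)
print(out.shape, len(caught) > 0)         # resized to 3, with a warning
```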
cc mruberry anjali411
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55141
Reviewed By: anjali411
Differential Revision: D27568755
Pulled By: mruberry
fbshipit-source-id: a32546555fef8d241de2ef635a99e5615461ed09
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52253
In the issue reproducer we can replace `torch.sparse.sum(S)` with `S.coalesce()` and get the same memory leak. The reason is that calling `coalesce()` on an already coalesced tensor returns `self`. With autograd, the result gets its `grad_fn` set to a node that contains a reference to the input tensor, creating a reference cycle. Cloning the tensor fixes this, so `coalesce` always returns a new tensor.
As an aside, `torch.sparse.sum(S)` doesn't need to coalesce. The result should be the same either way.
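Checking the aside: `torch.sparse.sum` over all elements gives the same result whether or not the tensor is coalesced first, since coalescing only merges duplicate indices by summing their values.

```python
import torch

i = torch.tensor([[0, 0, 1]])
v = torch.tensor([1.0, 2.0, 3.0])
S = torch.sparse_coo_tensor(i, v, (2,))   # uncoalesced: index 0 repeated
print(torch.sparse.sum(S).item(), torch.sparse.sum(S.coalesce()).item())  # 6.0 6.0
```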
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52874
Reviewed By: bdhirsh
Differential Revision: D27246997
Pulled By: albanD
fbshipit-source-id: 0fe6c11043501a7874a50982afd42964f47470d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55337
`static_runtime::permute_copy` is in fb-only folder. Because `caffe2/test/test_static_runtime.py` is in OSS, we can't load the fb-only operator library. The workaround is to check at runtime whether the op is registered or not.
Test Plan:
This fixed two of the broken tests:
```
✓ Pass: caffe2/test:static_runtime - test_multihead_attention_layer (test_static_runtime.TestStaticModule) (10.316)
✓ Pass: caffe2/test:static_runtime - test_mlp (test_static_runtime.TestStaticModule) (16.134)
```
Reviewed By: ajyu
Differential Revision: D27577066
fbshipit-source-id: ac87dcde71f0d5140ccde448bb49aaebbbb5908a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55295
Update `_powerSGD_comm_hook_wrapper` to expose only the 2 most critical hyperparameters, to make this API clearer to future users (although the second hyperparameter, `start_powerSGD_iter`, is not in use yet).
Test Plan: waitforbuildbot
Reviewed By: shuyingsunshine21
Differential Revision: D27561734
fbshipit-source-id: b661981cc033b109f4f2fc92b435567a184a7fb5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55272
1. Set 1K as the default value of `start_powerSGD_iter` for practicability. The original default value 10 is usually too small for real use cases. The new default value 1K is also consistent with PyTorch Lightning.
2. Update the docstring of `start_powerSGD_iter` to remind the users to set a value no less than the warm-up steps if any.
3. Update some unit tests to start PowerSGD early.
ghstack-source-id: 125707662
Test Plan: waitforbuildbot
Reviewed By: shuyingsunshine21
Differential Revision: D27553388
fbshipit-source-id: 40076419bc85755c0c0b64b79ba914b241085fcc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55281
## Summary
`python3 tools/clang_format_all.py` complains that the binary is not what was expected. It turns out the reference hash includes an extra newline compared with the actual hash. In this PR:
1. Use `repr(hash)` to show the raw string, so that it's easier to compare the two strings.
2. Remove the extra newline.
3. Run `python3 tools/clang_format_all.py`, which formats `torch/csrc/jit/runtime/static/passes.h`.
Before the change,
```
(base) chenlai@chenlai-mp pytorch % python3 tools/clang_format_all.py -v
Found pre-existing clang-format binary, skipping download
Reference Hash: '5fde7bccf65032da297dfb1f18e4a95e96e278fa397e9dcaf364dfe23ec46353'
Actual Hash: '5fde7bccf65032da297dfb1f18e4a95e96e278fa397e9dcaf364dfe23ec46353'
The downloaded binary is not what was expected!
(base) chenlai@chenlai-mp pytorch %
```
After the change,
```
(base) chenlai@chenlai-mp pytorch % python3 tools/clang_format_all.py -v
Found pre-existing clang-format binary, skipping download
Reference Hash: '5fde7bccf65032da297dfb1f18e4a95e96e278fa397e9dcaf364dfe23ec46353\n'
Actual Hash: '5fde7bccf65032da297dfb1f18e4a95e96e278fa397e9dcaf364dfe23ec46353'
The downloaded binary is not what was expected!
(base) chenlai@chenlai-mp pytorch %
```
After stripping the hash string:
```
(base) chenlai@chenlai-mp pytorch % python3 tools/clang_format_all.py -v
Downloading clang-format to /Users/chenlai/pytorch/.clang-format-bin
0% |################################################################| 100%
Reference Hash: '5fde7bccf65032da297dfb1f18e4a95e96e278fa397e9dcaf364dfe23ec46353'
Actual Hash: '5fde7bccf65032da297dfb1f18e4a95e96e278fa397e9dcaf364dfe23ec46353'
Using clang-format located at /Users/chenlai/pytorch/.clang-format-bin/clang-format
```
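The bug boils down to comparing a file's contents (with its trailing newline) against a computed digest; `repr()` makes the invisible difference obvious and `strip()` fixes the comparison:

```python
# The reference hash as read from the file keeps its trailing newline;
# the actual hash is computed from the downloaded binary.
reference = "5fde7bccf65032da297dfb1f18e4a95e96e278fa397e9dcaf364dfe23ec46353\n"
actual = "5fde7bccf65032da297dfb1f18e4a95e96e278fa397e9dcaf364dfe23ec46353"

assert reference != actual               # comparison fails...
assert repr(reference) != repr(actual)   # ...and repr() shows why: the '\n'
assert reference.strip() == actual       # stripping restores equality
```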
Test Plan: Imported from OSS
Reviewed By: iseeyuan
Differential Revision: D27556372
Pulled By: cccclai
fbshipit-source-id: 2fd1ba220733e767ffab41ab31e162f0bf3f1d62
Summary: Improve the implementation and the unit test coverage of `RendezvousHandlerRegistry`.
Test Plan: Run the existing and newly-introduced unit tests.
Reviewed By: tierex
Differential Revision: D27442325
fbshipit-source-id: 8519a2caacbe2e3ce5d9a02e87a910503dea27d7
Summary:
Pull Request resolved: https://github.com/pytorch/elastic/pull/146
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54807
Improve the implementation and the unit test coverage of `RendezvousParameters`.
Test Plan: Run the existing and newly-introduced unit tests.
Reviewed By: kiukchung
Differential Revision: D27342444
fbshipit-source-id: 88de356c0a799844a739eb9105185bb8c1acf11f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54804
Improve the implementation of the utility functions to handle more edge cases and also have a new set of unit tests to cover their usage.
Test Plan: Run the existing and newly introduced unit tests.
Reviewed By: kiukchung
Differential Revision: D27327898
fbshipit-source-id: 96b6fe2d910e3de69f44947a0e8a9f687ab50633
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54805
Expose a `stderr` parameter on `EtcdServer` to get clean unit test output.
Test Plan: Run the existing test suite.
Reviewed By: kiukchung
Differential Revision: D27327495
fbshipit-source-id: 0a342aeda0ff4d85d809aab1cbf155d3fafd4fa1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54803
Revise the rendezvous exception types to align their naming convention more closely with the standard Python exception types.
Test Plan: Run the existing test suite.
Reviewed By: H-Huang
Differential Revision: D27327505
fbshipit-source-id: 862c59222f9ca61a0e5afde89ae8f226090b4f92
Summary:
Tentative fix for https://github.com/pytorch/pytorch/issues/55027.
Wraps the cub import in its own namespace so that static variables used by cub and thrust don't conflict if they end up in different libraries when torch is built with BUILD_SPLIT_CUDA. cub variables end up in their own namespace; thrust variables are unwrapped, so they don't clash.
This also allows extensions to use cub without wrapping it (thrust will still be problematic). The solution to allowing extensions to use thrust is to stop using thrust in pytorch completely.
Since importing cub and importing thrust can no longer coexist, I had to move nonzero to its own file and remove its reliance on thrust functions. Nonzero now uses cub only.
Also, we cannot selectively import just some of the cub headers; we are forced to import `cub/cub.cuh`, which is not great.
Caffe2 ops using cub are not touched (there are too many), so mixing caffe2 and torch will (can) still result in the same bug. We are moving towards disabling c2 ops, so I think this is fine.
Still, even with that, the compiler (correctly) warns about redefinition of `CUB_NS_PREFIX`, because including `ATen/ATen.h` transitively includes `thrust/complex.h`, which in turn includes the original (empty) definition of `CUB_NS_PREFIX`. We can probably just ignore this warning. Here's an example warning:
```
In file included from /data/users/ngimel/pytorch/aten/src/ATen/native/cuda/Nonzero.cu:9:
/data/users/ngimel/pytorch/aten/src/ATen/cuda/CubUtils.cuh:4: warning: "CUB_NS_PREFIX" redefined
#define CUB_NS_PREFIX namespace at{ namespace native{
In file included from /home/ngimel/local/cuda/include/thrust/system/cuda/config.h:76,
from /home/ngimel/local/cuda/include/thrust/system/cuda/detail/execution_policy.h:33,
from /home/ngimel/local/cuda/include/thrust/iterator/detail/device_system_tag.h:23,
from /home/ngimel/local/cuda/include/thrust/iterator/iterator_traits.h:111,
from /home/ngimel/local/cuda/include/thrust/detail/type_traits/pointer_traits.h:23,
from /home/ngimel/local/cuda/include/thrust/type_traits/is_contiguous_iterator.h:27,
from /home/ngimel/local/cuda/include/thrust/type_traits/is_trivially_relocatable.h:19,
from /home/ngimel/local/cuda/include/thrust/detail/complex/complex.inl:20,
from /home/ngimel/local/cuda/include/thrust/complex.h:1031,
from /data/users/ngimel/pytorch/c10/util/complex.h:9,
from /data/users/ngimel/pytorch/c10/core/ScalarType.h:4,
from /data/users/ngimel/pytorch/c10/core/Scalar.h:10,
from /data/users/ngimel/pytorch/build/aten/src/ATen/core/TensorBody.h:8,
from /data/users/ngimel/pytorch/aten/src/ATen/Tensor.h:3,
from /data/users/ngimel/pytorch/aten/src/ATen/Context.h:4,
from /data/users/ngimel/pytorch/aten/src/ATen/ATen.h:9,
from /data/users/ngimel/pytorch/aten/src/ATen/native/cuda/Nonzero.cu:1:
/home/ngimel/local/cuda/include/cub/util_namespace.cuh:43: note: this is the location of the previous definition
#define CUB_NS_PREFIX
```
We will need a lint rule to prevent people from including `cub/cub.cuh`, because this will lead to https://github.com/pytorch/pytorch/issues/55027 reappearing again for some sequence of operations (and will lead to errors with cub code in extensions).
Also, for this to work reliably we'll need to make sure that everything calling cub ends up in only one of libtorch_cuda_cu or libtorch_cuda_cpp, otherwise even namespace won't help (there still will be same symbols in 2 libraries).
Upd: libtorch_cuda_cpp and libtorch_cuda_cu still contain the same symbols, which means that there exists a sequence of operations that will cause the cache bug to reappear, so this is not a complete solution; we need to adjust the file lists for BUILD_SPLIT_CUDA:
```
(pytorch) [ngimel@ ~/local/pytorch/build/lib] nm libtorch_cuda_cu.so | grep PerDeviceAttributeCache | c++filt
000000000c6bf808 u guard variable for at::native::cub::GetPerDeviceAttributeCache<at::native::cub::PtxVersionCacheTag>()::cache
000000000c600830 u guard variable for cub::GetPerDeviceAttributeCache<cub::PtxVersionCacheTag>()::cache
00000000018625e0 t at::native::cub::PerDeviceAttributeCache::DevicePayload at::native::cub::PerDeviceAttributeCache::operator()<at::native::cub::PtxVersion(int&)::{lambda(int&)https://github.com/pytorch/pytorch/issues/1}>(at::native::cub::PtxVersion(int&)::{lambda(int&)https://github.com/pytorch/pytorch/issues/1}&&, int)
00000000009ce630 t cub::PerDeviceAttributeCache::DevicePayload cub::PerDeviceAttributeCache::operator()<cub::PtxVersion(int&)::{lambda(int&)https://github.com/pytorch/pytorch/issues/1}>(cub::PtxVersion(int&)::{lambda(int&)https://github.com/pytorch/pytorch/issues/1}&&, int)
000000000c6bf820 u at::native::cub::GetPerDeviceAttributeCache<at::native::cub::PtxVersionCacheTag>()::cache
000000000c600840 u cub::GetPerDeviceAttributeCache<cub::PtxVersionCacheTag>()::cache
(pytorch) [ngimel@ ~/local/pytorch/build/lib] nm libtorch_cuda_cpp.so | grep PerDeviceAttributeCache | c++filt
0000000000ad2d98 u guard variable for at::native::cub::GetPerDeviceAttributeCache<at::native::cub::PtxVersionCacheTag>()::cache
0000000000ad2da0 u at::native::cub::GetPerDeviceAttributeCache<at::native::cub::PtxVersionCacheTag>()::cache
```
Upd2:
Moved TensorFactories.cu to torch_cuda_cu sources (see a change to caffe2/CMakeLists.txt), so now cub-related symbols are only in libtorch_cuda_cu. We'd need a test for that, any suggestions on how best to test it?
cc zasdfgbnm malfet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55292
Reviewed By: anjali411
Differential Revision: D27576442
Pulled By: ngimel
fbshipit-source-id: 1ef29503a342bb214794d34a42a47052092a66c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55074
This function accesses member variables that can be modified by
different threads (i.e. autograd engine threads), so call it within lock scope.
ghstack-source-id: 125707513
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D27474526
fbshipit-source-id: 8d43faedd6e6eeeb69e21ce3262337ab83d7ba07
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55342
The fusion stuff is pretty hard to debug. Given that we're not shipping this part of the stack any time soon, let's temporarily disable them and re-enable them when somebody has the cycles to debug them.
Test Plan: Verified that the tests are now disabled
Reviewed By: ajyu
Differential Revision: D27578573
fbshipit-source-id: cb8d7c9339f7c1700b7653b0231cf570996995ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54011
After symbolic tracing, `fn` seems to already have "forward" in its globals. In this case, `new_keys` would have length of 0 and we take "forward" from `global_dict` directly as `fn_compiled`.
Test Plan: Added a new test in test_fx_experimental.
Reviewed By: ansley
Differential Revision: D27049012
fbshipit-source-id: 7fbeb50ebb717900ff5fc0a8a0925d6a97f5a6dd
Summary:
Prettifies JSON files .pytorch-test-times and .pytorch-slow-tests so that not everything is on one single line.
This is of slightly more importance as generated .pytorch-slow-tests ends up getting stored in our test-infra repo ([example](ad9cd87565)), and it is nice to not have that lil red symbol at the end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55335
Reviewed By: samestep
Differential Revision: D27576930
Pulled By: janeyx99
fbshipit-source-id: be58565b8c8593a9bfcfab383ee19facc79f0572
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55253
Previously, DDP communication hooks took a tensor list as input. Now they take only a single tensor, in preparation for retiring SPMD and providing only a single model replica to DDP communication hooks.
The next step is limiting the Reducer to only 1 model replica.
ghstack-source-id: 125677637
Test Plan: waitforbuildbot
Reviewed By: zhaojuanmao
Differential Revision: D27533898
fbshipit-source-id: 5db92549c440f33662cf4edf8e0a0fd024101eae
Summary:
Converts loops of the form:
```
for (int64_t VAR = 0; VAR < LIMIT; VAR++)
```
to the form
```
for (const auto VAR : c10::irange(LIMIT))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55148
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D27447811
fbshipit-source-id: 6311a094ec4a81a0b57383aaee0ba1b1dc2445c4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55196
This commit fixes printing of default values for optional string type arguments in schemas. At the moment, these default values are not printed as quoted strings. If a schema with an optional string type parameter with a default value that is not `None` is printed and then parsed, the lack of quotes causes a parsing error.
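A minimal sketch of the round-trip problem (simplified to a single formatting helper; the real fix lives in the C++ schema printer):

```python
def format_default_broken(value):
    # Before the fix: string defaults print unquoted, e.g.
    # `pad(..., mode: str = constant)`, which cannot be re-parsed.
    return str(value)

def format_default_fixed(value):
    # After the fix: non-None string defaults are quoted so that a
    # printed schema parses back to the same schema.
    if isinstance(value, str):
        return '"{}"'.format(value)
    return str(value)

assert format_default_broken("constant") == "constant"   # ambiguous token
assert format_default_fixed("constant") == '"constant"'  # parseable string
assert format_default_fixed(None) == "None"              # None stays bare
```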
ghstack-source-id: 125655241
Test Plan: This commit adds a unit test to `test_function_schema.py` to test this case.
Differential Revision: D27525450
fbshipit-source-id: 23a93169e7599e7b385e59b7cfafb17fd76318b7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55150
Somehow I forgot to add these checks. Now they're in here. Thanks
ngimel for noticing.
This is probably a slight efficiency hit on TensorIterator, which is
probably already doing all these checks. Would be good to follow up
on this, though it may not be easily fixable with the TI rewrite.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: zhangguanheng66
Differential Revision: D27523879
Pulled By: ezyang
fbshipit-source-id: 458e617dbc6de6fcfa9e5841148b30b99f52e001
Summary:
- Fixes https://github.com/pytorch/pytorch/issues/54114
- Capped estimated block size to the largest multiple of ten less than C++ INT_MAX
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55200
Test Plan: unit test doesn't throw exception as expected
Reviewed By: robieta
Differential Revision: D27542652
Pulled By: naveedgol
fbshipit-source-id: 3ba68ce84d5fa1d8338cdd5c9f9e5d8c9adda51c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54896
This should help performance. (For example, it improves total
time spent in a C++ benchmark that just adds 2 tensors in place by
about 10%.)
ghstack-source-id: 125659451
Reviewed By: bhosmer
Differential Revision: D27404164
fbshipit-source-id: e1dce8c02100ee4ce22510298c7e0d0f192be201
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54652
This PR adds a fairly robust runner for the instruction count microbenchmarks. Key features are:
* Timeout and retry. (In rare cases, Callgrind will hang under heavy load.)
* Robust error handling and keyboard interrupt support.
* Benchmarks are pinned to cores. (Wall times still won't be great, but it's something.)
* Progress printouts, including a rough ETA.
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D27537823
Pulled By: robieta
fbshipit-source-id: 699ac907281d28bf7ffa08594253716ca40204ba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54651
This PR fleshes out the benchmarks to everything I could come up with. (166 individual cases when all is said and done.) If there's anything you feel warrants a spot in CI that I've missed, by all means let me know.
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D27537824
Pulled By: robieta
fbshipit-source-id: 3819e8fec2131c6b5f29f5099cd41e79131bed90
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).
New submodule commit: 7c0c486650
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54575
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: dskhudia, yns88
Differential Revision: D27286716
fbshipit-source-id: 03b83dacc04edecebbb5b49046baa27deb5ba541
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55291
From the script, the build happens in cpp-build/caffe2, and all the executables and dylibs are available there. It may be more straightforward and accurate to use those binaries, instead of copying the test binary to miniconda3 and using dylibs from there.
Test: CI, especially pytorch_macos_10_13_py3_lite_interpreter_build_test.
Test Plan: Imported from OSS
Reviewed By: raziel
Differential Revision: D27566631
Pulled By: iseeyuan
fbshipit-source-id: 402b9941ab422979d53243624f67d65752213191
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51652.
In particular:
- the main implementation is in `torch.linalg.det` now. `torch.det` is just a deprecated alias to it
- add a new `OpInfo` for `torch.linalg.det`
- remove the old-style tests for `torch.det` (this is similar to what we did for `torch.linalg.slogdet`, see https://github.com/pytorch/pytorch/issues/49194)
- added a `out=` argument to `torch.linalg.det`, but **not** to `torch.det`.
It is worth noting that I had to skip a few tests:
- `TestGradientsCuda::test_fn_gradgrad_linalg_det_cuda_float64`. This is not a regression: the functionality is broken also on master, but the test is not executed properly due to https://github.com/pytorch/pytorch/issues/53361.
And the following tests, which fail only on ROCm:
- `test_variant_consistency_jit_cuda_{float64,float32}`
- `test_fn_grad_cuda_float64`
I think that the ROCm tests fail because the current linalg.det backward is unstable if the matrix has repeated singular values, see https://github.com/pytorch/pytorch/issues/53364 .
(At the moment of writing some CI jobs are still running but I believe the build will be green, since the only difference wrt the last push is the skip of the ROCm tests)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53119
Reviewed By: H-Huang
Differential Revision: D27441999
Pulled By: mruberry
fbshipit-source-id: 5eab14c4f0a165e0cf9ec626c3f4bb23359f2a9e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54610
The `.is_view()` method actually only refers to backward mode views
This is not a problem right now in master (and thus I didn't revert the other PR) because nothing creates forward AD views.
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D27396756
Pulled By: albanD
fbshipit-source-id: 64ff11c6f2486c6430714988d1cf6ecf3d80dccb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55149
I was wondering why no one used this function. It's because it
doesn't work! Also a small doc improvement for expected inline.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: zhangguanheng66
Differential Revision: D27523880
Pulled By: ezyang
fbshipit-source-id: a1d80c088ebf1c58a2b9b13d28f7f23d08c42e60
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55294
Some static checkers like pyre have difficulties with types like `builtins.type`, so we strip the `builtins` prefix from the autogenerated proto type stubs.
Test Plan: Let CI run.
Reviewed By: d4l3k
Differential Revision: D27477699
fbshipit-source-id: 45e19835974200a030817d37aec785e3ecb23e8b
Summary:
This PR adds cusolver potrs and potrsBatched to the backend of torch.cholesky_solve and torch.linalg.cholesky_solve.
`cholesky_solve` heuristics:
- If magma is not installed, or batch_size is 1:
- If batch_size > 1 and nrhs == 1, dispatch to `cusolverDn<T>potrsBatched`,
- Otherwise, dispatch to `cusolverDnXpotrs` (64 bit) and `cusolverDn<T>potrs` (legacy).
- Otherwise, use magma.
Note: `cusolverDn<T>potrsBatched` only supports `nrhs == 1`. It is used for `nrhs==1` batched matrix if magma is **not** installed.
See also https://github.com/pytorch/pytorch/issues/42666#47953
Todo:
- [x] benchmark and heuristic
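The heuristic above can be encoded as a small dispatch sketch (the function name and returned strings are illustrative, not actual PyTorch symbols):

```python
def choose_cholesky_solve_backend(batch_size, nrhs, has_magma):
    # Mirrors the dispatch rules described above: use cusolver when magma
    # is absent or the batch is trivial, magma otherwise. The batched
    # cusolver path (potrsBatched) only supports a single right-hand side.
    if not has_magma or batch_size == 1:
        if batch_size > 1 and nrhs == 1:
            return "cusolverDn<T>potrsBatched"
        return "cusolverDnXpotrs / cusolverDn<T>potrs"
    return "magma"

assert choose_cholesky_solve_backend(8, 1, has_magma=False) == \
    "cusolverDn<T>potrsBatched"
assert choose_cholesky_solve_backend(1, 4, has_magma=True) == \
    "cusolverDnXpotrs / cusolverDn<T>potrs"
assert choose_cholesky_solve_backend(8, 4, has_magma=True) == "magma"
```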
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54315
Reviewed By: ngimel
Differential Revision: D27562225
Pulled By: mruberry
fbshipit-source-id: 323e5d60610abbbdc8369f5eb112d9fa01da40f6
Summary:
## 🚀 Feature
Add Mkl-Layout kernel for tanh.
## Motivation
We want to add an Mkl-Layout kernel for tanh to improve tanh's performance when the input tensor is an Mkl-Layout tensor.
PyTorch does not have an Mkl-Layout kernel for tanh, so it cannot execute tanh on an Mkl-Layout tensor.
Of course, you can temporarily avoid this problem by calling to_dense/to_mkldnn, but performance is significantly reduced due to the copy overhead (1.6-4.3 times slower than the CPU kernel).
## Performance results
### Environment
- CPU: Intel(R) Core(TM) i7-8086K CPU @ 4.00GHz
- OS: 18.04.1 LTS
- compiler: gcc 7.5.0
- branch: master
- commit ID: fe2c126
- build Environment variable: USE_CUDA=0
- Python: 3.6.9
- Intel MKL(Math Kernel Library): 2020.2-254
- Intel oneDNN: 1.8.1
### Benchmark script
``` python
import torch
import torch.nn as nn

torch.manual_seed(1)
x = torch.randn(2048, 2048)
x_mkl = x.to_mkldnn()

print("### CPU tanh")
with torch.autograd.profiler.profile(record_shapes=True) as prof:
    for i in range(100):
        output = x.tanh()
print(prof.key_averages().table(sort_by="self_cpu_time_total"))

print("\n### CPU tanh_")
with torch.autograd.profiler.profile(record_shapes=True) as prof:
    for i in range(100):
        x.tanh_()
print(prof.key_averages().table(sort_by="self_cpu_time_total"))

print("\n### to_dense/to_mkldnn + tanh")
with torch.autograd.profiler.profile(record_shapes=True) as prof:
    for i in range(100):
        output = x_mkl.to_dense().tanh().to_mkldnn()
print(prof.key_averages().table(sort_by="self_cpu_time_total"))

print("\n### to_dense/to_mkldnn + tanh_")
with torch.autograd.profiler.profile(record_shapes=True) as prof:
    for i in range(100):
        x_mkl.to_dense().tanh_().to_mkldnn()
print(prof.key_averages().table(sort_by="self_cpu_time_total"))

print("\n### Mkl-Layout tanh")
with torch.autograd.profiler.profile(record_shapes=True) as prof:
    for i in range(100):
        output = x_mkl.tanh()
print(prof.key_averages().table(sort_by="self_cpu_time_total"))

print("\n### Mkl-Layout tanh_")
with torch.autograd.profiler.profile(record_shapes=True) as prof:
    for i in range(100):
        x_mkl.tanh_()
print(prof.key_averages().table(sort_by="self_cpu_time_total"))
```
### Results
#### OMP_NUM_THREADS=1 Results(Self CPU time total ms)
| Operation | CPU kernel | to_dense/to_mkldnn+CPU kernel | Mkl-Layout kernel(This PR) |
| ---------- | ---------- | ----------------------------- | -------------------------- |
| tanh | 579.662 | 1658.000 | 617.565 |
| tanh_ | 554.477 | 881.997 | 589.426 |
#### OMP_NUM_THREADS=6 Results(Self CPU time total ms)
| Operation | CPU kernel | to_dense/to_mkldnn+CPU kernel | Mkl-Layout kernel(This PR) |
| ---------- | ---------- | ----------------------------- | -------------------------- |
| tanh | 182.387 | 421.336 | 136.226 |
| tanh_ | 94.331 | 404.931 | 99.254 |
## Modification policy for the code
oneDNN already supports the tanh operation.
[oneDNN: Elementwise](https://spec.oneapi.com/versions/latest/elements/oneDNN/source/primitives/eltwise.html)
A sigmoid implementation that uses the same Elementwise API as tanh already exists, so we wrote this PR's code with reference to the sigmoid implementation.
527c1e0e37/aten/src/ATen/native/mkldnn/UnaryOps.cpp (L28-L42)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54656
Test Plan:
A test for sigmoid has already been created, as shown below,
so I added a new test for tanh modeled on the sigmoid test.
527c1e0e37/test/test_mkldnn.py (L944-L954)
### mkldnn tanh test result
```
$ python3 test/test_mkldnn.py TestMkldnn.test_tanh
Couldn't download test skip set, leaving all tests enabled...
.
----------------------------------------------------------------------
Ran 1 test in 0.004s
OK
```
Reviewed By: gchanan
Differential Revision: D27395827
Pulled By: ezyang
fbshipit-source-id: d4481332de187e2dea095f9b6aabc73a497960fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55010
Follow up change to add a flag to provide an option for monitored barrier to collect all the failed ranks and then throw instead of just throwing on the first one. This is useful as now monitored barrier will be able to pick up on all hanging ranks instead of just one.
This is done by passing in a flag `wait_all_ranks=True`.
ghstack-source-id: 125699839
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D27447787
fbshipit-source-id: ec23aee212060d9eb515ff8adc96c6a17822d1bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55009
Changes monitoredBarrier so that we await acknowledgement from ranks
in a consistent order (from least to greatest). This reduces confusion
about the order in which ranks are awaited. We still plan to add support
for awaiting all ranks in follow-up changes.
ghstack-source-id: 125699838
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D27405417
fbshipit-source-id: b9a3e72742cbffdd9bf890ab2c94103b768a7b71
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55003
Using the `caffe2::setPrintStackTracesOnFatalSignal` utility in
distributed tests to set a signal handler that dumps the state of all threads
for all processes when it receives a FATAL signal. This would help in debugging
tests further.
I had to revert all the python faulthandler code since only one signal handler
function is supported, so running python faulthandler with
`setPrintStackTracesOnFatalSignal` doesn't work.
Sample output:
```
SIGSEGV(11), PID: 3492872, Thread 3492872:
[0] ???(0x7fa7b2d1d61b) in libcaffe2_caffe2_caffe2_cpu.so
[1] ???(0x7fa7b2d1d3fb) in libcaffe2_caffe2_caffe2_cpu.so
[2] ???(0x7fa7b2d1d33d) in libcaffe2_caffe2_caffe2_cpu.so
[3] ???(0x7fa7b2d1d167) in libcaffe2_caffe2_caffe2_cpu.so
[4] ???(0x7fa7ce683150) in libpthread.so.0
[5] ???(0x7fa7be2b233c) in libcaffe2__C_impl_cuda.so
[6] ???(0x7fa7be2ce80c) in libcaffe2__C_impl_cuda.so
[7] ???(0x7fa7be2a0512) in libcaffe2__C_impl_cuda.so
[8] torch::distributed::rpc::TensorPipeAgent::send(torch::distributed::rpc::WorkerInfo const&, torch::distributed::rpc::Message&&, float, std::unordered_map<signed char, signed char, std::hash<signed char>, std::equal_to<signed char>, std::allocator<std::pair<signed char const, signed char> > > const&)+0x24f(0x7fa7be29f71f) in libcaffe2__C_impl_cuda.so
[9] torch::distributed::autograd::sendMessageWithAutograd(torch::distributed::rpc::RpcAgent&, torch::distributed::rpc::WorkerInfo const&, torch::distributed::rpc::Message&&, bool, float, bool)+0x393(0x7fa7b602b203) in libcaffe2_libtorch.so
[10] torch::distributed::rpc::pyRpcPythonUdf(torch::distributed::rpc::WorkerInfo const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::vector<at::Tensor, std::allocator<at::Tensor> >&, float, bool)+0x201(0x7fa7bd844971) in libcaffe2__C_impl_cuda.so
```
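The "only one signal handler" limitation mirrors standard signal semantics: registering a new handler for a signal replaces whatever was there before, which is why Python's faulthandler and `setPrintStackTracesOnFatalSignal` cannot coexist. A plain-Python sketch:

```python
import os
import signal

calls = []

def python_faulthandler(signum, frame):
    calls.append("faulthandler")

def print_stack_traces(signum, frame):
    calls.append("setPrintStackTracesOnFatalSignal")

# The second registration silently replaces the first: a process has
# exactly one handler per signal.
signal.signal(signal.SIGUSR1, python_faulthandler)
signal.signal(signal.SIGUSR1, print_stack_traces)

os.kill(os.getpid(), signal.SIGUSR1)
assert calls == ["setPrintStackTracesOnFatalSignal"]
```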
ghstack-source-id: 125630551
Test Plan: waitforbuildbot
Reviewed By: SciPioneer
Differential Revision: D27419714
fbshipit-source-id: 8aca9a14ef688004053d8798124d9c3a3fbe3489
Summary:
## Problem summary
Fixes https://github.com/pytorch/pytorch/issues/54752 - when the number of threads is more than 3 and at least one `set_num_threads` invocation has taken place before the dataloader forks child processes, calling `set_num_threads(1)` in a child process causes a segfault. During that invocation, the child process touches the data structures of the parent process's Caffe2 thread-pool, which it inherits from the parent via fork's copy-on-write semantics (the threads themselves don't exist in the child process, but some of their data structures do).
## Solution
malfet [advised](https://github.com/pytorch/pytorch/issues/54752#issuecomment-810315302) & [authored code](https://github.com/pytorch/pytorch/pull/54895#pullrequestreview-625670122) for adding a `pthread_atfork` handler in `pytorch/caffe2/utils/threadpool/pthreadpool-cpp.cc`, that's invoked in the child process right after fork, to leak the Caffe2 thread-pool (the child inherits the thread-pool's data structures from its parent process, but doesn't actually have those threads, since after `fork` , a child process only has one thread).
## Additional changes
Added a unittest `test_no_segfault` to `test_dataloader.py` to test for this issue.
Also enabled `test_segfault` (which makes sure that segfaults do happen in worker processes in a particular case).
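The fix pattern has a direct Python analogue in `os.register_at_fork` (the "thread-pool" here is a stand-in dict, not Caffe2's actual structures):

```python
import os

threadpool = {"threads": 6, "alive": True}  # stand-in for the Caffe2 pool

def leak_pool_in_child():
    # After fork, the child has only one thread, so the inherited pool
    # structures must be abandoned ("leaked") rather than torn down:
    # tearing them down would touch threads that don't exist here.
    threadpool["alive"] = False

os.register_at_fork(after_in_child=leak_pool_in_child)

pid = os.fork()
if pid == 0:
    # child: the pool was marked dead by the at-fork handler
    assert threadpool["alive"] is False
    os._exit(0)
else:
    _, status = os.waitpid(pid, 0)
    assert os.waitstatus_to_exitcode(status) == 0
    assert threadpool["alive"] is True   # the parent's pool is untouched
```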
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54895
Reviewed By: zhangguanheng66
Differential Revision: D27542253
Pulled By: malfet
fbshipit-source-id: 10f9c67ce1ff1aa37d3efebf405bd93f7f9d2489
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55255
This allows packaged code to detect whether or not it is being used in a
packaged context, and do different things depending on that. An example
where this might be useful is controlling dynamic dependency loading
depending on whether or not something is packaged.
Test Plan: Imported from OSS
Reviewed By: Lilyjjo
Differential Revision: D27544245
Pulled By: suo
fbshipit-source-id: 55d44ef57281524b8d9ab890bd387de97f20bd9f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55212
Error out SPMD in C++ Reducer.
Added a new test, `test_reducer_no_multi_replicas`, which checks that multiple replicas are not allowed in the Reducer constructor.
Removed 2 tests relevant to reducer in SPMD mode:
`test_ddp_comm_hook_multiple_replica_check`
`test_forward_backward_multi_replica`
ghstack-source-id: 125602472
Test Plan: waitforbuildbot
Reviewed By: pritamdamania87
Differential Revision: D27497747
fbshipit-source-id: 17ef1bc4d889cbe8076bcb3d504aed4c1aea1562
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55188
We need to make sure dim types are preserved after applying Transpose.
Test Plan:
```
$ buck build caffe2/caffe2/opt:bound_shape_inference_test && ./buck-out/gen/caffe2/caffe2/opt/bound_shape_inference_test --gtest_filter=*Transpose*
```
Reviewed By: yinghai
Differential Revision: D27514487
fbshipit-source-id: 431b7f2d08664f2ec311a733c926dbb52c63a7d4
Summary:
Added the functionality desired in https://github.com/pytorch/pytorch/issues/50789.
1. Added support for pow() on CPU for `float16` (`Half`) and `bfloat16` types.
Both `pow(Tensor, Scalar)` and `pow(Tensor, Tensor)` are now supported for the aforementioned types.
However autograd isn't supported for `Float16` on CPU yet, as `log_vml_cpu` can't be enabled for it.
2. heitorschueroff added `pow_tensor_scalar_optimized_kernel` to refactor & simplify `PowKernel.cpp`.
It provides a common path for all the complex types & floating point types (except Float16, due to lack of complete AVX2 vectorization support for it). It replaced code that had previously been duplicated for (float, double) and complex types,
so PowKernel.cpp looks a lot cleaner now.
3. Enabled (unskipped) some tests for `erf`, `erfc`,`erfinv`, `linalg.norm` and `linalg.vector.norm` which were being skipped earlier due to `pow()` not having been implemented for `float16` & `bfloat16`.
4. Added an OpInfo for `pow()` & enabled some test cases for `pow()`.
5. Extended the coverage of existing tests for `pow` in `test_binary_ufuncs.py` in order to enable comparison with `numpy`, even with discontiguous tensors, and added a test to ensure that a runtime error is raised for `pow`'s inplace variant if resizing the base tensor is required during its invocation.
6. Added `float16` & `bfloat16` to `square`'s dtype lists in its `UnaryUfuncInfo`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50999
Reviewed By: zou3519
Differential Revision: D27478225
Pulled By: heitorschueroff
fbshipit-source-id: d309dd98d5a96d0cb9b08281757bb1c65266d011
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54544
## Feature
- Add `subinstance(data, type)` to check whether `data` is an instance of a subtype of `type`
- Add a `runtime_validation` decorator to validate that the data returned from `__iter__` is an instance of a subtype of the hint.
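A minimal, hypothetical sketch of what such runtime validation can look like, using only the standard `typing` helpers and simplified to an `isinstance` check per yielded element (not the actual implementation):

```python
from typing import Iterator, get_args, get_type_hints

def runtime_validation(cls):
    # Simplified sketch: read the element type from the Iterator[...] return
    # hint of __iter__ and isinstance-check every yielded item against it.
    hint = get_type_hints(cls.__iter__)["return"]
    (elem_type,) = get_args(hint)
    original_iter = cls.__iter__

    def checked_iter(self):
        for item in original_iter(self):
            if not isinstance(item, elem_type):
                raise RuntimeError(
                    f"expected {elem_type.__name__}, got {type(item).__name__}")
            yield item

    cls.__iter__ = checked_iter
    return cls

@runtime_validation
class IntPipe:
    def __init__(self, items):
        self.items = items

    def __iter__(self) -> Iterator[int]:
        yield from self.items

list(IntPipe([1, 2, 3]))   # passes
# list(IntPipe([1, "a"]))  # raises RuntimeError on the second element
```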
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D27327234
Pulled By: ejguan
fbshipit-source-id: fb6a332762b0fe75284bb2b52a13ed171b42558c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54066
## Feature
- Add a decorator `construct_time_validation` to validate each input datapipe according to the corresponding type hint.
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D27327236
Pulled By: ejguan
fbshipit-source-id: a9d4c6edb5b05090bd5a369eee50a6fb4d7cf957
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54020
## Feature
- Add `issubtype` to check whether one type is a subtype of another type.
- Add `_DataPipeMeta` (mimic Python typing 3.6)
- Add `type` attribute for each DataPipe
- Save original `__init__` function for each DataPipe
- Validate return hint of `__iter__`
- Replace the `__init__` function based on `type`
- Fixed type: put the original `__init__` back if it exists, or use a plain `__init__`
- Non-fixed type: add a new `__init__` that copies `cls.type` to each instance (optimized for memory)
No errors for the main repo, `torchvision`, `torchaudio` and `torchtext`.
## Future
- Add same thing for `__getitem__`.
- When DataFrame support lands, add another type for DataFrame with column names and types.
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D27327232
Pulled By: ejguan
fbshipit-source-id: fd3a6029c16f5d814b1d7e1b1566fdcd8fd1ad9a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54299
## Feature
- Check whether a type is a subtype of another type
Prerequisite for the DataPipe typing system.
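A rough, hypothetical sketch of such a subtype check using the standard `typing` introspection helpers (the real implementation handles many more cases):

```python
from typing import Any, List, Union, get_args, get_origin

def issubtype(left, right):
    # Hypothetical, simplified subtype check: handles Any, Union on the
    # right-hand side, and parameterized generics such as List[int].
    if right is Any:
        return True
    if get_origin(right) is Union:
        return any(issubtype(left, r) for r in get_args(right))
    left_origin = get_origin(left) or left
    right_origin = get_origin(right) or right
    if not issubclass(left_origin, right_origin):
        return False
    right_args = get_args(right)
    if not right_args:
        return True  # a bare generic on the right accepts any parameters
    left_args = get_args(left)
    return len(left_args) == len(right_args) and all(
        issubtype(l, r) for l, r in zip(left_args, right_args))

issubtype(List[int], List[int])   # True
issubtype(int, Union[int, str])   # True
issubtype(List[str], List[int])   # False
```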
Test Plan: Imported from OSS
Reviewed By: VitalyFedyunin
Differential Revision: D27327235
Pulled By: ejguan
fbshipit-source-id: 8f50a663a86540677c9e132ac7c5216fdac46f70
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55012
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54442
Added needsOutputs support to RecordFunction, improved ObserverUtil functions to handle list data, and made minor renames for consistency.
To get output data from kernel calls, we need to capture the outputs temporarily before passing them to the record function; the results are then released to the function's return. We handle two cases, unboxed and boxed kernels. The boxed version is fairly simple since all outputs are stored in the stack object. For unboxed kernel calls, we added a `ReturnValue` utility class to properly handle the different return values of unboxed kernels.
For optimization, this intermediate capture is only enabled for observers that request `needsOutputs(true)` and should not affect other observers or when the observer is not enabled.
Test Plan:
```
=> buck build //caffe2/test/cpp/jit: --show-output
=> buck-out/gen/caffe2/test/cpp/jit/jit --gtest_filter=RecordFunctionTest*
CUDA not available. Disabling CUDA and MultiCUDA tests
Note: Google Test filter = RecordFunctionTest*-*_CUDA:*_MultiCUDA
[==========] Running 7 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 7 tests from RecordFunctionTest
[ RUN ] RecordFunctionTest.TracedTestInputsOutputs
[ OK ] RecordFunctionTest.TracedTestInputsOutputs (226 ms)
[ RUN ] RecordFunctionTest.SampledCallbacks
[ OK ] RecordFunctionTest.SampledCallbacks (771 ms)
[ RUN ] RecordFunctionTest.RecordFunctionGuard
[ OK ] RecordFunctionTest.RecordFunctionGuard (0 ms)
[ RUN ] RecordFunctionTest.Callbacks
[ OK ] RecordFunctionTest.Callbacks (2 ms)
[ RUN ] RecordFunctionTest.ShouldRun
[ OK ] RecordFunctionTest.ShouldRun (0 ms)
[ RUN ] RecordFunctionTest.Basic
[ OK ] RecordFunctionTest.Basic (1 ms)
[ RUN ] RecordFunctionTest.OperatorNameOverload
[ OK ] RecordFunctionTest.OperatorNameOverload (1 ms)
[----------] 7 tests from RecordFunctionTest (1001 ms total)
[----------] Global test environment tear-down
[==========] 7 tests from 1 test case ran. (1002 ms total)
[ PASSED ] 7 tests.
```
Reviewed By: ilia-cher
Differential Revision: D27449877
fbshipit-source-id: 69918b729565f5899471d9db42a587f9af52238d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54826
This test will no longer work, because we errored out SPMD in #54454.
This test is already disabled.
ghstack-source-id: 125602473
Test Plan: N/A
Reviewed By: rohan-varma
Differential Revision: D27381719
fbshipit-source-id: a3079ff0766f91112cbe58c1f00c1b02d241c8cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54454
According to the pitch in https://github.com/pytorch/pytorch/issues/47012
1. Let DDP error out if `device_ids` contains multiple devices.
2. If `device_ids` is not specified, DDP will use the provided model (the `module` argument in the DDP constructor) as-is, regardless of whether the model is on one GPU, multiple GPUs, or the CPU.
3. Remove the assertion that prevents SPMD in DDP `join()` method, because now SPMD is already forbidden by the constructor. Also remove the relevant unit test `test_ddp_uneven_inputs_replicated_error`.
#Closes: https://github.com/pytorch/pytorch/issues/47012
ghstack-source-id: 125644392
Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:distributed_gloo_spawn -- test_cuda
buck test mode/dev-nosan caffe2/test/distributed:distributed_gloo_spawn -- test_rnn
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_nccl_backend_multi_device_ids_not_allowed
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_nccl_backend_single_device_module_device_ids_None
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_nccl_backend_multi_device_module_device_ids_None
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_multi_device_module_config
waitforbuildbot
Reviewed By: pritamdamania87
Differential Revision: D27226092
fbshipit-source-id: 3ee1e4bc46e5e362fc82cf7a24b2fafb34fcf1b9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55177
This fixes `warning: '_GLIBCXX11_USE_C99_COMPLEX' is not defined, evaluates to 0`, that would be raised if https://github.com/pytorch/pytorch/pull/54820 used with libstd++ compiled without USE_C99_COMPLEX support.
In `c++config.h` `_GLIBCXX_USE_C99_COMPLEX` is aliased to either `_GLIBCXX98_USE_C99_COMPLEX` or `_GLIBCXX11_USE_C99_COMPLEX` depending on `__cplusplus` macro, as shown here:
0cf4813202/libstdc%2B%2B-v3/include/bits/c%2B%2Bconfig (L641-L647)
The above-mentioned config file is generated by autoconf, which leaves the macro undefined if the feature is not used, so using a conditional like `defined(_GLIBCXX_USE_C99_COMPLEX) && _GLIBCXX_USE_C99_COMPLEX == 0` would trigger an undefined-macro preprocessor warning.
Test Plan: CI
Reviewed By: Orvid
Differential Revision: D27517788
fbshipit-source-id: a6db98d21c9bd98205815641363b765a02399678
Summary:
https://github.com/pytorch/pytorch/issues/54779 split out the logic from our "Lint" workflow into a separate workflow that allows us to annotate PRs from forks. However, as of https://github.com/pytorch/pytorch/issues/54689, it is possible for the "Lint" workflow to be canceled, in which case it may not upload the "flake8-py3" and "clang-tidy" artifacts that the "Add annotations" workflow expects. This often results in GitHub pointlessly sending notification emails due to the failure in the "Add annotations" workflow. This PR fixes the issue by gracefully handling the case where the expected artifact is absent.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55242
Test Plan: I tested this in the same external sandbox repo used to test https://github.com/pytorch/pytorch/issues/54779.
Reviewed By: malfet
Differential Revision: D27540120
Pulled By: samestep
fbshipit-source-id: 47cc02950edbbc6381033bda2fe4570cb3e331cb
Summary:
Non-backwards-compatible change introduced in https://github.com/pytorch/pytorch/pull/53843 is tripping up a lot of code. Better to set it to False initially and then potentially flip to True in the later version to give people time to adapt.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55169
Reviewed By: mruberry
Differential Revision: D27511150
Pulled By: jbschlosser
fbshipit-source-id: 1ac018557c0900b31995c29f04aea060a27bc525
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55124
**Summary**
This commit modifies type inference (used by the module scripting code)
so that it tries to script the type of any class instances that it
encounters. This enables recursive, automatic scripting of class type
module attributes.
**Test Plan**
This commit adds a test case for this to `TestClassType`.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D23971883
Pulled By: SplitInfinity
fbshipit-source-id: 7a5a2e7c12ee68cbdeb0a07e6aaf98734a79cb06
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49267
This PR builds upon the PR https://github.com/pytorch/pytorch/pull/48711 by RockingJavaBean. The original PR introduced a BC breaking change by making the interpolation parameter positional. Thus, previous invocations of torch.quantile that did not include the interpolation parameter failed after the PR landed.
To avoid BC breaking changes, we preserve the original signatures and make the interpolation parameter in the new signatures kwarg-only. For now, interpolation cannot have a default value, to avoid ambiguity with the deprecated signature. However, due to limitations of codegen and C++, we cannot have a required arg after optional ones. Thus, this PR also makes dim and keepdim required args. Once we can remove the old signatures, the dim, keepdim and interpolation parameters in the new signature will get their default values back.
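In plain Python, the signature constraint reads like this hypothetical sketch (illustrative only; the real signature lives in codegen and C++):

```python
# interpolation is keyword-only to avoid ambiguity with the deprecated
# signature; dim and keepdim are temporarily required because a required
# positional parameter cannot follow optional ones.
def quantile(input, q, dim, keepdim, *, interpolation):
    return input, q, dim, keepdim, interpolation

quantile([1.0, 2.0], 0.5, 0, False, interpolation="linear")  # ok
# quantile([1.0, 2.0], 0.5, 0, False, "linear")  # TypeError: too many
#                                                # positional arguments
```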
__TODO__
---
- [ ] Run backward compat tests
This reverts commit 2f1d1eb7df5e8032392b73751c84025a2aa3d1ee.
Test Plan: Imported from OSS
Reviewed By: glaringlee
Differential Revision: D27337117
Pulled By: heitorschueroff
fbshipit-source-id: 7fe31f22027645e0d6cb3cab0392d532a4b362c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55031
It turns out that PowerSGD hooks can work on PyTorch native AMP package, but not Apex AMP package, which can somehow mutate gradients during the execution of communication hooks.
{F561544045}
ghstack-source-id: 125268206
Test Plan:
Used native amp backend for the same pytext model and worked:
f261564342
f261561664
Reviewed By: rohan-varma
Differential Revision: D27436484
fbshipit-source-id: 2b63eb683ce373f9da06d4d224ccc5f0a3016c88
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55125
We can provide an ArrayRef to 1-5 zeros much more efficiently, like this.
ghstack-source-id: 125471024
Test Plan: Existing CI
Reviewed By: ezyang
Differential Revision: D27494800
fbshipit-source-id: 5e2addfabae70960475a4b322925cd0eae71b4c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55187
As described in https://github.com/pytorch/pytorch/issues/54927, Pipe
docs didn't explicitly mention initializing RPC. This PR improves the docs and
also ensures Pipe throws a more useful error message when RPC is not
initialized and not an internal assertion error.
ghstack-source-id: 125563552
Test Plan:
1) unit test added.
2) waitforbuildbot
Reviewed By: rohan-varma
Differential Revision: D27521783
fbshipit-source-id: d1a5c6ca789b9a66c07a794468178c25cfd4b743
Summary:
https://github.com/pytorch/pytorch/issues/47786 updated ShellCheck and fixed the warnings that it was already giving in CI (since it previously didn't cause the job to fail). https://github.com/pytorch/pytorch/issues/54069 enabled two ShellCheck warnings that previously were globally disabled. This PR continues the trend by reenabling the remaining four ShellCheck warnings that previously were globally disabled.
Also, this PR puts as many remaining ShellCheck arguments as possible into `.shellcheckrc` to make it easier to integrate with editors. For instance, in VS Code, this is now all that is needed (due to https://github.com/koalaman/shellcheck/issues/1818 and the fact that VS Code only runs ShellCheck on one file at a time):
```json
{
"shellcheck.customArgs": [
"--external-sources"
]
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55165
Test Plan:
[The "Lint / quick-checks" job in GitHub Actions](https://github.com/pytorch/pytorch/pull/55165/checks?check_run_id=2250098330), or this command if you want to check locally:
```
.jenkins/run-shellcheck.sh
```
Reviewed By: walterddr
Differential Revision: D27514119
Pulled By: samestep
fbshipit-source-id: f00744b2cb90a2ab9aa05957bff32852485a351f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55078
Fixes a TODO, make sure we iterate through kwargs as well as args
when navigating graphs. We can use `node.all_input_nodes` convenience
property to accomplish this.
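A hypothetical sketch of the idea with a toy node class (the actual FX property also handles nested containers):

```python
class Node:
    def __init__(self, name, args=(), kwargs=None):
        self.name = name
        self.args = tuple(args)
        self.kwargs = dict(kwargs or {})

    @property
    def all_input_nodes(self):
        # Gather producer nodes from positional args AND kwargs, so a
        # node passed by keyword is not silently skipped.
        seen, out = set(), []
        for a in list(self.args) + list(self.kwargs.values()):
            if isinstance(a, Node) and a.name not in seen:
                seen.add(a.name)
                out.append(a)
        return out

x = Node("x")
w = Node("w")
linear = Node("linear", args=(x,), kwargs={"weight": w})
print([n.name for n in linear.all_input_nodes])  # ['x', 'w']
```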
Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27474699
fbshipit-source-id: 8a6e3db5a73328c4f296ac5fce951e81213b6f58
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55077
Deletes debugging prints from the code, no logic change.
Test Plan:
CI
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27474700
fbshipit-source-id: 3d9d73da6615ddffdfdb0df270bcdfd2c4b50be3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55060
Removes the previous iteration of Numeric Suite for FX graph mode
quantization, and moves the current iteration into the top level
file.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXGraphMatcher
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27467725
fbshipit-source-id: 4c22b5a3221857231f9f59cf6d2908820e6a7f12
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54335
Simple fix to enable weight extraction for nni.ConvReLU2d.
Note: this module only appears if the internal GraphModule APIs are
called, so we add testing for this path.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_extract_weights_mod
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D27192844
fbshipit-source-id: 923cf63e29e4638fd77ca42e69aedb15fb20a330
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54326
Fixes unshadowed activation input logging for subgraphs where start_node does
not equal end_node. In detail:
* instead of passing around a single list of nodes, pass around a list
of nodes to instrument inputs, and a list of nodes to instrument
outputs. This way we can handle multi-node subgraphs properly, and we
also keep the subgraph instance definition out of the public APIs.
* add a test case
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_linear_fp16_activations
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D27190138
fbshipit-source-id: 58e2377c1c128baaf3b760c1ad29098fb21f53d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54280
Some easy refactors to reduce duplicate logic in test cases
for NS for FX. In particular, we start reusing a common model
within this file, and we split the fp16 test cases to be more
modular.
Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D27173373
fbshipit-source-id: cf3f21ee8b9b12dff89f1cd2d3ac1749f3f63fe6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54275
Adds support for NS shadow activations path for the fp16 emulation
pattern such as
```
... -> dequantize -> linear -> relu -> to(torch.float16) -> ...
```
There are a couple of changes necessary here:
1. removing the restriction on the shadowing graph pass that the B
subgraph is a single node (since this subgraph is four nodes), and
modifying the code to correctly add the relevant inputs versus output
loggers (input loggers and subgraph copy if we are at start_node,
and output logger if we are at end_node)
2. modifying the logic for calculating node input and output type
to work correctly for the `to` and `dequantize` nodes:
2a. make the function return the first input and output, instead of just
the first input
2b. make the function handle `dequantize` correctly by recursively
using the output of its input
2c. make the function handle `to` correctly by recursively using the
output of its input and the target dtype
3. a bug fix to handle observers in kwargs, while copying subgraphs
Note: input logging for these patterns is not tested yet,
this will be in the next PR.
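The recursion in 2a-2c can be sketched with a toy node class and a hypothetical dtype table (illustrative only, not the actual NS code):

```python
class Node:
    # Toy stand-in for an FX node: op name, optional producer node, and an
    # optional target dtype (only used by "to").
    def __init__(self, op, inp=None, target_dtype=None):
        self.op, self.inp, self.target_dtype = op, inp, target_dtype

BASE_DTYPES = {"linear": ("float32", "float32"),
               "quantize": ("float32", "qint8")}

def io_dtypes(node):
    # 2a: return (first_input_dtype, output_dtype), not just the input.
    if node.op == "dequantize":
        # 2b: input dtype is the output dtype of the producer; output fp32.
        return (io_dtypes(node.inp)[1], "float32")
    if node.op == "to":
        # 2c: input dtype comes from the producer; output is the target.
        return (io_dtypes(node.inp)[1], node.target_dtype)
    return BASE_DTYPES[node.op]

print(io_dtypes(Node("dequantize", Node("quantize"))))    # ('qint8', 'float32')
print(io_dtypes(Node("to", Node("linear"), "float16")))   # ('float32', 'float16')
```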
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_linear_fp16
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27172655
fbshipit-source-id: 3bdc86618b2a5782627fcf303d58af7f47fbc30d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48917
- max_pool2d channels last support, forward path
- max_pool2d channels last support, backward path
- vectorize channels last forward path
- rename the header file
- fix windows build
- combine PoolingKernel.h into Pool.h
- add data type check
- loosen test_max_pool2d_nhwc to cover device CPU
Test Plan: Imported from OSS
Reviewed By: glaringlee
Differential Revision: D25399470
Pulled By: VitalyFedyunin
fbshipit-source-id: b49b9581f1329a8c2b9c75bb10f12e2650e4c65a
Summary: The `tensorpipe::Buffer::deviceType()` method is going away.
Test Plan: CI
Reviewed By: lw
Differential Revision: D27478436
fbshipit-source-id: 3962257bc6237d1dde7e5f4fddae38abe8384c68
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55017
JacobSzwejbka in D26678637 found the mis-alignment between the operator list that the YAML file claimed and the dispatcher claimed. After some digging and thorough investigation by JacobSzwejbka, we have come to the conclusion that the non-traced operators are more trouble than they are worth since they will result in phantom operators which every user of the capabilities API needs to be aware of (or every language implementation needs to be aware of). Instead, with this change, we can **reliably** trace all operators called via the dispatcher by clearing the list of un-observed operators during model tracing.
Another thing to note is that the ignore-list in the observer is a list of base operator names, not full operator names (with overload), which is what tracing-based selective build needs. If we used the ignore-list, we would need to include every overload of the un-traced operators.
Latency isn't an issue during model tracing, so this should be generally okay.
Ran the following command to re-generate all the YAML files: `buck run caffe2/torch/fb/mobile/cli:cli -- --gen_all_model_configs`
ghstack-source-id: 125337353
(Note: this ignores all push blocking failures!)
Test Plan: Sandcastle and wait for unit tests. Also see BSB results in the diff comments.
Reviewed By: JacobSzwejbka
Differential Revision: D27452855
fbshipit-source-id: 410bafec7ac67503f68623a5e3d4ab258f434cbf
Summary:
Related https://github.com/pytorch/pytorch/issues/54261
This PR ports the method_tests() entries of `torch.copysign` to OpInfo.
While porting the tests, the `test_out` cases from `test_ops.py` would fail as the out variant of `torch.copysign` does not support scalar inputs.
```python
>>> x = torch.randn(2)
>>> y = torch.empty_like(x)
>>> torch.copysign(x, 1.)
tensor([1.4836, 1.2156])
>>> torch.copysign(x, 1., out=y)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: copysign(): argument 'other' (position 2) must be Tensor, not float
```
This PR fixes the tests by adding an overload entry in `native_functions.yaml` and re-dispatching scalar inputs to the existing `copysign_out` function.
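The re-dispatch idea can be sketched in plain Python with `math.copysign` (illustrative; the real fix is a `native_functions` overload entry plus a C++ redispatch):

```python
import math

def copysign(x, other):
    # Sketch: a scalar "other" is treated as if broadcast against x,
    # mirroring how the scalar overload re-dispatches to the
    # tensor-tensor kernel after wrapping the scalar.
    if isinstance(other, (int, float)):
        return [math.copysign(v, other) for v in x]
    return [math.copysign(v, o) for v, o in zip(x, other)]

print(copysign([1.5, -2.0], -1.0))         # [-1.5, -2.0]
print(copysign([1.5, -2.0], [1.0, -3.0]))  # [1.5, -2.0]
```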
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54945
Reviewed By: gchanan
Differential Revision: D27505300
Pulled By: mruberry
fbshipit-source-id: f68250fa52f8dcfd45426039ec178ca5e883e206
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55147
Enabling this test now that jit supports TensorList inputs
Test Plan: Imported from OSS
Reviewed By: gchanan
Differential Revision: D27505270
Pulled By: heitorschueroff
fbshipit-source-id: 05b0d47cb71740309ec5130bf520c576fb90a4d1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55001
The enum is only used for precedence computation, so we only need to
enumerate the node types for which we know the precedence priority.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D27446410
Pulled By: ZolotukhinM
fbshipit-source-id: 217dd63c4fd086155030ebf0c3e1772605109f7b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54999
BaseCallNode was used as a base class for Intrinsics and FunctionCall.
Now FunctionCall is gone, so BaseCallNode could be removed as well.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D27446411
Pulled By: ZolotukhinM
fbshipit-source-id: be8ce06fbac72bfe355e5e3e1d2aa2267fae79fd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54998
The only reason we couldn't use Load instead of FunctionCall was
DepTracker. Now that it is gone, we can finally replace FunctionCall
with Load.
Test Plan: Imported from OSS
Reviewed By: bertmaher, pbelevich
Differential Revision: D27446412
Pulled By: ZolotukhinM
fbshipit-source-id: 9183ae5541c2618abc9026b1dc4c4c9fab085d47
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54997
DepTracker was used to automatically pull in dependent computations from
output ones. While it seems quite convenient, it's led to several
architectural issues, which are fixed in this stack.
DepTracker worked on Tensors, which is a pair of Buf and Stmt. However,
Stmt could become stale and there was no way to reliably update the
corresponding tensor. We're now using Bufs and Stmts directly and moving
away from using Tensors to avoid these problems.
Removing DepTracker allowed us to unify Loads and FunctionCalls, which
were essentially duplicates of each other.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D27446414
Pulled By: ZolotukhinM
fbshipit-source-id: a2a32749d5b28beed92a601da33d126c0a2cf399
Summary:
[Currently](faa4da49ff/.flake8 (L22)), our `.flake8` config file has the `exclude` pattern `scripts`. I'm guessing that this is just meant to exclude the top-level `scripts` dir from Flake8, but it also applies to the following (apparently erroneously):
- `.circleci/scripts`
- `.github/scripts`
- `test/scripts`
This PR corrects the problem by making all the `exclude` patterns (except for the wildcard `*.pyi` pattern) relative to the repository root. Also, since this PR already touches all the `exclude` lines, it also sorts them to help reduce merge conflicts when `.flake8` is edited in the future. This sorting happened to reveal that the `build` pattern was previously present twice, so now it has been deduplicated.
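A hedged sketch of what root-relative `exclude` patterns look like (directory names here are illustrative, not the actual PyTorch list):

```ini
[flake8]
exclude =
    ./build,
    ./scripts,
    ./third_party,
    *.pyi
```

Anchoring the patterns with `./` keeps them from matching same-named directories nested elsewhere in the tree, such as `.github/scripts`.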
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55178
Test Plan:
Locally:
```
flake8
```
And also [in CI](https://github.com/pytorch/pytorch/pull/55178/checks?check_run_id=2249949511).
Reviewed By: janeyx99
Differential Revision: D27520412
Pulled By: samestep
fbshipit-source-id: 359275c10ca600ee4ce7906e3a7587ffaa4ae1ed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54919
Log the use of uneven inputs API for better tracking and use case
detection.
ghstack-source-id: 125446499
Test Plan: CI, added ut
Reviewed By: zhaojuanmao, SciPioneer
Differential Revision: D27410764
fbshipit-source-id: abc8055a2e15a3ee087d9959f8881b05a0ea933e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54972
No reason to create a temporary.
ghstack-source-id: 125338543
Test Plan: CI
Reviewed By: bdhirsh
Differential Revision: D27437190
fbshipit-source-id: 05eeb3ccd33700d8776b6ce58a120c7697acf49e
Summary:
The void overload of `register_hook` puts the user's callable into a `std::function` which is used in a lambda, then `_register_hook` wraps that lambda in another `std::function`. This is bad because each call goes through two indirections and also it requires more heap allocations.
Instead, the lambda can capture the original callable without wrapping it in an `std::function` first.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53917
Reviewed By: gchanan
Differential Revision: D27513822
Pulled By: swolchok
fbshipit-source-id: 026d40d7e9fb718757b7203737b0662ba36bc021
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54534
Moving overload of tuple -> IValue constructor was missing.
ghstack-source-id: 124671165
Test Plan:
Compare assembly for ivalue_test.cpp before/after this
change. Newly added snippet stops calling `std::__invoke_impl` with a
real function pointer to a by-value variant of
`c10::ivalue::Tuple::create` and starts directly calling
by-const-reference variant of `c10::ivalue::Tuple::create` instead.
Reviewed By: smessmer
Differential Revision: D27271895
fbshipit-source-id: 8b0e146a15d66883146b89b93da5e95f903484e6
Summary:
Currently, we only have three GHA workflows that need to be canceled on reruns. In anticipation of future workflows, this PR adds a check ensuring that any new workflow that should be auto-canceled on reruns is included in the cancel_redundant_workflows.yml GHA workflow.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55171
Test Plan: Succeeded quick-checks https://github.com/pytorch/pytorch/runs/2249162035?check_suite_focus=true
Reviewed By: samestep
Differential Revision: D27514294
Pulled By: janeyx99
fbshipit-source-id: 27da321f648b97a090052823ec955caffeb6ae97
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55142
Declare some functions C10_HOST_DEVICE to fix the NVCC warning.
During PyTorch compilation, the NVCC compiler was emitting several warnings like this one:
```
caffe2/c10/util/TypeCast.h(39): warning: calling a constexpr __host__ function from a __host__ __device__ function is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
detected during:
instantiation of "dest_t c10::static_cast_with_inter_type<dest_t, src_t>::apply(src_t) [with dest_t=c10::complex<double>, src_t=__nv_bool]"
(158): here
instantiation of "To c10::convert<To,From>(From) [with To=c10::complex<double>, From=__nv_bool]"
(170): here
instantiation of "To c10::checked_convert<To,From>(From, const char *) [with To=c10::complex<double>, From=__nv_bool]"
caffe2/c10/core/Scalar.h(63): here
```
How to reproduce.
- Make sure you are on remote/master
- run:
`buck build mode/dev-nosan caffe2/torch/fb/sparsenn:sparsenn_operators_gpu`
Test Plan: - compilation completes without warnings.
Reviewed By: r-barnes
Differential Revision: D27469757
fbshipit-source-id: f8c4eedb637c6d487ac49bb310e48be11db204e2
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53322; the test has some hardcoded values to check that the sharding works as expected, and was not previously exercised beyond 4 GPUs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54788
Reviewed By: mrshenli
Differential Revision: D27483078
Pulled By: blefaudeux
fbshipit-source-id: 63fe072c41e1601925af23d8fb1ea3f4729b2044
Summary:
The label name was meant to be "module: rocm".
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55170
Test Plan: None.
Reviewed By: malfet
Differential Revision: D27513290
Pulled By: samestep
fbshipit-source-id: ef86fcd5f94a76c9e04653995c2ba9369c5ecb34
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55098
resize_as_ still goes through the dispatcher because it calls tensor.resize_. We can easily call resize_ directly while bypassing the dispatcher.
Reviewed By: swolchok
Differential Revision: D27457894
fbshipit-source-id: 8a5da185d1a6addafbf4915e29613013451b5e43
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54719
lstsq_helper now takes rank and singular_values that are modified in-place.
This is required for adding out= variant.
TODO:
- [ ] Fix CI failures
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D27439197
Pulled By: mruberry
fbshipit-source-id: f2fe421aa393c2d58f5c50f33e21a9eae57e4f01
Summary:
Currently, it's not tested whether `op.sample_inputs` actually uses the provided dtype and device arguments. This PR fixes that by introducing asserts in `test_supported_dtypes`.
This will help to detect incorrectly generated inputs in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54949
Reviewed By: H-Huang
Differential Revision: D27435952
Pulled By: mruberry
fbshipit-source-id: 8465c459b9b0c007411a9a74340bc2755519624a
Summary:
- Corrected a few errata in the SVD docs
- Made the notation more uniform (refer to `Vh` in `linalg.svd`, always use double tilts...)
- Wrote a better explanation about why the gradients of `U` and `V` are not well-defined when the input is complex or real but has repeated singular values. The previous one pointed to a somewhat obscure post on gauge theory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54002
Reviewed By: malfet
Differential Revision: D27459502
Pulled By: mruberry
fbshipit-source-id: f5c35eca02d35dadd2fc0eeadfacc8824f409400
Summary:
Disable `cppcoreguidelines-macro-usage` as PyTorch codebase uses a lots
of macros that violate this rule.
Disable `bugprone-reserved-identifier` and
`performance-unnecessary-value-param` as those checks are very slow
Add `NOLINT` to DEFINE_DISPATCH as it introduces non-const global variables
Replace `for(auto i = 0; i < lim; ++i)` with `for(auto i: c10::irange(lim))` throughout the modified files
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55087
Reviewed By: samestep
Differential Revision: D27475822
Pulled By: malfet
fbshipit-source-id: 2651a4b3dc062066a15e69380354414a198fb279
Summary:
This adds:
- new categories
- global commit counter
- support for new "Reverted" label on PRs
- new export system to multiple files
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54594
Reviewed By: H-Huang
Differential Revision: D27396011
Pulled By: albanD
fbshipit-source-id: ca1ec3a1b90221ba26fd8b053dfb10f614f05909
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55096
There were issues with D26138322 (5b0a6482c1) that we didn't catch the first time around.
This (rebased on top of the to_copy fixes) fixes the converted remote_ro c2/pt output comparison
Test Plan:
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 3 ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --c2_model=/data/users/ansha/tmp/adfinder/210494966_0.predictor.disagg.remote_request_only --c2_inputs=/data/users/ansha/tmp/adfinder/models/c2_remote_ro_input_data.pb --pred_net=/data/users/ansha/tmp/adfinder/models/c2_remote_ro_net2.pb --c2_sigrid_transforms_opt=1 --c2_apply_nomnigraph_passes=1 --c2_use_memonger=1 --scripted_model=/data/users/ansha/tmp/adfinder/models_dianshi/210494966_0.predictor.disagg.remote_request_only.pt --pt_inputs=/data/users/ansha/tmp/adfinder/models/remote_ro_wrapped_input_data.pt --pt_enable_static_runtime=1 --pt_cleanup_activations=1 --pt_enable_out_variant=1 --compare_results=1 --iters=1 --warmup_iters=1 --num_threads=1 --do_profile=0 --benchmark_c2_predictor=1 --do_benchmark=1
```
Reviewed By: hlu1
Differential Revision: D27477104
fbshipit-source-id: 5a95dfa7eae23566fadc3fec323ad03a34e6734d
Summary:
Related https://github.com/pytorch/pytorch/issues/54945
This PR ports `copysign` to structured, and the `copysign.Scalar` overloads are re-dispatched to the structured kernel.
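As an illustrative sketch (not taken from the PR itself), the Scalar overload now routes through the same structured kernel as the tensor overload, so both calls below should produce identical results:

```python
import torch

x = torch.tensor([1.0, -2.0, 3.0])

# copysign keeps x's magnitudes and takes the sign of the second argument;
# the Scalar overload is re-dispatched to the same structured kernel as the
# tensor overload.
out_scalar = torch.copysign(x, -1.0)
out_tensor = torch.copysign(x, torch.tensor(-1.0))

print(out_scalar)  # tensor([-1., -2., -3.])
```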
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55040
Reviewed By: glaringlee
Differential Revision: D27465501
Pulled By: ezyang
fbshipit-source-id: 5cbabfeaaaa7ca184ae0b701b9692a918a90b117
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55136
This will ease the transition to the new API where `Buffer` does not
store a length anymore.
Test Plan: CI
Reviewed By: lw
Differential Revision: D27466385
fbshipit-source-id: 9a167f8c501455a3ab49ce75257c69d8b4869925
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52859
This reverts commit 92a4ee1cf6092dd941591f80885eb7fef5b2c0d8.
Added support for bfloat16 for CUDA 11 and removed fast-path for empty input tensors that was affecting autograd graph.
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D27402390
Pulled By: heitorschueroff
fbshipit-source-id: 73c5ccf54f3da3d29eb63c9ed3601e2fe6951034
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55016
When we call as_strided() and don't add an extra dimension, we should continue to expect that the number of dimensions will fit in a DimVector and thus that using it will save heap allocations.
ghstack-source-id: 125337281
Test Plan: Existing CI
Reviewed By: ngimel
Differential Revision: D27452838
fbshipit-source-id: 8b3d118de322638c0c0e3a4bfcfb3c820c64e6cc
Summary:
* `#if` with some undefined name is a warning when `-Wundef` is specified (which is in ovrsource for example)
* identifiers starting with two underscores are [reserved for compiler internals](https://en.cppreference.com/w/cpp/language/identifiers)
Test Plan: CI
Reviewed By: ezyang
Differential Revision: D27318070
fbshipit-source-id: 4989fc6a3bf3c176eddd7c25aca47414e4973edd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55103
Previously compression rate is only reported in PowerSGD hook. Also report this metric for comprehensive experimentation.
It is very easy to compute the sizes before and after compression, because there is only one matrix factorization per bucket, and no accumulation within the bucket is needed.
1) The size before compression is the input tensor size.
2) The size after compression is the size of P + Q, where each has a size of `square_side_length * state.matrix_approximation_rank`.
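The arithmetic above can be sketched as follows; the function and variable names (`compression_sizes`, `numel`) are illustrative, not the hook's actual identifiers, and this assumes the bucket is approximated by a square matrix as in the batched PowerSGD hook:

```python
import math

def compression_sizes(numel: int, rank: int):
    """Sketch of the before/after sizes for one bucket."""
    # The bucket tensor is reshaped into a square matrix of side ceil(sqrt(numel)).
    square_side_length = math.isqrt(numel - 1) + 1  # ceil(sqrt(numel))
    size_before = numel                              # the input tensor
    size_after = 2 * square_side_length * rank       # P + Q, one factorization per bucket
    return size_before, size_after

before, after = compression_sizes(1024 * 1024, rank=4)
print(before / after)  # 128x compression for this example
```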
ghstack-source-id: 125399028
Test Plan: Tested by running scripts/wayi/torch/power_sgd.py locally.
Reviewed By: deadlybulb
Differential Revision: D27474295
fbshipit-source-id: a2225e85be03ab20238f01014d5ec9ae1787c4fb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55111
I don't see how we could have ended up with !is_same but also identical data_ptr, and is_same is cheaper.
ghstack-source-id: 125438822
Test Plan: Existing CI?
Reviewed By: ngimel
Differential Revision: D27484914
fbshipit-source-id: 22125b29e6e09d312a2b92e893d08c69059e4435
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54870
The copy_ operator is decomposed into aten::expand_as and aten::index_put before entering the ONNX exporter.
There is a scenario where the inputs to copy_ are not of the same type, but the copy op in torch performs implicit casting that is not currently reflected inside the ONNX exporter. This PR adds casting inside the index_put symbolic for the case when the self tensor is not of the same type as the values.
Test Plan: Imported from OSS
Reviewed By: nikithamalgifb
Differential Revision: D27408975
Pulled By: SplitInfinity
fbshipit-source-id: 15022703e76b9c98b02285c06b13d44f3c4a3f00
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54869
Add a symbolic function to support torch.outer export to ONNX.
Adds support for the transfo-xl-wt103 model.
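For reference, `torch.outer` computes the outer product of two 1-D tensors; this small example (not from the PR) shows the op being exported:

```python
import torch

v1 = torch.tensor([1.0, 2.0])
v2 = torch.tensor([3.0, 4.0, 5.0])

# outer(v1, v2)[i][j] == v1[i] * v2[j]
print(torch.outer(v1, v2))
# tensor([[ 3.,  4.,  5.],
#         [ 6.,  8., 10.]])
```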
Test Plan: Imported from OSS
Reviewed By: nikithamalgifb
Differential Revision: D27408978
Pulled By: SplitInfinity
fbshipit-source-id: 70c89a9fc1a5e4a4ddcf674afb1e82e492a7d3b9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54866
Replace decomposeLinear pre process pass with a symbolic
Test Plan: Imported from OSS
Reviewed By: nikithamalgifb
Differential Revision: D27408981
Pulled By: SplitInfinity
fbshipit-source-id: d2d76cab3383122a60df1f356742a33db56adc71
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54863
Adds support for cases where the update to the index_put node is a single Bool value, such as the case shown below
```
mask[indices] = True
```
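An expanded, self-contained version of that pattern, showing the eager-mode semantics the export must preserve (variable names are illustrative):

```python
import torch

mask = torch.zeros(5, dtype=torch.bool)
indices = torch.tensor([1, 3])

# A single Bool update is broadcast over all indexed positions.
mask[indices] = True
print(mask)  # tensor([False,  True, False,  True, False])
```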
Fixes #53507
Test Plan: Imported from OSS
Reviewed By: nikithamalgifb
Differential Revision: D27408977
Pulled By: SplitInfinity
fbshipit-source-id: bcfb55b50ce76b3d4913ffbc16cdef1f98cb7a84
Summary:
Added a field to `OpInfo` to provide a wrapper function for gradcheck. This is useful for functions that need to perform some extra input/output processing to work with gradcheck.
fixes https://github.com/pytorch/pytorch/issues/50837
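For context, a minimal sketch of plain gradcheck usage; the new `OpInfo` field lets an op wrap its callable before a check like this runs (the lambda here is illustrative, not an actual OpInfo entry):

```python
import torch
from torch.autograd import gradcheck

# gradcheck compares analytical and numerical Jacobians; inputs must be
# double precision with requires_grad=True.
x = torch.randn(3, dtype=torch.double, requires_grad=True)
assert gradcheck(lambda t: t.exp(), (x,))
```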
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54914
Reviewed By: H-Huang
Differential Revision: D27435234
Pulled By: heitorschueroff
fbshipit-source-id: fa3e9b61f3d3df221243fd142ddb8b7861dbf669
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54644
Previously we special-cased the copy operator in the normal insert-observer code; this PR splits the
special-case logic into a separate function and keeps the rest of the code clean.
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D27314678
fbshipit-source-id: d36870ceb3717bc01eaeaa6f3f1532ad562cbaf1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53586
Previously one value could only be quantized to one dtype; this PR adds support for quantizing one value
in the fx graph with multiple dtypes, e.g. first quantize to int8 and then to float16.
We might do some follow-up PRs to clean up the hacks and refactor the code.
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_multiple_qconfigs_single_value
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D26912676
fbshipit-source-id: ae3653fd67f05870a3a9e808f491871826c555d5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54092
This is the first of several refactors to get numerical jacobian:
This one just moves some logic around to split the get_numerical_jacobian function into smaller, more manageable functions:
- compute_gradient is now no longer nested, but we have to pass in the parameters instead
- iter_tensor extracts out the logic of iterating through different types of tensors (the code should be almost the exact same here except for instead of calling into the update jacobian function, we yield the arguments instead)
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D27354268
Pulled By: soulitzer
fbshipit-source-id: 73288e3c889ae31bb8bf77a0e3acb3e9020e09a3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54969
With all use cases to hacky wrapper removed, all kernels will be
dispatched with c10 full dispatcher.
ghstack-source-id: 125434790
Test Plan: buck build //caffe2/aten/...
Reviewed By: ezyang, walterddr
Differential Revision: D27436596
fbshipit-source-id: 7a146d1f4a983b4a81f8552be4eec6c482b6bea2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54476
Per title. For `add_done_callback`, we log but swallow exceptions in order to keep consistent with what concurrent.futures python library does, see discussion in https://github.com/pytorch/pytorch/pull/45675.
That said, it would be good to improve the verbosity here, as this can be a source of confusion if users are setting a different future via `add_done_callback` and an error is hit, resulting in an unexpected hang (see https://github.com/pytorch/pytorch/issues/52132 for more details on how this can happen).
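The concurrent.futures behavior being matched can be seen directly; this sketch uses the stdlib `Future`, not torch's futures:

```python
from concurrent.futures import Future

def bad_callback(fut):
    # Per the concurrent.futures contract, an exception raised in a done
    # callback is logged and ignored, not propagated to the caller.
    raise RuntimeError("boom")

f = Future()
f.add_done_callback(bad_callback)
f.set_result(42)
print(f.result())  # 42 -- the callback's exception did not propagate
```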
ghstack-source-id: 125300389
Test Plan: CI
Reviewed By: lw
Differential Revision: D27253004
fbshipit-source-id: 72ed21c8fb6d27de5797c17fc46b762f893e6fea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54475
Implements the proposal in https://github.com/pytorch/pytorch/issues/53717#issuecomment-800545655. See that issue for more details, but at a high level:
1. markCompleted() immediately sets completed_ = true
2. Subclasses of future (such as cuda future) implement a nontrivial `postMarkCompletedHook` which may throw
3. If above error is caught and we call `setError`, setError itself will error out because completed_ = true.
To fix this, only call setError if the user-defined cb resulted in an error, otherwise, call `markCompleted` and let postMarkCompletedHook() throw and crash the program (per lw's thoughts this should be a fatal).
ghstack-source-id: 125300388
Test Plan: CI
Reviewed By: lw
Differential Revision: D27252965
fbshipit-source-id: fda41e8844104774aaf897286512d83ff06632b1
Summary:
Add fast common case to `prepare_matrix_for_cublas`, use index size instead of size(), move some checks where they belong so they are not triggered where they are guaranteed to be true.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55026
Reviewed By: gchanan
Differential Revision: D27468945
Pulled By: ngimel
fbshipit-source-id: 79c9f7b3d61595536f603d6fb0316e6f21630f38
Summary:
This PR clarifies the output of `tools/test_history.py` in the presence of re-runs for a single commit/job pair. Specifically:
- in `multiline` mode, the results from all re-runs are now shown
- in `columns` mode, the wording is now changed from "S3 reports omitted" to "job re-runs omitted"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55106
Test Plan:
```
python tools/test/test_test_history.py
```
Reviewed By: walterddr
Differential Revision: D27480590
Pulled By: samestep
fbshipit-source-id: 5b4ccae7586ef1df744663cba1c16bb5bfa75bb7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54791
Several use cases need to see what ops are present in a specific PyTorch runtime. This diff exposes that information in the dispatcher.
ghstack-source-id: 125314247
Test Plan: D26678637 uses this api.
Reviewed By: swolchok
Differential Revision: D27271371
fbshipit-source-id: e572f0c85dcd75d75356e2cd4cfdd77efee17f94
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54986
If the input is 1D, xnnpack::linear fails, while aten::linear makes it (1, D) and continues.
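A hedged sketch of the eager behavior the XNNPACK path now matches:

```python
import torch

lin = torch.nn.Linear(4, 3)
x = torch.randn(4)  # 1-D input

# aten::linear treats a 1-D input as shape (1, 4) and squeezes the result
# back to 1-D; the XNNPACK path previously rejected such inputs.
y = lin(x)
print(y.shape)  # torch.Size([3])
```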
Test Plan: buck test //caffe2/test:xnnpack_integration -- TestXNNPACKOps
Reviewed By: kimishpatel
Differential Revision: D27441966
fbshipit-source-id: dfb2c23b91247632e0e3fd2482056a503c246c39
Summary:
Currently in convolution double backward, grad of input is computed as `convT(ggW, gO.T)`. Notice how the first argument is, in fact, of the size that the convolution weight has, and the second is of the size of gradOutput, which is the inverse of the order in which convolutions are regularly called, and the sizes are far from what cudnn heuristics are trained for and what cudnn is guaranteed to have efficient kernels for. This takes cudnn 8 to some dark places, calling kernels that take 20-100 s. But, luckily for us, convT is a commutative operation (unlike conv), so `convT(ggW, gO)` is actually the same as `convT(gO, ggW)`, modulo some transposes because of conventions around the weight size, so we can use `convT(gO, ggW)`. As an added bonus, we don't need a special branch for groups with this formulation.
For the following pretty standard convolution,
- cudnn 7.6+old formulation takes 7.5 ms for double backward,
- cudnn 8 + old formulation takes ~40 s,
- cudnn 8 + new formulation is 1.8 ms with benchmark enabled,
- cudnn 8 + new formulation is 4 ms with benchmark disabled,
benchmarking script is below:
```
import torch
import time

# torch.backends.cudnn.benchmark = True

def ggI(conv, inp):
    out = conv(inp)
    grads = torch.autograd.grad(out, conv.weight, torch.rand_like(out), create_graph=True, retain_graph=True)
    torch.cuda.synchronize()
    start = time.time()
    grads[0].backward(torch.rand_like(grads[0]))
    torch.cuda.synchronize()
    print("db time: ", time.time() - start)
    return inp.grad

conv = torch.nn.Conv2d(512, 256, kernel_size=3, padding=1, groups=2).cuda()
inp = torch.randn(1, 512, 128, 128, device="cuda", requires_grad=True)
for _ in range(20):
    ggI(conv, inp)
torch.cuda.synchronize()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54840
Reviewed By: mruberry
Differential Revision: D27384866
Pulled By: ngimel
fbshipit-source-id: c6c875776a9801a0a2cd2f34f8ec39d0fcd59df8
Summary:
This PR enables using MIOpen for RNN FP16 on ROCM.
It does this by altering use_miopen to allow fp16. In the special case where LSTMs use projections, we fall back to the default implementation, as projections are not implemented in MIOpen at this time; we emit a warning once to let the user know.
We then remove the various asserts that are no longer necessary since we now handle this case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52475
Reviewed By: H-Huang
Differential Revision: D27449150
Pulled By: malfet
fbshipit-source-id: 06499adb94f28d4aad73fa52890d6ba361937ea6
Summary:
This PR adds a workflow that automatically adds ROCm label to PRs and issues with ROCm (case insensitive) in their titles.
Note that this does not remove labels even if the title is changed to no longer contain ROCm, but I can easily add removal functionality if that is desired.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54989
Test Plan: much test in my own repo: https://github.com/janeyx99/gha-experiments/actions (thanks samestep for your help!)
Reviewed By: walterddr
Differential Revision: D27448651
Pulled By: janeyx99
fbshipit-source-id: 103f39df0697eb6571c96e88c98d28c8b7adcfd7
Summary:
Switching pytorch android to use fbjni from prefab dependencies
Bumping version of fbjni to 0.2.2
soloader version to 0.10.1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55066
Reviewed By: dreiss
Differential Revision: D27469727
Pulled By: IvanKobzarev
fbshipit-source-id: 2ab22879e81c9f2acf56807c6a133b0ca20bb40a
Summary:
There might be regressions in the newest VS.
Remind users to choose the same stable VC version that our CI uses.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54933
Reviewed By: walterddr
Differential Revision: D27466645
Pulled By: malfet
fbshipit-source-id: a6a1ebea4cc1b22e13c7342ee4c061afcef7e2b5
Summary:
HIP's runtime compiler (hiprtc) is adding support for precompiled HIP headers in the ROCm 4.2 release. Conditionally add support for this feature. Using this feature will improve the ROCm torch wheel user experience; users will no longer need to install HIP headers separately to use torch JIT features.
The use of this feature is conditionalized on a new ROCM_VERSION macro.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54350
Reviewed By: H-Huang
Differential Revision: D27449031
Pulled By: malfet
fbshipit-source-id: 81a8d7847a47ce2bb253d1ea58740ef66ed154a3
Summary: Because the bare CXX version forwards to this without checking whether it's defined, causing errors for builds with -Wundef enabled
Test Plan: contbuilds
Differential Revision: D27443462
fbshipit-source-id: 554a3c653aae14d19e35038ba000cf5330e6d679
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54818
Several flaky tests fail due to some sort of timeout and it isn't
clear from the error message in CI where exactly each process is stuck. In this
PR, I've added a mechanism to dump the entire Python traceback of all Python
threads when we encounter a timeout.
Example traceback:
```
Process 3 timed out with traceback:
Current thread 0x00007ff3363ff700 (most recent call first):
File "torch/testing/_internal/common_distributed.py", line 373 in _event_listener
File "threading.py", line 870 in run
File "threading.py", line 932 in _bootstrap_inner
File "threading.py", line 890 in _bootstrap
Thread 0x00007ff406132180 (most recent call first):
File "torch/distributed/distributed_c10d.py", line 2477 in barrier
File "torch/testing/_internal/distributed/rpc/rpc_test.py", line 838 in test_reinit
File "torch/testing/_internal/dist_utils.py", line 90 in new_test_method
File "torch/testing/_internal/common_distributed.py", line 292 in wrapper
File "torch/testing/_internal/common_distributed.py", line 409 in run_test
File "torch/testing/_internal/common_distributed.py", line 393 in _run
File "multiprocessing/process.py", line 108 in run
File "multiprocessing/process.py", line 315 in _bootstrap
File "multiprocessing/popen_fork.py", line 75 in _launch
File "multiprocessing/popen_fork.py", line 19 in __init__
File "multiprocessing/context.py", line 277 in _Popen
File "multiprocessing/process.py", line 121 in start
```
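The dump format above is what Python's faulthandler produces; a minimal sketch of dumping all threads follows (the actual helper in common_distributed.py may differ):

```python
import faulthandler
import sys
import threading
import time

def worker():
    time.sleep(0.5)

t = threading.Thread(target=worker)
t.start()

# Writes "Current thread ... (most recent call first)" style stacks for
# every Python thread to stderr, like the timeout handler does.
faulthandler.dump_traceback(file=sys.stderr, all_threads=True)
t.join()
```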
ghstack-source-id: 125323810
Test Plan: waitforbuildbot
Reviewed By: rohan-varma
Differential Revision: D27378764
fbshipit-source-id: 661c009a5458c724f004aa83de9347a4bc03b63e
Summary:
One typo, one example correction and capitalization for a couple of comment lines.
ailzhang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54768
Reviewed By: H-Huang
Differential Revision: D27362999
Pulled By: ezyang
fbshipit-source-id: 91404ac9e9747ef7d7882a5f50b81d7eb570448b
Summary:
These changes provide the user with an additional option to choose the DNNL+BLIS path for PyTorch.
This assumes BLIS is already downloaded or built from source and the necessary library file is available at the location: $BLIS_HOME/lib/libblis.so and include files are available at: $BLIS_HOME/include/blis/blis.h and $BLIS_HOME/include/blis/cblas.h
Export the below variables to build PyTorch with MKLDNN+BLIS and proceed with the regular installation procedure as below:
```
$ export BLIS_HOME=path-to-BLIS
$ export PATH=$BLIS_HOME/include/blis:$PATH LD_LIBRARY_PATH=$BLIS_HOME/lib:$LD_LIBRARY_PATH
$ export BLAS=BLIS USE_MKLDNN_CBLAS=ON WITH_BLAS=blis
$ python setup.py install
```
CPU only Dockerfile to build PyTorch with AMD BLIS is available at : docker/cpu-blis/Dockerfile
Example command line to build using the Dockerfile:
sudo DOCKER_BUILDKIT=1 docker build . -t docker-image-repo-name
Example command line to run the built docker container:
sudo docker run --name container-name -it docker-image-repo-name
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54953
Reviewed By: glaringlee
Differential Revision: D27466799
Pulled By: malfet
fbshipit-source-id: e03bae9561be3a67429df3b1be95a79005c63050
Summary:
Part of https://github.com/pytorch/pytorch/issues/48209
Taken from the docstring:
Performs a set of optimization passes to optimize a model for the purposes of inference. Specifically, the passes that are run are:
1. Conv/BN fusion
2. Dropout removal
3. MKL layout optimizations
The third optimization takes a function `use_mkl_heuristic` that's used to determine whether a subgraph should be explicitly run in MKL layout.
I implemented 2 heuristics:
1. Runs the subgraph in MKL layout if it is larger than 2 nodes.
2. Benchmarks each subgraph with MKL layout and without, and keeps the MKL version if it's faster.
### Batch size of 10 and multi-threaded.
Results with the second heuristic are generally as strong as the "jit.freeze" version, except in `densenet` and `vgg`, where it's faster, likely due to the heuristic being better. With the first heuristic, there are some notable gaps, particularly on `inception_v3` and `alexnet`.
```
model Eager FX FX Auto jit.mkldnn threads
------------ --------- --------- --------- --------- -
custom 0.195614 0.14686 0.15929 0.156442 6
resnet18 0.172012 0.114007 0.119678 0.12945 6
resnet50 0.486463 0.294308 0.299518 0.318121 6
densenet161 0.955309 0.893502 0.882798 1.29315 6
inception_v3 0.38454 0.307076 0.239513 0.233083 6
googlenet 0.229388 0.237486 0.170458 0.174106 6
shufflenet 0.0513613 0.0286739 0.0292908 0.0267209 6
alexnet 0.0709602 0.0768137 0.0660831 0.0650399 6
vgg16 1.053993 0.9013264 0.9360212 1.082820 6
mobilenet 0.12264 0.0970935 0.0936568 0.106314 6
mnasnet 0.0989875 0.0412083 0.0424499 0.0472336 6
resnext 0.476811 0.315428 0.314422 0.343156 6
```
For single-threaded (still running...)
```
model eager FX FX auto mkl threads
------------ --------- --------- --------- --------- ---------
custom 0.0401415 0.259863 0.0263152 0.200667 1
resnet18 0.499931 0.382113 0.383711 0.396335 1
resnet50 1.10353 0.911865 0.923645 0.992125 1
densenet161 2.20158 2.39421 2.08204 2.30124 1
inception_v3 0.79161 0.849207 0.703546 0.724492 1
googlenet 0.66896 0.820965 0.515927 0.529414 1
shufflenet 0.0987308 0.0689343 0.0629298 0.0617193 1
alexnet 0.198795 0.19862 0.19325 0.211934 1
vgg16 3.744 3.2499 3.28503 3.31576 1
mobilenet 0.152725 0.14505 0.135555 0.159754 1
mnasnet 0.141983 0.089406 0.089599 0.0956167 1
resnext 1.13778 0.97016 0.955417 0.965376 1
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53805
Reviewed By: gmagogsfm
Differential Revision: D27424611
Pulled By: Chillee
fbshipit-source-id: a39137159de962fba7ca15121dfa9e78c1e01223
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54633
There's currently no information that could be used to determine what is a parameter during the loading of a mobile module. This prevents named parameters from functioning correctly. This change is a temporary hack to help out federated learning, currently the sole user of this API.
ghstack-source-id: 124885201
Test Plan: todo
Reviewed By: dhruvbird
Differential Revision: D27308738
fbshipit-source-id: 0af5d1e8381ab7b7a43b20560941aa070a02e7b8
Summary:
Add a ROCm 4.1 docker image for CI. Plan is to keep two ROCm versions at a time, however we still need the 3.9 image due to some CI jobs depending on it. Keep the 4.0.1 and 3.10 images, in addition to the 3.9 image until the 3.9 image is no longer needed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54628
Reviewed By: H-Huang
Differential Revision: D27443378
Pulled By: malfet
fbshipit-source-id: 3f3417ec4822c6ef4c10ce2144a5b2957503dfbe
Summary:
`ONNX_NAMESPACE` is empty by default if `USE_SYSTEM_ONNX ON`, while it should be equal to `onnx`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54973
Reviewed By: glaringlee
Differential Revision: D27466020
Pulled By: walterddr
fbshipit-source-id: 47cde3604acbda3f45bec5893036b39fd1eb58c9
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).
New submodule commit: 5bc304d17e
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54970
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: lw
Differential Revision: D27436760
fbshipit-source-id: 7325350c1798feacdc1faeea8c39ce8e4b91c73d
Summary:
Stack:
* https://github.com/pytorch/pytorch/issues/54954 Fixed OpInfo jit tests failing for TensorList inputs
* __#54922 Added support for TensorList inputs in OpInfo__
Updated OpInfo to accept either a `Tensor` or `TensorList` as `sample.input` and added workarounds to make this work with gradcheck.
Note: JIT testing support for TensorList inputs will be added in a follow up PR.
Fixes https://github.com/pytorch/pytorch/issues/51996
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54922
Reviewed By: H-Huang
Differential Revision: D27448952
Pulled By: heitorschueroff
fbshipit-source-id: 3f24a56f6180eb2d044dcfc89ba59fce8acfe278
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54915
TorchScript and torch.package have different mangling schemes. To avoid
them interfering with each other, we should undo the torch.package
mangling before processing anything with TorchScript (since TS
independently makes sure that no names collide).
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D27410472
Pulled By: suo
fbshipit-source-id: d1cc013c532d9abb7fb9615122bc465ded4785bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54882
Sometimes we have no reason to think that the output of `infer_size` won't be within the range of typical tensor sizes. In those cases, we can use a DimVector.
ghstack-source-id: 125137792
Test Plan: CI
Reviewed By: ezyang
Differential Revision: D27400387
fbshipit-source-id: 9a11d0f93010540f3aa65c0e208fc8e03f0e8a7f
Summary:
Fixes the build of projects that depend on torch, such as torchaudio. Otherwise torchaudio will complain that gloo_hip is missing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54727
Reviewed By: H-Huang
Differential Revision: D27361513
Pulled By: ezyang
fbshipit-source-id: 714cc2db23e7adf3e89303e941b78c27625b9460
Summary:
So they can be called from out-of-tree extensions
Otherwise I get linking errors like:
```
ImportError: /anaconda/envs/mytorch/lib/python3.8/site-packages/torchy-0.1-py3.8-linux-x86_64.egg/_TORCHY.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN2at10redispatch3addEN3c1014DispatchKeySetERKNS_6TensorES5_RKNS1_6ScalarE
```
cc ezyang bdhirsh
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54966
Reviewed By: H-Huang
Differential Revision: D27439712
Pulled By: ezyang
fbshipit-source-id: 4c0b45e87e708c57283758da49c54a767ab7ecbc
Summary:
Skips the tests indicated as failing in https://github.com/pytorch/pytorch/issues/54535.
During the ROCm CI upgrade from 4.0.1 to 4.1, some tests regressed. Specifically, FFT tests in test_spectral_ops.py and test_grid_sample in test_nn.py. In order to keep a passing CI signal, we need to disable these temporarily.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54536
Reviewed By: H-Huang
Differential Revision: D27442974
Pulled By: malfet
fbshipit-source-id: 07dffb957757a5fc7afaa5bf78b935a427251ef4
Summary:
This PR adds Azure Pipelines build steps for PyTorch. There are 3 pipelines that are added.
1) CI Build
- Runs when a PR is opened or when new commits are added to an open PR. This build must succeed before the PR can be merged.
- Currently only TestTorch unit tests are run.
- Only the CI Build configurations are run.
2) Daily Build
- Runs once a day during inactive hours to ensure the current PyTorch repo performs as expected.
- Runs all unit tests.
- Note: I do not have access to the current [determine-from](b9e900ee52/test/run_test.py (L737)) unit tests that are skipped on Windows builds. This `determine-from` filter can be added once a clear way to skip certain unit tests given the build configuration is explained.
- Runs on All Build configurations.
3) Official Build
- Runs once a day during inactive hours to publish official PyTorch artifacts to Azure DevOps Artifacts for consumption.
- No unit tests are run.
- Runs in three stages: Build, Verify, Publish, where PyTorch is built, then its wheel is installed in a clean Conda environment for verification, and then the wheel is published to Azure Artifacts as a Universal Package.
- Runs on All Build configurations.
Ubuntu builds run on Docker with the specified Dockerfile configuration. Windows builds run directly on configured Windows VMs (CPU, CUDA/cuDNN)
CI Build configurations:
1. Ubuntu 18.04
1. Python 3.9
a. CUDA 11.2/cuDNN 8.1.0
2. Python 3.8
a. CPU
2. Windows 2019
1. Python 3.8
a. CUDA 10.2/cuDNN 7.6.5
2. Python 3.7
a. CPU
All Build configurations:
1. Ubuntu 18.04
1. Python 3.9
a. CUDA 11.2/cuDNN 8.1.0
2. Python 3.8
a. CPU
b. CUDA 10.2/cuDNN 8.1.0
3. Python 3.7
a. CPU
b. CUDA 10.1/cuDNN 7.6.5
2. Windows 2019
1. Python 3.9
a. CUDA 11.2/cuDNN 8.1.0
2. Python 3.8
a. CPU
b. CUDA 10.2/cuDNN 7.6.5
3. Python 3.7
a. CPU
b. CUDA 10.1/cuDNN 7.6.4
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54039
Reviewed By: ezyang
Differential Revision: D27373310
Pulled By: malfet
fbshipit-source-id: 06dcfe2d99da0e9876b6deb224272800dae46028
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54819
20 of them contain both optional Tensor and output position.
Hacky wrapper for `_convolution_mode` was added in
04e0cbf5a9f073a1b73195537c12fb332c2fddd9 after hacky wrappers
are removed for optional<Tensor>.
Codemod commands are generated by a hacked version of
https://github.com/pytorch/pytorch/pull/54223 and
https://github.com/pytorch/pytorch/pull/54098.
ghstack-source-id: 125278883
Test Plan:
buck build //caffe2/aten/...
BUILD_TENSOREXPR_BENCHMARK=ON BUILD_STATIC_RUNTIME_BENCHMARK=ON python setup.py install
Reviewed By: smessmer
Differential Revision: D27378819
fbshipit-source-id: b925ed0510a83e3976383aaeec8b7de438b23bf3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54936
Another case where we can use `MaybeOwned<Tensor>` to save a bump at a small cost.
ghstack-source-id: 125218488
Test Plan: Existing CI
Reviewed By: ngimel
Differential Revision: D27421117
fbshipit-source-id: 16bb31ec38817be1f889360e2abfd0d9596e2943
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54935
Bunch of avoidable copying of Tensor objects, which results in a refcount bump.
ghstack-source-id: 125216023
Test Plan:
Compared percentage of self time spent in addmm_out_cuda_impl while running the following sample:
```
import torch
import torch.nn as nn

m = nn.Linear(1024, 1024).cuda().half()
x = torch.randn(16, 1024).cuda().half()
while True: y = m(x)
```
in perf record, decreased from 0.74% to 0.56%.
Reviewed By: ngimel
Differential Revision: D27420388
fbshipit-source-id: d2c5e4c4899cd02c60c45735b2d72c4ed913f6e8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54934
It looks like the vast majority of usage is just borrowing a pre-existing Tensor.
ghstack-source-id: 125216052
Test Plan: Existing CI.
Reviewed By: hlu1
Differential Revision: D27415131
fbshipit-source-id: d5a8dc4ca5d48ca3eaa3664655b724094e61f371
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54741
Similar to what we did for distributed_test.py, let MultiProcessTests that run collective comm. tests with NCCL blocking wait run under nccl_async_error_handling. This will better simulate real-world training scenarios.
ghstack-source-id: 125233692
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D27277389
fbshipit-source-id: a6c6e9abcf3a53b03ea8b9f8fb63b78e0cb6e81e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54740
Adds a simple helper decorator to set/unset nccl blocking wait for
tests. This will make it easier than having to manually set/unset the
os.environ vars every time.
ghstack-source-id: 125233693
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D27277222
fbshipit-source-id: c289b9d05e2f6328d672810b07501979b6e177c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54442
Added needsOutputs support to RecordFunction, improved ObserverUtil functions to handle list data, and did minor renaming for consistency.
To get output data from kernel calls, we need to temporarily capture them before passing them to the record function. Then the results are released to function return. We handle two cases, for unboxed and boxed kernels. The boxed version is fairly simple since all outputs are stored in the stack object. For unboxed kernel calls, we added a `ReturnValue` utility class to properly handle the different return values of unboxed kernels.
For optimization, this intermediate capture is only enabled for observers that request `needsOutputs(true)` and should not affect other observers or when the observer is not enabled.
Test Plan:
```
=> buck build //caffe2/test/cpp/jit: --show-output
=> buck-out/gen/caffe2/test/cpp/jit/jit --gtest_filter=RecordFunctionTest*
CUDA not available. Disabling CUDA and MultiCUDA tests
Note: Google Test filter = RecordFunctionTest*-*_CUDA:*_MultiCUDA
[==========] Running 7 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 7 tests from RecordFunctionTest
[ RUN ] RecordFunctionTest.TracedTestInputsOutputs
[ OK ] RecordFunctionTest.TracedTestInputsOutputs (226 ms)
[ RUN ] RecordFunctionTest.SampledCallbacks
[ OK ] RecordFunctionTest.SampledCallbacks (771 ms)
[ RUN ] RecordFunctionTest.RecordFunctionGuard
[ OK ] RecordFunctionTest.RecordFunctionGuard (0 ms)
[ RUN ] RecordFunctionTest.Callbacks
[ OK ] RecordFunctionTest.Callbacks (2 ms)
[ RUN ] RecordFunctionTest.ShouldRun
[ OK ] RecordFunctionTest.ShouldRun (0 ms)
[ RUN ] RecordFunctionTest.Basic
[ OK ] RecordFunctionTest.Basic (1 ms)
[ RUN ] RecordFunctionTest.OperatorNameOverload
[ OK ] RecordFunctionTest.OperatorNameOverload (1 ms)
[----------] 7 tests from RecordFunctionTest (1001 ms total)
[----------] Global test environment tear-down
[==========] 7 tests from 1 test case ran. (1002 ms total)
[ PASSED ] 7 tests.
```
Reviewed By: ilia-cher
Differential Revision: D25966661
fbshipit-source-id: 707886e1f212f40ba16a1fe292ea7dd33f2646e3
Summary:
*Context:* https://github.com/pytorch/pytorch/issues/53406 added a lint for trailing whitespace at the ends of lines. However, in order to pass FB-internal lints, that PR also had to normalize the trailing newlines in four of the files it touched. This PR adds an OSS lint to normalize trailing newlines.
The changes to the following files (made in 54847d0adb9be71be4979cead3d9d4c02160e4cd) are the only manually-written parts of this PR:
- `.github/workflows/lint.yml`
- `mypy-strict.ini`
- `tools/README.md`
- `tools/test/test_trailing_newlines.py`
- `tools/trailing_newlines.py`
I would have liked to make this just a shell one-liner like the other three similar lints, but nothing I could find quite fit the bill. Specifically, all the answers I tried from the following Stack Overflow questions were far too slow (at least a minute and a half to run on this entire repository):
- [How to detect file ends in newline?](https://stackoverflow.com/q/38746)
- [How do I find files that do not end with a newline/linefeed?](https://stackoverflow.com/q/4631068)
- [How to list all files in the Git index without newline at end of file](https://stackoverflow.com/q/27624800)
- [Linux - check if there is an empty line at the end of a file [duplicate]](https://stackoverflow.com/q/34943632)
- [git ensure newline at end of each file](https://stackoverflow.com/q/57770972)
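For reference, the check can be made fast in pure Python by seeking to the last byte of each file rather than reading it whole or shelling out per file. This is only an illustrative sketch (the helper name is made up; it is not the actual `tools/trailing_newlines.py` implementation):

```python
import os

def missing_trailing_newline(path):
    """Return True if the file is non-empty and does not end in a newline.

    Seeking to the last byte avoids reading whole files, which is what
    makes a check like this fast enough to run over an entire repository.
    """
    if os.path.getsize(path) == 0:
        return False  # empty files are fine
    with open(path, "rb") as f:
        f.seek(-1, os.SEEK_END)
        return f.read(1) != b"\n"
```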
To avoid giving false positives during the few days after this PR is merged, we should probably only merge it after https://github.com/pytorch/pytorch/issues/54967.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54737
Test Plan:
Running the shell script from the "Ensure correct trailing newlines" step in the `quick-checks` job of `.github/workflows/lint.yml` should print no output and exit in a fraction of a second with a status of 0. That was not the case prior to this PR, as shown by this failing GHA workflow run on an earlier draft of this PR:
- https://github.com/pytorch/pytorch/runs/2197446987?check_suite_focus=true
In contrast, this run (after correcting the trailing newlines in this PR) succeeded:
- https://github.com/pytorch/pytorch/pull/54737/checks?check_run_id=2197553241
To unit-test `tools/trailing_newlines.py` itself (this is run as part of our "Test tools" GitHub Actions workflow):
```
python tools/test/test_trailing_newlines.py
```
Reviewed By: malfet
Differential Revision: D27409736
Pulled By: samestep
fbshipit-source-id: 46f565227046b39f68349bbd5633105b2d2e9b19
Summary:
PRs https://github.com/pytorch/pytorch/issues/53652 and https://github.com/pytorch/pytorch/issues/54693 attempted to increase the consistency of our choice of commit (head vs merge) for CI on PRs, and have so far been unsuccessful. This PR takes a less ambitious approach to the problem by clarifying the choice in one specific way (see the following paragraph) and documenting it in `CONTRIBUTING.md`.
In addition to documentation, this PR also removes the current behavior of our GHA jobs that checkout the PR tip instead of the merge commit. At first glance, this behavior seems to increase consistency (by eliminating the special-case for `ghstack` PRs), but in reality, it actually just means that for non-`ghstack` PRs, the question "Which commit is used in CI?" has *two* answers instead of just one; see the description of https://github.com/pytorch/pytorch/issues/53652 for more details.
Once merged, this PR will unblock other PRs that modify our GHA workflows in breaking ways, such as https://github.com/pytorch/pytorch/issues/54737.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54967
Test Plan: None.
Reviewed By: walterddr, seemethere
Differential Revision: D27435913
Pulled By: samestep
fbshipit-source-id: 405fb419cf015cf88107d5eb2498cfb5bcb7ce33
Summary:
This PR adds the cuSOLVER based path for `torch.linalg.eigh/eigvalsh`.
The device dispatching helper function was removed from native_functions.yml, it is replaced with `DECLARE/DEFINE_DISPATCH`.
cuSOLVER is used if CUDA version >= 10.1.243. In addition if CUDA version >= 11.1 (cuSOLVER version >= 11.0) then the new 64-bit API is used.
I compared cuSOLVER's `syevd` vs MAGMA's `syevd`. cuSOLVER is faster than MAGMA for all matrix sizes.
I also compared cuSOLVER's `syevj` (Jacobi algorithm) vs `syevd` (QR-based divide-and-conquer algorithm). Although `syevj` is said to be better than `syevd` for smaller matrices, in my tests that is the case only for float32 dtype and matrix sizes 32x32 - 512x512.
For batched inputs, comparing a for loop of `syevd/syevj` calls to `syevjBatched` shows that for batches of matrices up to 32x32 the batched routine is much better. However, there are bugs in `syevjBatched`: sometimes it doesn't compute the result, leaving the eigenvectors as a unit diagonal matrix and the eigenvalues as the real diagonal of the input matrix. The output is the same with `cupy.cusolver.syevj`, so the problem is definitely on the cuSOLVER side. This bug is not present in the non-batched `syevj`.
The performance of 64-bit `syevd` is the same as the 32-bit version.
Ref. https://github.com/pytorch/pytorch/issues/47953
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53040
Reviewed By: H-Huang
Differential Revision: D27401218
Pulled By: mruberry
fbshipit-source-id: aef91eefb57ed73fef87774ff9a36d50779903f7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54901
Some subtleties:
- Need to make sure not to clobber composite definitions when
deciding when to generate
- I was lazy and so I didn't make inplace on TensorList work,
nor did I make inplace functions that returned void work
- A few tests started complaining that these noop meta functions
weren't raising the errors they needed. This is tracked
in https://github.com/pytorch/pytorch/issues/54897
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D27407232
Pulled By: ezyang
fbshipit-source-id: 5e706a267496368acdafd128942c310954e43d29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54838
It turns out that an explicit sync is still needed for the batched PowerSGD hook. I found that a job failure can be fixed by this change.
The sync was once removed by #54482.
Test Plan:
f260900882
f260899693
Reviewed By: rohan-varma
Differential Revision: D27384738
fbshipit-source-id: 3efd738b9fd375e2ceb36ed3a6bf99cd8ce8ff95
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54860
Currently we insert a quantize_per_tensor op when we encounter a quantizable input,
so if it has multiple uses and not all of them are quantizable then we need to add a dequantize op
before those ops.
In this pass, for a sequence quantize_per_tensor -> dequantize, we combine (fold away) the pair
since it is a no-op.
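As a toy model of the pass (the real one operates on a JIT graph, not a flat list of op names), folding a quantize_per_tensor node immediately followed by dequantize looks like:

```python
def fold_quant_dequant(ops):
    """Remove adjacent quantize_per_tensor -> dequantize pairs, which
    together are a no-op (toy list-of-op-names model of the graph pass)."""
    out = []
    for op in ops:
        if op == "dequantize" and out and out[-1] == "quantize_per_tensor":
            out.pop()  # drop both halves of the no-op pair
        else:
            out.append(op)
    return out
```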
[internal only][pyper]
Before this change we had redundant dequantize nodes in the graph
Example 1x inline_cvr graph https://www.internalfb.com/intern/everpaste/?handle=GODBxAlUMzGHD6MSACpHKKu9qjorbsIXAAAz
FC layers -> 37
quantize_per_tensor -> 30
dequantize -> 49
After this change
https://www.internalfb.com/intern/everpaste/?handle=GAl0uQnOlDNmpLoSAB-GZqRxu9wMbsIXAAAz
FC layers -> 37
quantize_per_tensor -> 30
dequantize -> 39
We remove extra 10 dequantize nodes in the graph.
Test Plan:
python test/test_quantization.py test_fold_quant_dequant
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D27390506
fbshipit-source-id: 56e6fb8496171246eccf4bd45eb8bebd87fcb740
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54859
This is applicable to the case when a call_function linear op is one of the users of the quantize op.
In order to be able to map the qparams of quantize_per_tensor to the qparams of the linear operator
that consumes it, we need to use the FQN of the module containing the linear op for the qparams of quantize_per_tensor.
Test Plan:
python test/test_quantization.py test_qparams_fqn
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27390505
fbshipit-source-id: a47af0e5ac016f2b2df74fbdf45afe99dc04be46
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54702
This fixes subclassing for __iter__ so that it properly returns an iterator over
subclass instances instead of Tensor.
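The pitfall can be illustrated without torch at all: a container's `__iter__` should construct elements via `type(self)`, so subclass identity survives iteration (toy classes, assumed for illustration only):

```python
class Vector:
    def __init__(self, data):
        self.data = list(data)

    def __getitem__(self, i):
        # type(self), not Vector, preserves the subclass on indexing
        return type(self)([self.data[i]])

    def __iter__(self):
        # iterating defers to __getitem__, so subclass identity is kept
        return (self[i] for i in range(len(self.data)))

class MyVector(Vector):
    pass
```

Had `__getitem__` hard-coded `Vector(...)`, iterating a `MyVector` would silently hand back base-class instances, which is the shape of the bug being fixed.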
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D27352563
Pulled By: ezyang
fbshipit-source-id: 4c195a86c8f2931a6276dc07b1e74ee72002107c
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54452
The assertion that fails in the issue is necessary to appease mypy. Instead, I fix `_ntuple` to always return a `tuple`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54911
Reviewed By: H-Huang
Differential Revision: D27411088
Pulled By: jbschlosser
fbshipit-source-id: 7f5045c58dd4f5f3b07b4826d9b4ca85606c5bce
Summary:
Both JITed and plain `cmath.sqrt(complex(-1, -0.0))` should return `-1j` after https://github.com/pytorch/pytorch/pull/54820 has been resolved.
Also, use an f-string instead of the `.format` method.
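The plain-Python half can be checked directly with `cmath`, which distinguishes the sign of a zero imaginary part across the branch cut:

```python
import cmath

# sqrt has a branch cut along the negative real axis; the sign of the
# zero imaginary part picks which side of the cut the input lies on
below = cmath.sqrt(complex(-1.0, -0.0))  # approaching the cut from below
above = cmath.sqrt(complex(-1.0, 0.0))   # approaching the cut from above
```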
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54923
Reviewed By: anjali411
Differential Revision: D27415117
Pulled By: malfet
fbshipit-source-id: 52e182feca50b690684de87c99df0ad6bef1ab44
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54657
The constraint checked in D27145406 (acf03b13f1) is too tight for the adindexer model and as a result, 5 ops (4 aten::narrow + 1 aten::permute) are not replaced with the copy version, which resulted in a perf regression. This diff checks for inplace ops explicitly and only applies the input constraint to graphs with inplace ops.
Test Plan: Contbuild
Reviewed By: ajyu
Differential Revision: D27253145
fbshipit-source-id: 23e2b1a018c84dd0fc2880fddd9c41bc0422b8eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54519
The current MPSCNNSoftmax kernel operates on tensors' feature channels. Therefore, in order to use it, we need to reshape the input tensors based on the value of `dim`. Currently, I've decided to limit the input to be two dimensional. I'll remove the constraint once we have shader implementations.
ghstack-source-id: 124497702
Test Plan:
- SandcastleCI
- CircleCI
Reviewed By: dhruvbird
Differential Revision: D27218823
fbshipit-source-id: 48c427ceedb42e63c183114939ca801ebfc81fd9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54518
When I was reading the Metal Shader Language Specification, I noticed that using `function_constants` in C++ attributes could let us do compile time kernel selection, which can drastically reduce the complexity of writing GPU kernels for different input texture types. We should apply this trick to all our existing shader functions.
ghstack-source-id: 124497703
Test Plan:
- Metal op tests
```
2021-03-20 23:35:20.496922-0700 PyTorchPlayground[48215:8455407] [bool test_view()],[1 10 2 2 ],[SUCCEED]
2021-03-20 23:35:20.522714-0700 PyTorchPlayground[48215:8455407] [bool test_view2()],[1 10 2 2 ],[SUCCEED]
2021-03-20 23:35:20.553591-0700 PyTorchPlayground[48215:8455407] [bool test_view3()],[5 8 ],[SUCCEED]
2021-03-20 23:35:20.571194-0700 PyTorchPlayground[48215:8455407] [bool test_view4()],[5 8 ],[SUCCEED]
```
- Sandcastle CI
- CircleCI
Reviewed By: SS-JIA
Differential Revision: D27218965
fbshipit-source-id: 763c54d551de3a88e4ff0007894200d72f00958c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53655
Currently EmbeddingBag and its variants support either int32 or int64 indices/offsets. We have use cases where there is a mix of int32 and int64 indices, which is not supported yet. To avoid introducing too many branches, we can simply cast the offsets type to the indices type when they are not the same.
Test Plan: unit tests
Reviewed By: qizzzh
Differential Revision: D26820202
fbshipit-source-id: 3e8f09523329ea12393ea92ee9a6315aa40a0b7f
Summary:
This PR introduces a script to spit out a list of slow tests into a file `.pytorch-slow-tests`. The format is currently JSON, and is simply a dictionary with entries that look like: `("test_case_name (__main__.test_suite)" -> average time in seconds)`. This is one of the steps in maintaining a list of slow tests so we can retire the manual slowTest labeling process.
The script reads the previous day's data from viable/strict (to ensure we have fully uploaded data) and aggregates the test times for **passed** test cases. It then filters the individual test cases to exclude those faster than 60 seconds.
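The aggregate-then-filter step amounts to something like this sketch (the function name and 60-second cutoff follow the description above; the real script reads S3 report data):

```python
SLOW_THRESHOLD_SEC = 60.0  # assumed cutoff, matching the description

def slow_tests(runs, threshold=SLOW_THRESHOLD_SEC):
    """runs: dict mapping test case name -> list of durations (seconds)
    from passed runs. Returns only the cases whose average is slow."""
    averages = {name: sum(ts) / len(ts) for name, ts in runs.items() if ts}
    # exclude everything faster than the threshold
    return {name: avg for name, avg in averages.items() if avg >= threshold}
```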
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54907
Test Plan:
`python tools/export_slow_test.py`
Check that `.pytorch-slow-tests` contains data. Mine looks like:
```
{
"test_matmul_4d_4d_complex_cpu (__main__.TestAutogradDeviceTypeCPU)": 91.22675,
"test_unary_ops (__main__.TestTEFuser)": 68.6,
"test_fn_gradgrad_unfold_cpu_complex128 (__main__.TestGradientsCPU)": 82.49153333333334,
"test_conv1d_basic (__main__.TestXNNPACKConv1dTransformPass)": 94.0914375,
"test_ddp_uneven_inputs (__main__.TestDistBackendWithFork)": 134.4995,
"test_pdist_norm_large_cuda (__main__.TestTorchDeviceTypeCUDA)": 60.2634,
"test_cusparse_multiple_threads_same_device (__main__.TestCuda)": 97.9022,
"test_fn_gradgrad_unfold_cuda_complex128 (__main__.TestGradientsCUDA)": 130.7222,
"test_ddp_uneven_inputs (__main__.TestDistBackendWithSpawn)": 136.08133333333333,
"test_jit_cuda_archflags (__main__.TestCppExtensionJIT)": 112.80733333333333,
"test_lobpcg_ortho_cuda_float64 (__main__.TestLinalgCUDA)": 63.8312,
"test_matmul_4d_4d_complex_cuda (__main__.TestAutogradDeviceTypeCUDA)": 62.1062,
"test_inverse_many_batches_cuda_complex128 (__main__.TestLinalgCUDA)": 1434.505,
"test_inverse_many_batches_cuda_complex64 (__main__.TestLinalgCUDA)": 1403.846,
"test_inverse_many_batches_cuda_float32 (__main__.TestLinalgCUDA)": 2081.614,
"test_inverse_many_batches_cuda_float64 (__main__.TestLinalgCUDA)": 1410.788,
"test_matrix_exp_analytic_cuda_complex128 (__main__.TestLinalgCUDA)": 172.167,
"test_matrix_exp_analytic_cuda_complex64 (__main__.TestLinalgCUDA)": 172.57,
"test_matrix_exp_analytic_cuda_float32 (__main__.TestLinalgCUDA)": 258.61,
"test_matrix_exp_analytic_cuda_float64 (__main__.TestLinalgCUDA)": 174.793,
"test_inverse_many_batches_cpu_complex128 (__main__.TestLinalgCPU)": 666.464,
"test_inverse_many_batches_cpu_complex64 (__main__.TestLinalgCPU)": 667.26,
"test_inverse_many_batches_cpu_float32 (__main__.TestLinalgCPU)": 1100.719,
"test_inverse_many_batches_cpu_float64 (__main__.TestLinalgCPU)": 651.037,
"test_matrix_exp_analytic_cpu_complex128 (__main__.TestLinalgCPU)": 72.965,
"test_matrix_exp_analytic_cpu_complex64 (__main__.TestLinalgCPU)": 74.184,
"test_matrix_exp_analytic_cpu_float32 (__main__.TestLinalgCPU)": 128.768,
"test_matrix_exp_analytic_cpu_float64 (__main__.TestLinalgCPU)": 72.138,
"test_conv1d_with_relu_fc (__main__.TestXNNPACKConv1dTransformPass)": 123.728,
"test_fn_gradgrad_linalg_householder_product_cuda_complex128 (__main__.TestGradientsCUDA)": 60.708,
"test_lobpcg (__main__.TestAutograd)": 120.408,
"test_collect_callgrind (__main__.TestBenchmarkUtils)": 206.896,
"test_collect_cpp_callgrind (__main__.TestBenchmarkUtils)": 122.507,
"test_proper_exit (__main__.TestDataLoader)": 172.356,
"test_proper_exit (__main__.TestDataLoaderPersistentWorkers)": 172.02,
"testNBit (__main__.operator_test.fused_nbit_rowwise_conversion_ops_test.TestNBitGreedyFused)": 96.9435,
"IntegerDivider (__main__.TestCUDAIntegerDivider)": 156.73700000000002
}
```
Reviewed By: walterddr, malfet
Differential Revision: D27412861
Pulled By: janeyx99
fbshipit-source-id: ec3d327e0dc6c93093e8b1c8454e3166b0649909
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54811
Callers can make a refcount bump themselves if they need one.
ghstack-source-id: 125136516
Test Plan: CI
Reviewed By: ngimel
Differential Revision: D27377210
fbshipit-source-id: ea58c7190fe2d7896432e403ecb1c59761aa319d
Summary:
It's sometimes useful to have an uninitialized Placeholder,
e.g., as a class member, where member initialization is
awkward or impossible.
(Yes, one could wrap a Placeholder in a unique_ptr, but that's an extra layer of
cruft.)
Test Plan: `buck build //caffe2/test:jit`
Reviewed By: navahgar
Differential Revision: D27400784
fbshipit-source-id: 56191ee11cbb4bc91b5624af6329f2d6d007570b
Summary:
This is to prepare for new language reference spec that needs to describe `torch.jit.Attribute` and `torch.jit.annotate`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54485
Reviewed By: SplitInfinity, nikithamalgifb
Differential Revision: D27406843
Pulled By: gmagogsfm
fbshipit-source-id: 98983b9df0f974ed69965ba4fcc03c1a18d1f9f5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54806
These are all very small key set checks (or similar getters
like `dtype()`), and we clearly want them to be inlinable -- we've even
made them non-virtual for perf in TensorImpl and said so in
comments. Don't make LTO work to figure that out.
ghstack-source-id: 125060650
Test Plan: CI
Reviewed By: ezyang
Differential Revision: D27375016
fbshipit-source-id: 5c3dbfa38fa493c8f7e0ac4e5acd3598d5896558
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54558
In blocking wait's polling synchronization loop, we frequently call checkAndSetException() as part of isCompleted() to check the status of nccl operations. It would be useful to log here in case we encounter any exceptions (which are later thrown by `checkAndThrowException`).
Also slightly refactors code previously added to make use of a helper function to get the error message given an `std::exception_ptr`.
ghstack-source-id: 125124314
Test Plan: CI
Reviewed By: pritamdamania87
Differential Revision: D27136202
fbshipit-source-id: 256eb63c5c2a84be909722d3fd7377ad9303fa11
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54557
When looping through the nccl communicator cache checking for errors, enhance the watchdog to log exceptions that are set on the communicator.
This will allow for better debuggability since the NCCL error will be logged when the watchdog receives errors for the communicators and aborts them appropriately.
Tested by forcing a NCCL error with NCCL_BLOCKING_WAIT=1 and verifying that the exception is indeed logged.
ghstack-source-id: 125124310
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D27106699
fbshipit-source-id: 1d2bd9f057a3796ce15dd8a4ce34cf6899eee45c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54219
There is no need for this ``pass``.
ghstack-source-id: 125124311
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D27105234
fbshipit-source-id: 95496fa785fdc66a6c3c8ceaa14af565588325df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53787
Per title, exposes a python-based monitored barrier API that we can use as part of debuggability and that may be useful for user applications.
ghstack-source-id: 125124315
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D26965127
fbshipit-source-id: 6c7826e63758462e3e5111f28cced54cba76a758
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53773
Closes https://github.com/pytorch/pytorch/issues/52876
Implements a barrier by doing send/recv to rank 0; rank 0 waits for these requests and, on timeout, throws an exception indicating which rank did not join within the given timeout.
This barrier is only intended for CPU use cases and built into process group gloo, and will be used for debugging synchronization/hang issues.
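A pure-Python toy model of the scheme (in-process queues standing in for gloo point-to-point send/recv; all names assumed): every non-zero rank checks in with rank 0 and waits for an ack, and on timeout rank 0 names the ranks that never arrived.

```python
import queue

def monitored_barrier(rank, world_size, inboxes, timeout=5.0):
    """Toy model of a monitored barrier. inboxes maps rank -> Queue,
    standing in for point-to-point send/recv channels."""
    if rank == 0:
        seen = set()
        for _ in range(world_size - 1):
            try:
                seen.add(inboxes[0].get(timeout=timeout))
            except queue.Empty:
                missing = set(range(1, world_size)) - seen
                raise RuntimeError(
                    f"ranks {sorted(missing)} did not join the barrier")
        for r in seen:
            inboxes[r].put("ack")  # release the waiting ranks
    else:
        inboxes[0].put(rank)            # "send" to rank 0
        inboxes[rank].get(timeout=timeout)  # wait for the ack
```

In the real implementation the transport is gloo send/recv between processes rather than in-process queues, but the control flow (rank 0 as the coordinator that can report the missing rank) is the same idea.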
Test Plan: Added UT
Reviewed By: zhaojuanmao
Differential Revision: D26921357
fbshipit-source-id: 7c16e861b4b8ea2bdd67a36b3de7b1029af7d173
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54547
These arguments to `BuiltinOpFunction`'s ctor don't need to be copied.
ghstack-source-id: 124690196
Test Plan: CI
Reviewed By: SplitInfinity
Differential Revision: D27277318
fbshipit-source-id: 68f1f545ca977b2e1cabc91620da31719bf81e1a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54533
There were some forgotten moves here. Since the values are
not otherwise used, let's just not give them names.
ghstack-source-id: 124674348
Test Plan: CI
Reviewed By: SplitInfinity
Differential Revision: D27271991
fbshipit-source-id: 793dd4576db659b3b9b973a4e09ee3133cf41dfe
Summary:
We were accessing their storage, which will throw.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54632
Reviewed By: ezyang
Differential Revision: D27372192
Pulled By: eellison
fbshipit-source-id: 9985e85af7a35a3d6bf1c0be0185699c34877b94
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54820
The template implementation of std::sqrt() in libstdc++ yields incorrect results for `std::complex(-std::abs(x), -0.0)`, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89991
For example:
```
#include <iostream>
#include <complex>
int main() {
  std::cout << std::sqrt(std::complex<float>(-1.0f, -0.0f)) << std::endl;
}
```
prints `(0, -1)` if libstdc++ is compiled to use C99 csqrt/csqrtf fallback, but `(0, 1)` if configured not to use it.
Test Plan: CI
Reviewed By: luciang
Differential Revision: D27379302
fbshipit-source-id: 03f614fdb7ff734139736a2a5f6872cee0173bee
Summary:
Moves more s3 parsing code to s3_stat_parser.py. This is another step toward properly modularizing the parsing code. I will also be using this exact function in future slowTest code.
Also replaces some `Any`s in the code with `Report`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54808
Test Plan:
.pytorch-test-times generated before the code and after this code is the same.
CI should pass, specifically the test tools GHA.
Reviewed By: walterddr
Differential Revision: D27375783
Pulled By: janeyx99
fbshipit-source-id: bec28551668b2eb3fdd60d802200993e493eac83
Summary:
**BC-breaking note**: This change throws errors for cases that used to silently pass. The old behavior can be obtained by setting `error_if_nonfinite=False`
Fixes https://github.com/pytorch/pytorch/issues/46849
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53843
Reviewed By: malfet
Differential Revision: D27291838
Pulled By: jbschlosser
fbshipit-source-id: 216d191b26e1b5919a44a3af5cde6f35baf825c4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53973
Two parts to this PR; I had to put them together because adding support for X causes more test code to be exercised, which in turn may require a fix for Y.
The first part is restoring the concept of storage to meta tensors. Previously, meta tensors had a nullptr storage (e.g., `meta_tensor.storage()` is an error.) As I was increasing the coverage of meta tensors, I started running into test cases (specifically memory overlap tests) that were failing because not having storage meant I couldn't check for memory overlap. After some discussion, we decided that it would make sense for meta tensors to model this as well (we already model strides, so getting accurate view information also seems useful). This PR does that by:
* Rewrite all of the factory functions in MetaTensor.cpp to use the generic versions (which are very carefully written to not actually poke at the data pointer, so everything works out). The key idea here is we give meta tensors a special allocator, MetaAllocator, which always returns a nullptr even if you ask for a nonzero number of bytes. resize_ is also made generic; the normal variant can be used directly rather than having to instruct it to avoid resizing storage
* Turn on memory overlap checking in TensorIterator even for meta tensors
* Although meta tensors now have storage, the concept of meta storage is NOT exposed to Python land (as it would imply I would have to codegen MetaFloatStorage, MetaDoubleStorage, etc. classes). So `x.storage()` still raises an error and I have a kludge in `__deepcopy__` to break storage sharing upon deep copy (this is wrong, but no tests exercise this at the moment).
The second part is adding more support for the most used functions in the test suite.
* Inplace operations have very simple meta functions. I added `fill_`, `zero_`, `random_`, `uniform_` and `normal_`. In the case of random, I take advantage of pbelevich's templates for defining random kernels, so that I can reuse the common scaffolding, and then just register a noop stub that actually does the RNG. (Look, another structured kernels tiny variant!)
* `copy_` is now implemented. Copying into a meta tensor is always OK, but copying out of a meta tensor raises an error (as we don't know what the "correct" data to copy out is in this case)
* `empty_strided` usage from structured kernels now is implemented (TBH, this could have been done as soon as `empty_strided` was added)
* Meta was missing in a few places in TensorOptions/DispatchKey utility functions, so I added them
* Autograd engine now correctly homes meta tensors with CPU tensors (they have -1 device index so CUDA queues wouldn't work anyway)
* `apply_`, `map_` and `map2_` are special cased to no-op on meta tensor self. These count as inplace operations too but they are implemented a little differently.
Getting more meta function support triggers a number of bugs in the test suite, which I then fix:
- Linear algebra functions sometimes don't report NotImplementedError because they get swallowed by catch all try blocks. This is tracked in https://github.com/pytorch/pytorch/issues/53739
- dlpack obviously doesn't work with meta tensors, I just disabled the test
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D27036572
Test Plan: Imported from OSS
Reviewed By: agolynski, bdhirsh
Pulled By: ezyang
fbshipit-source-id: 7005ecf4feb92a643c37389fdfbd852dbf00ac78
Summary:
Per title. One skip for addmm was needed. Either it or the jit test doesn't seem to handle a complex literal properly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54854
Reviewed By: anjali411
Differential Revision: D27395651
Pulled By: mruberry
fbshipit-source-id: 0bfadf0a8500f26d3a89f56f104fb44561f594d9
Summary:
This makes it more flexible for reuse when pulling test stats other than by-test-case.
It also makes the function harder to misuse with positional arguments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54797
Test Plan: see the updated tools/test/test_test_history.py examples.
Reviewed By: samestep
Differential Revision: D27371903
Pulled By: walterddr
fbshipit-source-id: 0ee02d654684315b44f5942904b857053d27e954
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54744
Fixes https://github.com/pytorch/pytorch/issues/54590
After porting the upsample operators to be structured, they now forward memory_format information to the output. This is a problem for the cuda kernels, which are not implemented to deal with `torch.channels_last` memory format. The operators are:
* upsample_nearest2d
* upsample_bilinear2d
* upsample_nearest3d
* upsample_trilinear3d
This fix just allocates a temporary, contiguous output tensor when that happens, writes the results to the temporary and copies the results back to the output tensor.
I held off on adding tests to get the fix out quickly, but I wrote a script and ran some manual tests that basically just assert that the outputs are the same for cpu and cuda, within some threshold. I ran it for all 4 operators:
```
import torch
def basically_equal(t1, t2):
    epsilon = 1e-4
    diffs = torch.abs(t1 - t2)
    print(torch.all(diffs < epsilon))
# upsample 2d
a = torch.arange(48).reshape(2, 2, 3, 4).contiguous(memory_format=torch.channels_last).float()
out_cpu = torch.nn.functional.interpolate(a, scale_factor=2, mode='nearest')
out_cuda = torch.nn.functional.interpolate(a.to('cuda'), scale_factor=2, mode='nearest')
basically_equal(out_cpu, out_cuda.to("cpu"))
out_cpu = torch.nn.functional.interpolate(a, scale_factor=2, mode='bilinear', align_corners=True)
out_cuda = torch.nn.functional.interpolate(a.to('cuda'), scale_factor=2, mode='bilinear', align_corners=True)
basically_equal(out_cpu, out_cuda.to("cpu"))
# upsample 3d
a = torch.arange(96).reshape(2, 2, 2, 3, 4).contiguous(memory_format=torch.channels_last_3d).float()
out_cpu = torch.nn.functional.interpolate(a, scale_factor=3, mode='nearest')
out_cuda = torch.nn.functional.interpolate(a.to('cuda'), scale_factor=3, mode='nearest')
basically_equal(out_cpu, out_cuda.to("cpu"))
out_cpu = torch.nn.functional.interpolate(a, scale_factor=3, mode='trilinear', align_corners=True)
out_cuda = torch.nn.functional.interpolate(a.to('cuda'), scale_factor=3, mode='trilinear', align_corners=True)
basically_equal(out_cpu, out_cuda.to("cpu"))
```
prints
```
tensor(True)
tensor(True)
tensor(True)
tensor(True)
```
One thing that was weird: `upsample_bilinear2d` and `upsample_trilinear3d` were only accurate across cpu/cuda with an epsilon of `1e-4`. That tentatively sounds close enough to say that cuda isn't "wrong" (?), but that's not exactly "equal"... I also ran the script before my change, and `bilinear2d` and `trilinear3d` were also the same across cpu/cuda with an epsilon of `1e-4`.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D27351393
Pulled By: bdhirsh
fbshipit-source-id: b33f46e4855dc8b49b363770190b639beebbf5a7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54624
previously we were creating get_attr nodes for dtype and axis.
The FX convention is that primitive types are embedded as literals in args/kwargs.
With this change we won't see getattr nodes in the graph anymore for dtype/axis
Test Plan:
python test/test_quantization.py TestQuantizeFx
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27306898
fbshipit-source-id: a7c91c7cb21ee96015c7f8830b38d943ada65358
Summary:
Enable bf16 vectorized copy.
BFloat16's copy gets 2x the performance of fp32, as we expected.
BFloat16's vectorized copy does not show a performance gain compared with the scalar version in the op benchmark. This is likely because the operator is memory bound: the memory system moves the same data either way, even when the code is written in scalar form.
benchmarks code:
```
import torch
import torch.utils.benchmark as benchmark
# x = torch.empty(10 * 18304 * 1024 * 16, dtype=torch.bfloat16)
x = torch.empty(10 * 18304 * 1024 * 16, dtype=torch.float)
def copy(tensors):
    for t in tensors:
        x.copy_(t)

tensors = []
for i in range(2):
    # l3 cache size 36608k = 18304 bfloat16 * 2 byte(per bfloat16)
    # tensors.append(torch.rand(10 * 18304 * 1024 * 16).bfloat16())
    tensors.append(torch.rand(10 * 18304 * 1024 * 16))

t0 = benchmark.Timer(
    stmt='copy(tensors)',
    setup='from __main__ import copy',
    globals={'tensors': tensors},
    num_threads=1)
print(t0.timeit(20))
```
Before this commit:
fp32:
3.84 s
1 measurement, 20 runs , 1 thread
bf16:
1.89 s
1 measurement, 20 runs , 1 thread
After:
fp32:
3.71 s
1 measurement, 20 runs , 1 thread
bf16:
1.85 s
1 measurement, 20 runs , 1 thread
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54671
Reviewed By: ailzhang
Differential Revision: D27325350
Pulled By: heitorschueroff
fbshipit-source-id: 1a3b8ca17b4c60dbb3e86bf196f63e0a05228c65
Summary:
Reference: https://github.com/pytorch/pytorch/issues/38349
Wrapper around the existing `torch.gather` with broadcasting logic.
TODO:
* [x] Add Doc entry (see if phrasing can be improved)
* [x] Add OpInfo
* [x] Add test against numpy
* [x] Handle broadcasting behaviour and when dim is not given.
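For illustration, the broadcasting behaviour can be sketched on top of the existing `torch.gather`; the helper name `take_along_dim_sketch` is made up here and is not the API added by the PR:

```python
import torch

def take_along_dim_sketch(input, indices, dim):
    # Hypothetical sketch: broadcast input and indices against each other
    # on every dimension except `dim`, then defer to torch.gather.
    in_shape, ix_shape = list(input.shape), list(indices.shape)
    in_shape[dim] = ix_shape[dim] = 1
    common = list(torch.broadcast_shapes(tuple(in_shape), tuple(ix_shape)))
    common[dim] = input.shape[dim]
    input = input.broadcast_to(common)
    common[dim] = indices.shape[dim]
    indices = indices.broadcast_to(common)
    return torch.gather(input, dim, indices)

x = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
idx = torch.tensor([[2], [0]])          # one index per row, broadcast over dim 0
out = take_along_dim_sketch(x, idx, 1)  # picks x[0, 2] and x[1, 0]
```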
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52833
Reviewed By: malfet
Differential Revision: D27319038
Pulled By: mruberry
fbshipit-source-id: 00f307825f92c679d96e264997aa5509172f5ed1
Summary:
```
index_add(Tensor self, int dim, Tensor index, Tensor source) -> Tensor
```
now becomes
```
index_add(Tensor self, int dim, Tensor index, Tensor source, Scalar alpha=1) -> Tensor
```
Generally, this sounds useful and harmless, and inside PyTorch, we are already needing this feature in `add_out_dense_sparse_cuda`, see the `SparseCUDATensorMath.cu` change in this PR.
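A minimal usage sketch of the new signature, assuming `alpha` is passed as a keyword argument:

```python
import torch

x = torch.zeros(5)
index = torch.tensor([0, 2])
source = torch.tensor([1.0, 1.0])
# The new trailing Scalar argument scales `source` before accumulation,
# i.e. self[index[i]] += alpha * source[i] along dim 0.
out = x.index_add(0, index, source, alpha=2.0)
```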
**Test not added yet. Will add if after discussion we believe this is a good idea.**
- [ ] TODO: add test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54176
Reviewed By: ngimel
Differential Revision: D27319198
Pulled By: mruberry
fbshipit-source-id: fe43be082d1230c87c5313458213d5252be2ff23
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54042
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53881
1. Fix position_weighted optimizer: the position-weighted layer uses the default optimizer but is actually gradient_slice, which causes problems if we do not handle it properly in the new optimizer. The solution is to use SparseAdagrad when the gradient is gradient_slices.
2. Optimizer implementation of v1 and v2: using 1st momentum with/without bias_correction.
3. also implemented decoupled weight decay in the new optimizer.
Test Plan:
buck test //caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_2 -- test_mlp_optimization
buck test //caffe2/caffe2/python:optimizer_test -- TestDecayAdagrad
buck test //caffe2/caffe2/python/operator_test:decay_adagrad_test
ctr_mbl_feed work flow: f255731660
oc work flow: f255739503
Reviewed By: 0x10cxR1
Differential Revision: D26839668
fbshipit-source-id: 2b6881c1a88540ef5766be40f5e80001257e2199
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54764
We mark a few vars as const in Reducer, including replicas_ and
process_group_, as they should not be changed by Reducer during training. This
can help catch issues at compile time and prevent developers from
accidentally changing these variables.
ghstack-source-id: 125040110
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D27357132
fbshipit-source-id: 23a0edf754a8e4f9e6440e99860e5549724cb7ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54763
Replaces deprecated torch::autograd::variable with at::Tensor.
torch::autograd::variable is defined as equal to at::Tensor now so this should
be a noop, but follows convention of using tensor instead of Variable.
ghstack-source-id: 125040109
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D27356450
fbshipit-source-id: 1a001358d7726a597141ec47803c8213db4814c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54711
Just print the dispatch key directly. The format here doesn't really
make sense but you'll still get something like CPUFloatTensor (because
the dispatch key is just CPU).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D27338811
Pulled By: ezyang
fbshipit-source-id: f459c5f7c006c06df4913ab33697eae89c46d83f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54710
I'm going to make meta tensors have storage (but DataPtr is always
null) so that I can accurately report memory overlap error checking, but
I now have a problem which is that if memory overlap test looks at the
actual data pointer, everything is going to look like it aliases! A
more conservative test is to just see if the Storage objects themselves
alias, and assume that the data pointers are unique if they don't.
The loss of precision arises if you unsafely have two distinct
storage objects that point to the same data pointer. This situation
is pretty rare, so I think the tradeoff is worth it (and I am hoping no tests
are triggered by this.)
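The storage-based check can be illustrated in Python; this is a hedged sketch, and `may_alias` is a made-up helper, not the actual C++ overlap check:

```python
import torch

def may_alias(t1, t2):
    # Conservative sketch of the storage-level test described above:
    # report aliasing iff the tensors' storages share a base allocation,
    # rather than comparing the tensors' own data pointers.
    return t1.storage().data_ptr() == t2.storage().data_ptr()

a = torch.ones(4)
b = a.view(2, 2)  # a view shares its base storage
c = a.clone()     # a clone gets a fresh storage
```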
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D27338810
Pulled By: ezyang
fbshipit-source-id: 5ebaf81c22824494c47c1ae78982d9c0e5cba59f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54708
cdist advertises itself as Math but actually error-checks that the inputs
are CPU/CUDA in cdist_impl, which is invoked from a composite context in some
situations. I worked around this by ensuring that when cdist_impl is called in
this way, we DON'T do the device checks, but the entire function is a little
janky and I filed an issue about it at #54096
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D27338813
Pulled By: ezyang
fbshipit-source-id: 1202b02c58584a33dc32a5270e59e5f0af6398c5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54703
The trick is that this function takes in the allocator and dispatch key
explicitly; so you still need to know where to find the appropriate
allocator. The plan is to use this for meta tensors, but you probably
could also use this for empty_cuda as well. It also takes in arguments
post optional resolution, which can save a few instructions if you want
to call this function directly (no uses yet).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: iseeyuan
Differential Revision: D27338814
Pulled By: ezyang
fbshipit-source-id: 131c97922d245e9a2de547527123b464bddb2f99
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54627
This is the simplest little fix to get interpreter to preserve
NotImplementedError, so that the test suite doesn't start choking
on meta tensors not working in interpreter. It is sound and correct
but doesn't work for other c10::Error subclasses with special handling.
A more proper fix is requested at
https://github.com/pytorch/pytorch/issues/54612
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: wenleix, ngimel
Differential Revision: D27328666
Pulled By: ezyang
fbshipit-source-id: 483bef062de5a907d20e2d9e25eafe2d5197cf8d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54531
Enabling faulthandler will intercept fatal signals like SIGSEGV, SIGFPE,
SIGABRT, SIGBUS and SIGILL and dump the entire Python traceback before the
process goes down.
This can help us in debugging flaky tests where a process crashes and we need
to debug what happened.
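A minimal sketch of the mechanism using the standard-library `faulthandler` module:

```python
import faulthandler
import tempfile

faulthandler.enable()  # from now on, fatal signals dump Python tracebacks

# The same machinery can be exercised manually. faulthandler writes
# through a raw file descriptor, so it needs a real file (not StringIO).
with tempfile.TemporaryFile(mode="w+") as f:
    faulthandler.dump_traceback(file=f)
    f.seek(0)
    trace = f.read()
```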
ghstack-source-id: 125045894
Test Plan:
1) Tested locally to see traceback is produced.
2) waitforbuildbot
Reviewed By: rohan-varma
Differential Revision: D27271048
fbshipit-source-id: ca12125a9da6cdfc7bac5619ad1c7e116666014b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54464
In the case where we accidentally set an error twice on a Future, we get a
cryptic error like this:
```
Exception in thread pool task: !completed() INTERNAL ASSERT FAILED at "aten/src/ATen/core/ivalue_inl.h":534, please report a bug to PyTorch.
```
This PR, updates the error message to include some additional information about
what the previous error was.
ghstack-source-id: 125039478
Test Plan:
1) unit test
2) waitforbuildbot
Reviewed By: swolchok
Differential Revision: D27249758
fbshipit-source-id: 517cf3837fb7b7821312e101e8813844c188f372
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54645
Had to replace RRef[..] with just RRef in the return signature since
Sphinx seemed to completely mess up rendering RRef[..]
ghstack-source-id: 125024783
Test Plan: View locally.
Reviewed By: SciPioneer
Differential Revision: D27314609
fbshipit-source-id: 2dd9901e79f31578ac7733f79dbeb376f686ed75
Summary:
Add wait in test_pass_nccl_options_high_priority_stream
after the all reduce operation.
Without the wait, the allreduce operation might still be running and the
comparison of the result might not be valid.
Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54714
Reviewed By: ezyang
Differential Revision: D27379544
fbshipit-source-id: 6393d25f8f3d5635c5d34c9b3aac8b801315b48e
Summary:
I added a helper to convert a Stmt to string and FileCheck it, so
started using it in a bunch of places. I replaced about half the current uses,
got tired, started to write a Perl script to automate it, realized that was
hard, and decided to give up for a bit. But this cleans up some of the tests a
bit, so seems easy to review and worth landing.
Test Plan: test_tensorexpr --gtest_filter=LoopNest.*
Reviewed By: navahgar
Differential Revision: D27375866
fbshipit-source-id: 15894b9089dec5cf25f340fe17e6e54546a64257
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54756
We have multiple bugs here, one relating to index flattening and the
other to computeAt.
ghstack-source-id: 125054729
Test Plan: yikes
Reviewed By: ZolotukhinM
Differential Revision: D27354082
fbshipit-source-id: 8b15bac28e3eba4629881ae0f3bd143636f65ad7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54755
As title. A step on the way to using computeAt to optimize
convolution.
ghstack-source-id: 125054730
Test Plan: new test
Reviewed By: ZolotukhinM
Differential Revision: D27353663
fbshipit-source-id: 930e09d96d1f74169bf148cd30fc195c6759a3e9
Summary:
This PR is a follow up to https://github.com/pytorch/pytorch/pull/53408.
It only loads hipfft if the ROCm version is 4.1 or later, and stops loading rocfft. This was done to resolve some issues observed in our internal CI due to conflicts.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54349
Reviewed By: ezyang
Differential Revision: D27374252
Pulled By: ngimel
fbshipit-source-id: 724e80df5011ea8fabd81739e18ae8a13d3a7ea0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54640
If we are running constant propagation on a graph that doesn't have any operators with constant inputs and any mutable inputs/outputs, we do not need to initialize an alias db. This is going to be used to speed up symbolic shape analysis.
Test Plan: Imported from OSS
Reviewed By: nikithamalgifb
Differential Revision: D27340863
Pulled By: eellison
fbshipit-source-id: 087b2a33b42c58fa5dae405d652b056d0f1d72e7
Summary:
Partly fixes https://github.com/pytorch/pytorch/issues/31837.
### Update: This is ready for review.
Currently, `torch.logsumexp(input, out=result)` internally creates 2 intermediate tensors with same shape as `input` tensor. This causes unnecessary OOM problems when tensor size is large.
These tensors come from the following:
1. `self - maxes` will create a new tensor with shape of `self`
2. `at::exp` will create another tensor with the shape of `self`
To get rid of this problem, we can use `(self-maxes).exp_()` that performs exp operation in-place. This would reduce memory need from `~3 x input.shape` to `~2 x input.shape` (`self-maxes` is still there)
I think we can't get rid of having a single intermediate tensor with shape of `input` because of `self - maxes` as we have to keep `self` intact. The only scenario would be to have a `torch.Tensor.logsumexp_` method that can do in-place operations on tensor itself. However, I didn't see any in-place method example for reduction operations, so it might not be a good fit.
This is my first contribution here, please let me know if I'm missing anything!
Thanks!
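The memory-conscious reduction described above can be sketched as follows; this is a simplified illustration, not the actual ATen kernel:

```python
import torch

def logsumexp_sketch(x, dim, keepdim=False):
    # The subtraction still allocates one intermediate with the shape of
    # `x`, but exp_() and log_() reuse buffers instead of allocating more.
    maxes = x.amax(dim, keepdim=True)
    tmp = (x - maxes).exp_()                      # in-place exp
    out = tmp.sum(dim, keepdim=keepdim).log_()    # in-place log
    return out + (maxes if keepdim else maxes.squeeze(dim))

x = torch.randn(3, 4)
res = logsumexp_sketch(x, dim=1)
ref = torch.logsumexp(x, dim=1)
```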
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51239
Reviewed By: anjali411
Differential Revision: D27363147
Pulled By: ezyang
fbshipit-source-id: 696fa8764b74386a80b4aa33104f3f9ca57ed712
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54049
The goal of this is to factor out the core logic of getting the analytical jacobian which is effectively doing `f(grad_out) = grad_out^T J = grad_input`. This allows us to test a lot of logic that was not possible before because now we can replace f with whatever we want in order to simulate potential issues that gradcheck is designed to catch.
Edit: I realize a lot of things this PR was originally aiming to allow is actually possible with hooks, hence the tests have already been added in a earlier PR in the stack. But this is still slightly useful for reducing code duplication when adding the new fast gradcheck code (more details below)
After this change, `get_analytical_jacobian` is only responsible for gathering a list of rows that are later combined into a single Jacobian tensor. This means we don't have to perform any checks for correctness of the dtypes/size at this step
We factor out that logic into a separate function, `combine_jacobian_rows`, which handles the list of rows -> single Tensor step for each jacobian, and the error checking it entails. (This allows this code to be shared between the fast/slow versions.)
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D27307240
Pulled By: soulitzer
fbshipit-source-id: 65bb58cda000ed6f3114e5b525ac3cae8da5b878
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54470
```
git grep -l 'DefaultBackend' | xargs sed -i 's/DefaultBackend/CompositeExplicitAutograd/g'
```
Plus a quick fixup in native/README.md
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D27253240
Pulled By: ezyang
fbshipit-source-id: 964df951ea8b52fa72937f3cc66aeaf49a702e6f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54595
Seeing a lot of misuse of DefaultBackend, want to try to
nip some of these in code review.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D27301721
Pulled By: ezyang
fbshipit-source-id: 1a39426cb6cac5c7f322df6f8a69ccb463f1b258
Summary:
I don't think the docker/ folder is used anymore; creating this draft to verify.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54729
Reviewed By: ezyang
Differential Revision: D27364811
Pulled By: walterddr
fbshipit-source-id: 3e4a9d061b0e5f00015a805dd8b4474105467572
Summary:
Fixes an error when running the torch test suite inside a CentOS CI image. As described by https://pypi.org/project/SoundFile/0.10.3.post1/, `On Linux, you need to install libsndfile using your distribution’s package manager`. This was missing from the CentOS CI image.
```
python test_spectral_ops.py -v
...
Traceback (most recent call last):
File "test_spectral_ops.py", line 25, in <module>
import librosa
File "/opt/conda/lib/python3.6/site-packages/librosa/__init__.py", line 211, in <module>
from . import core
File "/opt/conda/lib/python3.6/site-packages/librosa/core/__init__.py", line 6, in <module>
from .audio import * # pylint: disable=wildcard-import
File "/opt/conda/lib/python3.6/site-packages/librosa/core/audio.py", line 8, in <module>
import soundfile as sf
File "/opt/conda/lib/python3.6/site-packages/soundfile.py", line 142, in <module>
raise OSError('sndfile library not found')
OSError: sndfile library not found
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54687
Reviewed By: ezyang
Differential Revision: D27332975
Pulled By: walterddr
fbshipit-source-id: 9c6b37545e9f2536c83e606912859439847c884a
Summary:
This suppresses some data races reported by TSAN. See the associated
task(s) below for context, including sample stack traces caused by these races
and reproduction instructions.
This diff is automatically generated. Therefore, the way it makes suppressions
may not be as beautiful as if written by hand. *However, we don't have the
resources to manually adjust these diffs, nor do we have the capacity to
actually fix the bugs*; we just want to get the existing bugs
out of the way so we can enable TSAN across the fleet. If you are a reviewer
please do one of the following:
1. Accept the diff as is, and you may follow up with more changes (or fix the
bugs) later.
2. Fix the data races in a different diff and land it within a reasonable amount
of time (e.g. a week), and comment about it here.
3. Comment to suggest us a different code location(s) to suppress these data
races.
Test Plan: Unit tests were automatically run as part of https://www.internalfb.com/intern/sandcastle/job/22517998509525934/
Reviewed By: ezyang
Differential Revision: D26094360
fbshipit-source-id: 06c285570bcf7a1491d8f17d1885d065ef0bc537
Summary:
Hey!
Just stumbled across these Python 2 fragments while reading the source code and thought it could be removed, since the Python 2 support has already been dropped.
mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54691
Reviewed By: mruberry
Differential Revision: D27344439
Pulled By: ailzhang
fbshipit-source-id: 926303bfff9afa6dabd2efb5e98f9d0d9ef83dc7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54735
One of the tests didn't wrap scipy call with TEST_SCIPY. Also, the wrapper function seems unnecessary and requires lambdas to be created.
Test Plan: Imported from OSS
Reviewed By: nikithamalgifb
Differential Revision: D27351349
Pulled By: heitorschueroff
fbshipit-source-id: 029e273785b11e01d6be7b816469654de6583deb
Summary:
* Lowering NLLLoss/CrossEntropyLoss to ATen dispatch
* This allows the MLC device to override these ops
* Reduce code duplication between the Python and C++ APIs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53789
Reviewed By: ailzhang
Differential Revision: D27345793
Pulled By: albanD
fbshipit-source-id: 99c0d617ed5e7ee8f27f7a495a25ab4158d9aad6
Summary:
First step to move all S3 related operations into S3 parser utils.
in the end we provide APIs from s3_stats_parser:
1. downloading data as reports and uploading data as reports
2. filter by job name
and handle all compression, formatting inside.
TODO
- [ ] Refactor out upload into s3_stats_parser
- [ ] Remove all S3/BOTO related checkers and try/catch blocks outside of s3_stats_parser
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54681
Test Plan:
1. Running tools/test/* covers the refactoring logic (test_test_history.py and test_stats.py as entrypoints, both using the 2 new APIs in s3_stats_parser after the refactoring).
2. print_test_stats.py's main argparse entrypoint is covered by CI step Report Test Result step.
3. run `python test/run_test.py --export-past-test-times` before and after this PR should result in the same file content in .pytorch-test-times
Reviewed By: ailzhang
Differential Revision: D27346742
Pulled By: walterddr
fbshipit-source-id: fb40162e631e007fed9d5821fe4f190bda2cb52e
Summary:
This reduces the memory usage of matmul significantly for expanded batch size.
This reduces the peak memory usage of
```
a = torch.rand(1, 1024, 1024, device="cuda")
b = torch.rand(1024, 1024, 1, device="cuda")
out = torch.matmul(a, b)
```
From 4GB to 16MB which is not too bad.
It also fixes the same problem when `b` is not batched.
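A CPU-sized version of the scenario, with smaller shapes so it runs anywhere (the original example used CUDA and 1024-sized dims):

```python
import torch

# `a` has batch dimension 1 and is broadcast against b's batch of 64.
# Before the fix, matmul materialized the expanded copy of `a` as one
# large contiguous intermediate; the fix avoids that allocation.
a = torch.rand(1, 64, 64)
b = torch.rand(64, 64, 1)
out = torch.matmul(a, b)
```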
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54616
Reviewed By: ailzhang
Differential Revision: D27327056
Pulled By: albanD
fbshipit-source-id: 4bb5f4015aeab4174148512f3c5b8d1ffa97bf54
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54257
Makes the NS weight extraction function work correctly with
fp16 emulation patterns for linear. We navigate to the
weight correctly, and cast it to `torch.float16` before returning.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_linear_fp16
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27159370
fbshipit-source-id: 95f555298e3153e4783c64b3d8c83b9d3fdffa12
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54254
In fp16 emulation, we now have patterns such as
```
... -> dequantize -> linear -> relu -> to(torch.float16) -> ...
```
This PR adds support for
* specifying a subgraph's "base_op_node", which is the node with the op
which should be matched to related nodes. In the example above,
"base_op_node" would be the linear node, and it would be the second
node in the matched pattern.
* matching these fusion patterns and properly setting "base_op_node"
based on pattern and index
* using "base_op_node" instead of "start_node" throughout the NS
codebase wherever the intent is to match subgraphs or create names
for subgraphs.
At the end of this PR, matching unshadowed activations with an example
fp16 emulation pattern works e2e.
I'm saving the following work for future PRs (soon), mostly to keep
PR size manageable:
* adding weight matching (will require some changes to function which
extracts weights)
* adding shadowed activation matching (will require some changes to
shadow copying)
* adding input logging for these patterns (will likely require some changes as well)
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_linear_fp16
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27158199
fbshipit-source-id: 49fc445395452fda62e3c7a243544190f9af691c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54253
Creates an `NSSubgraph` type for representing a subgraph instance,
and modifies the NS code to use it. This will enable us to add
more information to the subgraph instance definition without
having to change all the callsites.
Test Plan:
```
mypy torch/quantization
python test/test_quantization.py TestFXGraphMatcher
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27158198
fbshipit-source-id: 548785dd90144e2da256c23af990620c778e7cfe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53828
Moves LSTM shadow activations test to new API. In order
to enable this, adds support for passing two args instead
of one arg when copying a subgraph from A to B.
Since this was the last test of the old API, deletes
the old test case.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_compare_shadow_activations_lstm_dynamic
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D26982733
fbshipit-source-id: 03f580688dd37f3ccd688d9f444e9e79cfa84734
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53819
Moves the linear tests for shadow activations to new API.
In order to do so, adds logic for fp32 to fp32 dtype cast,
which is an identity.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_compare_shadow_activations_linear
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D26982734
fbshipit-source-id: b6203228abf3cdf74ab0638468a6df77658aa662
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53818
Moves testing of conv for shadow activations to new NS API
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_compare_shadow_activations_conv
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D26982732
fbshipit-source-id: 9e8709a76363fbcdf84413e5d4a6c8a0889cb97b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53779
Moves the test case for LSTM activation matching to new NS APIs.
This requires adding the ability to log non-Tensor types.
Since we need Loggers to be scriptable and TorchScript does
not support `Union`, we collect statistics in a separate collector
if we have an RNN. Note: this can scale to a small N of
return types, but not to a large N. If the N becomes large in
the future, we will solve it then.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D26967110
fbshipit-source-id: afe60b44fdec28a328813b4f342cf4fe04820baa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54654
Fixes a bug where disabling quantizaton on potential fusion patterns
would lead to errors in the `convert` function. For example:
1. have a model with add-relu
2. disable quantization for the part of the model containing add-relu
3. run prepare and convert, the convert step would fail because
intermediate nodes were missing from `env`.
The fix is to add handling for this edge case. If quantization is
disabled, we manually copy the nodes for multi-node fusion patterns.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_fusion_pattern_unquantized
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27318454
fbshipit-source-id: 27c1fd1cb7c9711a8e8d338200971c428dae8f98
Summary:
Without pinning, if malicious code got committed and the tag moved forward, we would be at risk. This does mean that we have to manually update the SHA if there are desirable upgrades to the repository.
We are pinning it to this commit: a81b3c4d59
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54738
Reviewed By: samestep
Differential Revision: D27346792
Pulled By: janeyx99
fbshipit-source-id: 5641a78567c3cd61dce35dfa2fd4918f255a7681
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54571
Supports bfloat16 via a similar method to half: upconvert inputs to
fp32, do math, then downconvert outputs to bf16.
Resource strings are mostly derived from cuda-11 headers.
Fixes #53918, for the legacy fuser at least.
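The upconvert/downconvert pattern, sketched in Python rather than the fuser's generated CUDA:

```python
import torch

# Widen bf16 inputs to fp32, do the math at full precision,
# then narrow the result back down to bf16.
x = torch.tensor([0.5, 1.5], dtype=torch.bfloat16)
y = x.float().sigmoid().bfloat16()
```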
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D27328987
Pulled By: bertmaher
fbshipit-source-id: 5c0eae44164623faa0c75cb818e8bf0211579fdc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54649
Some operator<< code manually implemented string join in C++; it turns
out there is a c10 util for this. Use the util instead of rolling our own.
ghstack-source-id: 124840043
Test Plan: Ci
Reviewed By: SciPioneer
Differential Revision: D27316705
fbshipit-source-id: 5118097f84be2f38a503d8f81faa38c8d95ec17a
Summary:
As per title.
Numerical stability increased by replacing inverses with solutions to systems of linear triangular equations.
Unblocks computing `torch.det` for FULL-rank inputs of complex dtypes via the LU decomposition once https://github.com/pytorch/pytorch/pull/48125/files is merged:
```
LU, pivots = input.lu()
P, L, U = torch.lu_unpack(LU, pivots)
det_input = P.det() * torch.prod(U.diagonal(0, -1, -2), dim=-1) # P is not differentiable, so we are fine even if it is complex.
```
Unfortunately, since `lu_backward` is implemented as `autograd.Function`, we cannot support both autograd and scripting at the moment.
The solution would be to move all the lu-related methods to ATen, see https://github.com/pytorch/pytorch/issues/53364.
Resolves https://github.com/pytorch/pytorch/issues/52891
TODOs:
* extend lu_backward for tall/wide matrices of full rank.
* move lu-related functionality to ATen and make it differentiable.
* handle rank-deficient inputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53994
Reviewed By: pbelevich
Differential Revision: D27188529
Pulled By: anjali411
fbshipit-source-id: 8e053b240413dbf074904dce01cd564583d1f064
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).
New submodule commit: 5d15ff7a64
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54686
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: lw
Differential Revision: D27328262
fbshipit-source-id: 81e1ede0607da4d8f676145cfb6729ac5544c77d
Summary:
CMAKE_SYSTEM_PROCESSOR set to x86_64 (on Linux) or AMD64 (5ec224496b) (on Windows) indicates the build is running on the x86_64 architecture, while `CMAKE_SYSTEM_PROCESSOR` set to aarch64 or arm64 means we are running on an ARMv8+ architecture.
Delete the `i[3-6]86` pattern as 32-bit builds are no longer supported.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54637
Reviewed By: ezyang
Differential Revision: D27311897
Pulled By: malfet
fbshipit-source-id: 26989fc9b54a96d70c768ab03ca4528506ee7808
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54601
This make it consistent with PackageImporter and the on-disk format.
Test Plan: Imported from OSS
Reviewed By: Lilyjjo
Differential Revision: D27296915
Pulled By: suo
fbshipit-source-id: a9bc615b1952b6cc4dcba31d4a33932b1fa1a2aa
Summary:
The link in the README was broken
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54434
Reviewed By: ailzhang
Differential Revision: D27328733
Pulled By: nairbv
fbshipit-source-id: 12ebb6f66983f9348a90b9738fbd9f3f2660c2d1
Summary:
The fallback thnn 2d convolution uses `im2col` to extract patches and `gemm` to implement the convolution.
It has a shortcut that uses `gemm` directly for kernel size 1, but this only works for stride == 1 and padding == 0.
This PR adds checks for stride == 1 and padding == 0 when determining whether `im2col` can be skipped.
Fixes https://github.com/pytorch/pytorch/issues/54036
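The shortcut and its precondition can be illustrated as follows (a sketch, not the thnn code itself):

```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 3, 8, 8)
w = torch.rand(5, 3, 1, 1)

# With kernel size 1, stride 1 and padding 0, im2col is the identity, so
# the convolution is a single GEMM over flattened spatial positions:
gemm = (w.view(5, 3) @ x.view(1, 3, -1)).view(1, 5, 8, 8)
conv = F.conv2d(x, w, stride=1, padding=0)

# With stride != 1 the shortcut is invalid, which is what this PR checks:
strided = F.conv2d(x, w, stride=2)
```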
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54080
Reviewed By: ejguan
Differential Revision: D27170482
Pulled By: zou3519
fbshipit-source-id: 055d6502239d34945934de409d78144d8a5c56f4
Summary:
This PR adds a lightweight workflow which runs when any of our GitHub Actions lint or test workflows start (currently just the three listed in the YAML in this PR's diff), and cancels redundant ones (e.g. if a PR author pushes several commits in rapid succession). Currently this isn't particularly impactful, but it would become more so if/when we add heavier workflows that run on PRs.
Initially we tried using [`technote-space/auto-cancel-redundant-workflow`](https://github.com/technote-space/auto-cancel-redundant-workflow) instead of [`potiuk/cancel-workflow-runs`](https://github.com/potiuk/cancel-workflow-runs), but for some reason the former doesn't seem to work even when triggered by `workflow_run` with the `TARGET_RUN_ID` input set appropriately.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54685
Test Plan: janeyx99 and I tested this in a separate GitHub repo, and confirmed that it successfully cancels redundant `push`-triggered workflows on the source repo and `pull_request`-triggered workflows from forks.
Reviewed By: janeyx99
Differential Revision: D27327999
Pulled By: samestep
fbshipit-source-id: c5793a7660d21361381e0f033d314f2d603f70ec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54647
Regularly log stats showing effect of gradient compression when using the PowerSGD DDP communication hook.
Test Plan:
buck run mode/dev-nosan scripts/wayi/torch:power_sgd
Play with the layer sizes of the input model (you can just use linear layers for convenience), and check the log that shows compression stats. For convenience, you can change `logging.info` to `print` locally.
You can create some test diffs on top of this diff, to show that the compression stats are correct in different cases.
Run with power_sgd script:
{F537381542}
Diff with example using a simple linear model: D27299934
sample output:
{F538486535}
Reviewed By: SciPioneer
Differential Revision: D27240254
fbshipit-source-id: 9e142b2f7957cc874804f799b7bb3bffdf824858
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53953
Previously, torch.futures.wait_all would wait for all specified futures to
complete before it returned. As a result, if there was an error, it would still
wait for a long time (e.g., long-running RPCs) before returning an error to the
user.
This PR ensures `wait_all` returns an error as soon as any future runs into an
error, instead of waiting for all futures to complete.
I removed the logic in `_invoke_rpc_python_udf` which raised an error in the unwrap
function, because ideally the error should be set on the Future and not be
raised to the user only when `wait()` is called. For example, in the case of
`wait_all`, the user never calls `wait()` on the future that errored out, but on a
future down the chain, and we should propagate these errors via `setError`
instead.
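The fail-fast semantics can be sketched with Python's stdlib futures (an analogy only, not the torch.futures implementation): `concurrent.futures.wait` with `FIRST_EXCEPTION` returns as soon as any future errors, without waiting for long-running ones.

```python
import concurrent.futures as cf
import time

def slow():
    time.sleep(2)  # stands in for a long-running RPC

def failing():
    raise RuntimeError("boom")

with cf.ThreadPoolExecutor(max_workers=2) as pool:
    futs = [pool.submit(slow), pool.submit(failing)]
    start = time.monotonic()
    # Returns as soon as the failing future completes, not after 2 s.
    done, not_done = cf.wait(futs, return_when=cf.FIRST_EXCEPTION)
    elapsed = time.monotonic() - start
```

Here the error is observable on the completed future almost immediately, even though the slow future is still running.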
ghstack-source-id: 124721216
Test Plan:
1) Unit test added.
2) waitforbuildbot
Reviewed By: mrshenli
Differential Revision: D27032362
fbshipit-source-id: c719e2277c27ff3d45f1511d5dc6f1f71a03e3a8
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53731
Make SharedCache thread-safe by using explicit locks instead of relying on atomicity of certain Python operations
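The general pattern (a minimal sketch, not the actual SharedCache code) is to guard the dict with one explicit lock so that check-then-insert is a single critical section, rather than relying on individual dict operations happening to be atomic under the GIL:

```python
import threading

class LockedCache:
    """Minimal thread-safe cache: every compound read-modify-write
    sequence happens under one explicit lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def get_or_insert(self, key, factory):
        with self._lock:
            if key not in self._data:        # check and insert together
                self._data[key] = factory()  # factory runs at most once
            return self._data[key]

cache = LockedCache()
results = []

def worker():
    results.append(cache.get_or_insert("k", lambda: object()))

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

All eight threads observe the same cached object; without the lock, two threads could race between the membership check and the insert and each create their own value.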
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53750
Reviewed By: malfet
Differential Revision: D27304793
Pulled By: albanD
fbshipit-source-id: 7c62babe4357bed57df3056fbda6801fb6168846
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54530
This diff introduces the following changes and improvements:
- Introduces a new fluent API to construct tensors from external data as an alternative to `from_blob` overloads. See below for an example.
- Leverages several small-buffer optimizations which result in a 50% reduction in tensor construction times.
- Exposes a new (lightweight) way to construct tensors by passing a naked `context` and `context_deleter` pair as an alternative to the existing `deleter` parameter.
- Updates the existing `from_blob` overloads to internally use the fluent API.
```
// Example 1
at::Tensor tensor = at::for_blob(data, sizes)
.strides(strides)
.context(context, [](void *ctx) { delete static_cast<Ctx*>(ctx); })
.options(...)
.target_device(...)
.make_tensor();
// Example 2
at::Tensor tensor = at::for_blob(data, sizes).make_tensor();
// Example 3
at::Tensor tensor = at::for_blob(data, sizes)
.deleter(...)
.make_tensor();
```
Test Plan:
Below are the folly Benchmark results for the following two equivalent operations:
```
// The fluent API
at::Tensor tensor = at::for_blob(data, sizes)
.deleter([buffer](void*) mutable { buffer.reset(); })
.options(dtype(c10::ScalarType::Float))
.make_tensor();
// The original `from_blob` overload
at::Tensor tensor = at::from_blob(
data,
sizes,
[buffer](void*) mutable { buffer.reset(); },
dtype(c10::ScalarType::Float));
```
```
============================================================================
scripts/balioglu/from_blob_exp/main.cpp relative time/iter iters/s
============================================================================
fluent 298.34ns 3.35M
from_blob 55.19% 540.51ns 1.85M
============================================================================
```
Various similar experiments show an approximately 50% reduction in tensor construction times.
Reviewed By: ezyang
Differential Revision: D27269344
fbshipit-source-id: e6bd0b78384bf89fd24f22254008180329000363
Summary:
Kernels such as "add" are registered to DefaultBackend. At a minimum, NestedTensor is not compatible with structured kernels due to missing fields such as size, which can cause difficult-to-catch bugs when a NestedTensor is passed into a function without a NestedTensor-specific kernel.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54559
Reviewed By: ezyang
Differential Revision: D27283591
Pulled By: cpuhrsch
fbshipit-source-id: fad7c03ca3b2190f2f90039dd2872184e9bc5049
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54120
Construct InterpreterManager inside PyTorchDeployModel
- add ReadAdapterInterface to deploy::Package
Implement PyTorchDeployModel::makePrediction for FeatureStore Examples
- Basic test of loading and executing 'simple' model
Test Plan: ran unit tests locally and CI
Differential Revision: D26961744
fbshipit-source-id: fce72bc83b9005500d9b7ce3fab2ed466f73d6ed
Summary:
Also modify the `tf32_on_and_off` decorator to make it support functions without a `device` argument.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52871
Reviewed By: ngimel
Differential Revision: D27286674
Pulled By: mruberry
fbshipit-source-id: 14f6d558271bd6a1d0bc40691c170d47e81de1ff
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54636
Test Plan: The model will be rerun after the diff lands...
Reviewed By: hx89
Differential Revision: D27310244
fbshipit-source-id: 88575237596a59996da14a49a8459f8b3d0ee66a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54439
For now the only way to represent conv2d in TE is via an external call,
and since the aten library doesn't have an out variant for conv2d, the
external call has to perform an extra copy. Because of that, fusing
conv2d regressed performance, and hence it is now disabled. However, in the near
future we should have two alternative ways to enable it:
1) represent conv2d natively in TE (without an external call)
2) add an out variant for conv2d
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D27237045
Pulled By: ZolotukhinM
fbshipit-source-id: f5545ff711b75f9f37bc056316d1999a70043b4c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54579
## Summary
1. Eliminate a few more tests when BUILD_LITE_INTERPRETER is on, such that test_lite_interpreter_runtime can build and run on device.
2. Remove `#include <torch/torch.h>`, because it's not needed.
## Test plan
Set `BUILD_TEST=ON` in `build_android.sh`, then run
` BUILD_LITE_INTERPRETER=1 ./scripts/build_pytorch_android.sh x86`
push binary to android device:
```
adb push ./build_android_x86/bin/test_lite_interpreter_runtime /data/local/tmp
```
Reorganize the folder in `/data/local/tmp` so the test binary and model file are laid out as follows:
```
/data/local/tmp/test_bin/test_lite_interpreter_runtime
/data/local/tmp/test/cpp/lite_interpreter_runtime/sequence.ptl
```
such that the model file is in the correct path and can be found by test_lite_interpreter_runtime.

Test Plan: Imported from OSS
Reviewed By: iseeyuan
Differential Revision: D27300720
Pulled By: cccclai
fbshipit-source-id: d9526c7d3db8c0d3e76c5a4d604c6877c78afdf9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53916
This PR fixes some bugs that are made more clear by the previous refactor.
- make sure gradcheck returns false when it's supposed to fail and raise_exception=False.
- make sure that when test_batched_grad fails, it returns false when raise_exception=False
Removing checkIfNumericalAnalyticAreClose made sense to me here because underneath it's really doing `torch.allclose`, and using that directly instead of adding another opaque function to call seemed to make the code clearer.
TODO:
- ~add a test to see if when torch.allclose fails, we indeed return false.~
- ~uncomment test from previous PR.~
Test Plan: Imported from OSS
Reviewed By: heitorschueroff
Differential Revision: D27201692
Pulled By: soulitzer
fbshipit-source-id: 8b8dc37c59edb7eebc2e8db6f8839ce98a81d78b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53857
This PR basically just factors a lot of the logic out of the main gradcheck function into individual functions. It aims to avoid any behavior change (but we may not have enough tests to actually verify this). Refactorings that lead to any behavior change are done in the next PR in this stack.
The rationale for this change is 1) to make the main gradcheck function cleaner to read, and 2) to allow us to reuse the same pieces when we add the fast gradcheck.
Maybe this PR is also a good place to add some tests for gradcheck, i.e., make sure gradcheck fails when it should fail, as to make sure that we are indeed not changing any logic. This will also help us make sure our fast_gradcheck does all the necessary checks:
So far existing tests are:
- test_gradcheck_fail_when_no_differentiable_outputs_and_num_grad_not_zero` (test_autograd)
- test_gradcheck_single_input (test_autograd)
- test_gradcheck_sparse_input (test_autograd)
- test_gradcheck_nondeterministic (test_autograd)
- test_gradcheck (test_overrides)
Full coverage would potentially require adding the following missing tests (for each test, for both raise_exception=True/False). The methodology for the list below is that for every type of error message we spit out, we make sure we can hit it:
- complex:
- when numerical != analytical when tested with imag grad_out
- check_inputs
- ~when inputs are not dense, but check_sparse_nnz is false~
- ~when none of the inputs require grad~
- ~(warning) when inputs are not double precision~
- ~when layout is not mkldnn(aka has strides) and input has a dimension with stride 0.~
- check_no_differentiable_outputs:
- ~when none of the outputs are differentiable, but numerical gradient is not zero~
- check_outputs:
- ~when sparse outputs (always raise)~
- ~when mkldnn outputs (always raise)~
- test_batched_grad
- ~when encounter runtime error while computing batched grad (print big message)~
- when not allclose (print out big message)
- test_backward_mul_by_grad_output
- ~when layout of grad_input is not the same as input~
- ~when grad_input is sparse and has incorrect sparse_dim/dense_dim~
- ~when backward not multiplied by grad_output (sparse/non-sparse case)~
- when grad is incorrect type/size
- test_undefined_grad
- ~when encounter runtime error while running backward~
- when we complete backward but grad inputs (the output of .grad()) is not none
- check_analytical_jacobian_attributes (for both complex/non complex)
- when grad input is incorrect dtype/size
Test Plan: Imported from OSS
Reviewed By: heitorschueroff
Differential Revision: D27201571
Pulled By: soulitzer
fbshipit-source-id: 86670a91e65740d57dd6ada7c6b4512786d15962
Summary:
Add a proper way to skip test_symeig: in case MAGMA is not detected, skip test_symeig properly.
Added the skipCUDAIfNoMagma decorator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54526
Reviewed By: malfet
Differential Revision: D27293640
Pulled By: heitorschueroff
fbshipit-source-id: 245f86540af0e37c8795e80dc003e1ca4c08cd5b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54466
I had to very carefully audit all the use sites, since there are a lot
of other uses of the string Math; I did most of the conversion by
grepping for all occurrences of Math and then doing a search-and-replace.
I also updated documentation for clarity.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D27253239
Pulled By: ezyang
fbshipit-source-id: afb485d07ff39575742a4f0e1e205179b60bc953
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53735
Add an option to BlobSerializationOptions to request that float data be
serialized as bfloat16. This reduces the serialized data size at the expense
of some loss in precision.
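bfloat16 keeps the sign bit and 8-bit exponent of float32 but only 7 mantissa bits, so the conversion amounts to keeping the top 16 bits of the float32 representation. A stdlib sketch of the simple truncating variant (the actual serializer may round-to-nearest instead):

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    """Truncate a float32 to its top 16 bits (a bfloat16 bit pattern)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def bfloat16_bits_to_float32(b: int) -> float:
    """Widen bfloat16 bits back to float32 by zero-filling the low bits."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]

# Round trip: half the storage, some precision loss.
approx = bfloat16_bits_to_float32(float32_to_bfloat16_bits(3.14159))
```

The round-tripped value differs from the original only in the low mantissa bits, which is exactly the precision/size trade-off this option exposes.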
ghstack-source-id: 124317910
Test Plan: Included a new unit test.
Reviewed By: mraway
Differential Revision: D26658205
fbshipit-source-id: 74521ed161059066355a3f208488ed01a344dbb5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52881
**This PR adds:**
1. logic to parse complex constants (complex literals of the form `bj`)
2. logic to parse complex lists
3. support for complex constructors: `complex(tensor/int/float/bool, tensor/int/float/bool)`
4. Limited operator support
- `add`, `sub`, `mul`, `torch.tensor`, `torch.as_tensor`
**Follow-up work:**
1. Add complex support for unary and other registered ops.
2. support complex constructor with string as input (this is supported in Python eager mode).
3. Test all emitXYZ for all XYZ in `ir_emitter.cpp` (currently only emitConst, emitValueToTensor are tested). e.g., test loops etc.
4. onnx doesn't support complex tensors, so we should error out with a clear and descriptive error message.
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D27245059
Pulled By: anjali411
fbshipit-source-id: af043b5159ae99a9cc8691b5a8401503fa8d6f05
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).
New submodule commit: 52774a0165
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54582
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: lw
Differential Revision: D27289673
fbshipit-source-id: c1284b1642c518ce4568e32ddebee5034d8a542e
Summary:
Follow-up PR of https://github.com/pytorch/pytorch/issues/53951.
This PR fixes the remaining Semmle warning: comparison of narrow type with wide type in loop condition.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54471
Reviewed By: bdhirsh
Differential Revision: D27262493
Pulled By: malfet
fbshipit-source-id: 05765758da79699936af11de237c3ff3d34373d6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53296
Part 1 of the instruction count microbenchmarks. This PR is focused on benchmark definition machinery. (Though you can run `main.py` to see it in action.) A summary of the system is given in the README.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D26907092
Pulled By: robieta
fbshipit-source-id: 0f61457b3ce89aa59a06bf1f0e7a74ccdbf17090
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54438
The August 1x model has DictConstruct in the graph (P331168321).
These can easily be removed with a jit pass, but to easily measure the improvement
and run the replayer with the model in the meantime, enable DictConstruct in static runtime.
Test Plan:
```
./sigrid/predictor/scripts/pytorch/pyper_inference_e2e_local_replayer_test.sh \
cpu 218841466_0 7449 /data/users/ansha/tmp/adfinder/august_1x/ /data/users/ansha/tmp/adfinder/august_1x/filtered_requests_inline_cvr_100
```
```
TEST trace
Total num requests 100
Num exceptions 0
Latency us avg 180965
Latency us p25 89785
Latency us p50 131240
Latency us p75 146621
Latency us p90 158378
Latency us p95 166628
Latency us p99 1886680
Latency us p100 3803252
Server latency us avg 91554
Server latency us p25 51447
Server latency us p50 86371
Server latency us p75 95229
Server latency us p90 102706
Server latency us p95 116023
Server latency us p99 557017
Server latency us p100 716319
Num rankUnits avg 28
```
Reviewed By: hlu1
Differential Revision: D27236682
fbshipit-source-id: 1da49a836dd7533480e77797338baa9edcb65fb5
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54337
This PR adds a new API to NNC to perform loop fusion.
```
static For* fuseLoops(const std::vector<For*>& loops);
```
Loop fusion is done only when all the conditions below are satisfied.
* All the loops have the same parent.
* There are no statements between these loops in their parent body.
* The start bounds are the same for all loops.
* The stop bounds are the same for all loops.
* Fusing the loops does not violate or add any dependencies.
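In plain code (illustrated here in Python rather than NNC IR), the transformation merges two loops with identical bounds into one body, which is safe only when the conditions above hold:

```python
def unfused(a, b, n):
    # Two adjacent loops: same parent, same start/stop bounds,
    # and no statements between them.
    out1 = [0] * n
    out2 = [0] * n
    for i in range(n):
        out1[i] = a[i] + b[i]
    for i in range(n):
        out2[i] = a[i] * b[i]
    return out1, out2

def fused(a, b, n):
    # What fuseLoops conceptually produces: one loop containing both
    # bodies. Valid here because neither body reads the other's writes,
    # so fusion adds no new dependencies.
    out1 = [0] * n
    out2 = [0] * n
    for i in range(n):
        out1[i] = a[i] + b[i]
        out2[i] = a[i] * b[i]
    return out1, out2

a, b = [1, 2, 3], [4, 5, 6]
```

If the second loop instead read `out1[i + 1]`, fusing would reorder a read before its write, which is the kind of dependency violation the last condition rules out.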
This PR also adds an API to check for partial overlaps in `buffer_inference.h` and fixes a bug in `mem_dependency_checker.cpp`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54461
Reviewed By: bertmaher
Differential Revision: D27254888
Pulled By: navahgar
fbshipit-source-id: c21b027d738e5022e9cb88f6f72cd9e255bdb15e
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54413
1. Skip inplace autograd test for an op if its inplace variant does not exist.
2. For ops that don't have an inplace variant, remove redundant `supports_inplace_autograd=False` assignments in their `OpInfo`s.
3. Ops having inplace variants that do not support autograd should not have `supports_inplace_autograd=False` entries removed from their `OpInfo`s.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54460
Reviewed By: ngimel
Differential Revision: D27255938
Pulled By: mruberry
fbshipit-source-id: f15334b09e68995e9f26adc2ff3e59c292689ee8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54117
https://github.com/pytorch/pytorch/pull/45950 enhanced our NCCL error logging so that we add some basic debug information about what went wrong when erroring out with a NCCL error.
However, that PR only used the added function for `C10D_NCCL_CHECK` which is used to check the return values of NCCL calls. However, in ProcessGroupNCCL we also have `checkForNCCLErrors` which checks for errors on nccl communicators, and in case of errors it would be good to have this logging there too.
Also renames the function s/errorMessage/getNcclErrorDetailStr
ghstack-source-id: 124662592
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D27100497
fbshipit-source-id: fec3663ffa3e92bae8391ef4f77054abb4bb9715
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).
New submodule commit: a2b58dfab5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54509
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: jspark1105
Differential Revision: D27264145
fbshipit-source-id: 606948e002dcf364bb39aad49ef4f2144bbba7a4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53727
This is the first diff to add native support for segment reduction in PyTorch. It provides functionality similar to torch.scatter or "numpy.ufunc.reduceat".
This diff mainly focuses on the API layer to make sure future improvements will not cause backward compatibility issues. Once the API is settled, here are the next steps I am planning:
- Add support for other major reduction types (e.g. min, sum) for 1D tensor
- Add Cuda support
- Backward support
- Documentation for the op
- Perf optimizations and benchmark util
- Support for multi dimensional tensors (on data and lengths) (not high priority)
- Support for 'indices' (not high priority)
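The intended semantics can be sketched in pure Python (the lengths-based 1-D variant only; this is an illustration, not the actual kernel): consecutive segments of the data are reduced, where `lengths[i]` gives the size of segment i.

```python
def segment_reduce(data, lengths, reduce="max"):
    """Reduce consecutive segments of a 1-D sequence.

    Segments are laid out back to back, similar in spirit to
    numpy.ufunc.reduceat with offsets derived from segment lengths.
    """
    ops = {"max": max, "min": min, "sum": sum}
    out, start = [], 0
    for n in lengths:
        out.append(ops[reduce](data[start:start + n]))
        start += n
    return out

# Three segments of sizes 2, 3, and 1 over six values.
result = segment_reduce([1, 5, 2, 9, 4, 7], [2, 3, 1], reduce="max")
```

Adding min/sum (the first item on the list above) is just a matter of extending the reduction table.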
Test Plan: Added unit test
Reviewed By: ngimel
Differential Revision: D26952075
fbshipit-source-id: 8040ec96def3013e7240cf675d499ee424437560
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48990
Introducing TensorImageUtils methods to prepare tensors in the channels-last MemoryFormat.
ChannelsLast is preferred for performance.
To avoid introducing API-breaking changes, an additional MemoryFormat parameter is added, which is CONTIGUOUS by default.
Testing by checking test_app that uses this call
```
gradle -p android installMnetLocalBaseDebug -PABI_FILTERS=arm64-v8a
```
Test Plan: Imported from OSS
Reviewed By: jeffxtang
Differential Revision: D27173940
Pulled By: IvanKobzarev
fbshipit-source-id: 27788082d2c8b190323eadcf18de25d2c3b5e1f1
Summary:
Since `_test1`, `_test2`, `_build`, and `test` are all stripped, `slow_test` should be stripped as well. This way, the _slow_test stats will be considered part of all stats relating to a particular build job, though currently this doesn't do much because the jobs don't share a common stemmed name: the build has `_gcc7` while the slow_test CI job does not.
This makes me think...do we omit the `gcc7` intentionally? Are there other things I should strip, e.g., `multigpu_test`?
See:
ci/circleci: pytorch_linux_xenial_cuda10_2_cudnn7_py3_slow_test
ci/circleci: pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test1
ci/circleci: pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54528
Reviewed By: samestep
Differential Revision: D27270393
Pulled By: janeyx99
fbshipit-source-id: ffb7289cfe4dba52ded67f50a89f3e75e7bad68d
Summary:
Allows extensions to override ROCm gfx arch targets. Reuses the same env var used during cmake build for consistency.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54341
Reviewed By: bdhirsh
Differential Revision: D27244010
Pulled By: heitorschueroff
fbshipit-source-id: 279e1a41ee395a0596aa7f696b6e908cf7f5bb83
Summary:
This is something which I wrote because it was useful during my debugging sessions, but I think it might be generally useful to other people as well so I took the liberty of proposing an official `pytorch-gdb` extension.
`pytorch-gdb` is a gdb script written in python. Currently, it contains only one command: `torch-tensor-repr`, which prints a human-readable repr of an `at::Tensor` object. Example:
```
Breakpoint 1, at::native::neg (self=...) at [...]/pytorch/aten/src/ATen/native/UnaryOps.cpp:520
520 Tensor neg(const Tensor& self) { return unary_op_impl(self, at::neg_out); }
(gdb) # the default repr of 'self' is not very useful
(gdb) p self
$1 = (const at::Tensor &) 0x7ffff72ed780: {impl_ = {target_ = 0x5555559df6e0}}
(gdb) torch-tensor-repr self
Python-level repr of self:
tensor([1., 2., 3., 4.], dtype=torch.float64)
```
The idea is that by having an official place where to put these things, `pytorch-gdb` will slowly grow other useful features and make the pytorch debugging experience nicer and faster.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54339
Reviewed By: bdhirsh
Differential Revision: D27253674
Pulled By: ezyang
fbshipit-source-id: dba219e126cc2fe66b2d26740f3a8e3b886e56f5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54441
Similar to previous dropout one
ghstack-source-id: 124544176
Test Plan: Printed graphs before and after fusion. verified input outputs stayed the same {P299343882}
Reviewed By: kimishpatel
Differential Revision: D27014352
fbshipit-source-id: d0a9548f8743472bdd7e194efd8e8d5fe53b95b6
Summary: Add the ability to reset the optimizer counter.
Test Plan: will wait for integration tests to run on diff.
Differential Revision: D27248286
fbshipit-source-id: a608df1bd61b64eb317c9ffd9cfdd804c5288f6d
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).
New submodule commit: 8998e6f1d7
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54486
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: jspark1105
Differential Revision: D27255655
fbshipit-source-id: 5315687d4121c5ff2628ba7f134c1a5134369ed2
Summary:
1. Enabled `BFloat16` support for `argmax` & `argmin` on both CPU & CUDA
2. Added `OpInfo`s for `argmax` & `argmin`
3. Enabled `test_argminmax_multiple` for `float16`. It can't be enabled for `bfloat16`, as comparison is done with numpy, which doesn't currently support `bfloat16`.
4. Enabled `test_dim_arg_reduction_scalar` for `float16` & `bfloat16`.
5. Enabled `test_reduction_vectorize_along_output` for `bfloat16`.
6. Enabled `test_reduction_vectorize_along_input_corner` for `bfloat16`.
7. Enabled `test_dim_reduction` for both `float16` and `bfloat16`, except that both of them don't support `prod` on CPU.
8. Unskipped `TestCommonCPU.test_variant_consistency_jit` for dtype `bfloat16` for `amax` & `amin`, as they're passing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52582
Reviewed By: anjali411
Differential Revision: D27204704
Pulled By: heitorschueroff
fbshipit-source-id: cdad5df494d070f8e1a8fb83939441a91124b4d9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54416
Once D27230990 lands, we'll need this for TensorPipe to be built with Bazel.
ghstack-source-id: 124512701
Test Plan: None for now.
Reviewed By: beauby
Differential Revision: D27231000
fbshipit-source-id: 474cc1b23118703ecb47ed4b8e0c5b000572eae8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54436
An operator entry with no dispatch table implicitly generates a Math
entry, so you don't need to define one yourself. I also added
some asserts in the codegen to fail on these cases.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D27235381
Pulled By: ezyang
fbshipit-source-id: f8c905090b863120f4f3656c37e2b7f26e8bb9ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54427
A StructuredNativeFunctions is no longer guaranteed to actually
be structured (test structured property for that), so we rename
this to a more neutral name.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D27235380
Pulled By: ezyang
fbshipit-source-id: 2b438d615bf06a47fc9c7bf6eb66fd8b4df31bc8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54426
Previously, we only put NativeFunctions in StructuredNativeFunctions
if the out variant advertised that the kernel was structured. However,
there are a few code generation things that can take advantage of
this trio structure, even if the kernel itself hasn't been ported
to be structured. So better to always group things when they are
related, and then let clients decide whether or not to use the
structure or throw it away.
While doing this, I had hoped that there weren't any functional/inplace
pairs that didn't also have an out variant. This turned out to not
be true. These are probably all oversights and should get fixed at
some point.
Bill of changes:
- The actual operational change happens in
StructuredNativeFunctions.from_dict; then I need to relax some
__post_init__ invariants. To tell if a StructuredNativeFunctions
is actually structured, there is a new structured property, which
is queried from a few new locations in code
- Refactor native_functions.py into gen_structured/gen_unstructured
functions so I can easily call gen_unstructured from two contexts
I intend to s/StructuredNativeFunctions/NativeFunctionsGroup/ but
for ease of review this rename hasn't been done in this PR.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D27235379
Pulled By: ezyang
fbshipit-source-id: d8a15de9abb75b365348ab94e67b830704e30cf0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54419
I'm planning to break it into some helper functions, so let's put it in its own module first.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D27235378
Pulled By: ezyang
fbshipit-source-id: c03c5440d2d753859e2c5ec2b2c8b1b82870f03a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54397
I was supposed to have done this in https://github.com/pytorch/pytorch/pull/54079
but apparently I forgot to push these changes before landing, so here's
the clean up.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D27235382
Pulled By: ezyang
fbshipit-source-id: ffcce5abc78251c81c230992bac70b8973906ace
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54467
`at::native::copy_` requires src/dest to have the same sizes, which isn't true in reshape.
Test Plan: Added new test cases to cover this case.
Reviewed By: ajyu
Differential Revision: D27249617
fbshipit-source-id: 2c95175fa8564b3c648979445ad4314f97818852
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53908
This adds reinplacing to MKLDNN subgraphs so that we replace `aten::add` with `aten::add_`. Normally you would have to prove device and dtype, but we know those already, and because we have explicit broadcast nodes for other reasons we don't have to prove that the output shape of add is the same as the inputs'.
I've tested correctness on resnet, and I'm going to do more extensive testing as well. When I benchmarked the "unsafe" version (always inplace) I saw average speedups of ~16% for both single-threaded and multithreaded runs. I don't think the "safe" version will be far behind; when I looked at resnet, for example, every `add` and `relu` were reinplaced.
There's some question of reusing other alias / liveness / inplacing passes in SR. I thought about it, but I didn't want to add a cross-dependency between very different parts of the code base with a bunch of different assumptions. The logic here also covers a simpler case and does not add much complexity IMO.
Test Plan: Imported from OSS
Reviewed By: Krovatkin
Differential Revision: D27132969
Pulled By: eellison
fbshipit-source-id: 121a38daaedf01363f6b66a814beaaa72a0ab0dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52614
This can speed up models by 5% (~.5-1% from the base, but ~5% after they've been sped up with mkldnn).
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D26696693
Pulled By: eellison
fbshipit-source-id: bfed55242524a4c2f1ae5d63e76d6803016d986d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54110
dictConstruct doesn't need to make its caller have a `shared_ptr<DictType>`. It also doesn't need to do extra `shared_ptr` copies into the `key_type` and `value_type` locals.
ghstack-source-id: 124150642
Test Plan: fitsships
Reviewed By: ezyang
Differential Revision: D27101782
fbshipit-source-id: 3c632ad9d8f1bd7bdf37f517a86aca27bd41548a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54124
No need to have an extra temporary intrusive_ptr (`p`) just to do an `incref`.
ghstack-source-id: 124150644
Test Plan:
existing tests for correctness; inspect assembly for
c10::IValue::toObject to double-check & see that it's a bit shorter
Reviewed By: smessmer
Differential Revision: D27109183
fbshipit-source-id: 497706190867eeac0fb1d309d0ecc97cf8d65b08
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).
New submodule commit: ffff7a3118
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54447
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: jspark1105
Differential Revision: D27242112
fbshipit-source-id: 768b1a40652b6c2f0710bd4bb655697daf45f756
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54432
Following the merge of channel hierarchies, here comes the promised
clean up.
Test Plan: CI
Reviewed By: lw
Differential Revision: D27232442
fbshipit-source-id: 540dc6bc18a9a415b676e06e75530d729daf2d5b
Summary:
Fix Semmle warning: comparison of narrow type with wide type in loop condition.
For example, consider the following piece of code:
for (int i=0; i<array.size(); ++i) {}
The problem is that array.size() returns size_t, which can be a wider type than int depending on the implementation, so there is a chance that i overflows (for a very large array whose size is beyond the range of int) and the loop never terminates.
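The hazard can be simulated (here in Python, emulating a narrow signed counter with an explicit wrap; in real C++ signed overflow is undefined behavior, and wraparound is merely one common outcome): if the counter's type cannot represent the bound, incrementing wraps around and the exit condition never becomes false. The fix is to declare the index with the container's size type (e.g. size_t).

```python
def loop_terminates(bound, bits=8, max_steps=10_000):
    """Emulate `for (i = 0; i < bound; ++i)` with a signed counter of
    the given width: does the loop ever reach the bound?"""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    i, steps = 0, 0
    while i < bound:
        steps += 1
        if steps > max_steps:
            return False  # counter cycles forever, never reaching bound
        i += 1
        if i > hi:
            i = lo        # narrow-type wraparound
    return True

terminates_small = loop_terminates(100)  # bound fits in 8 bits
terminates_large = loop_terminates(300)  # bound exceeds the 8-bit max of 127
```

With an 8-bit counter, a bound of 300 is unreachable: the counter cycles through [-128, 127] indefinitely, which is exactly the non-termination Semmle flags.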
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53951
Reviewed By: zou3519
Differential Revision: D27181495
Pulled By: malfet
fbshipit-source-id: 0612c5cedcdc656c193085e7fbb87dd163f20688
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54262
register_dispatch_key.py might generate a device_of call over an
optional<Tensor> if it happened to be the first Tensor-like
argument.
ghstack-source-id: 124535550
Test Plan: Test together with next diff in stack
Reviewed By: ezyang
Differential Revision: D27164093
fbshipit-source-id: 3b0400d5d603338e884218498106f6481e53f194
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54428
Using c10::ArrayRef as the parameter type makes the API more flexible and allows the caller to leverage small-buffer optimizations (e.g. c10::SmallVector, std::array) for performance critical cases.
Test Plan: No behavioral changes. Run the existing unit and integration tests.
Reviewed By: suo
Differential Revision: D27232222
fbshipit-source-id: 7b13bc6bd02257097ca119077028fbccc68cc925
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54004
According to
`glean-search find-decls --refs 'c10::TensorOptions::key_set'`
there are no uses of this function
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D27047971
Pulled By: ezyang
fbshipit-source-id: 63662dd7ab27753ecb79c45c152c2cad1160dab2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53846
There's already a variant of removeDropout that takes in a graph, so just switch to calling that one. It doesn't error-check that the module isn't in training mode (because it doesn't have a module), but optimize_for_mobile guarantees the cloned module is in eval mode.
ghstack-source-id: 124544216
Test Plan: called optimize on forward and foo, both contained dropouts, both dropouts removed. Called both functions afterwards to verify they ran and gave the same output. {P308987364}
Reviewed By: kimishpatel
Differential Revision: D26986251
fbshipit-source-id: 085e08cbaa982aa08803a718fee4380af5f86b78
Summary:
Warn if uncommitted changes exist in .circleci/config.yml; unlike other generated code, .circleci/config.yml is actually committed to the repo. (This is a follow-up of https://github.com/pytorch/pytorch/issues/54345.)
Two options I am open to:
1. abort regenerate if detected
2. print out backed up temp filename
Also remove the `-x` since it is currently very verbose
```
++ dirname .circleci/regenerate.sh
+ cd .circleci
++ mktemp
+ OLD_FILE=/var/folders/vw/ryb6j4d97xs1t_14024b710h0000gn/T/tmp.54GhUh7w
+ cp config.yml /var/folders/vw/ryb6j4d97xs1t_14024b710h0000gn/T/tmp.54GhUh7w
++ mktemp
+ NEW_FILE=/var/folders/vw/ryb6j4d97xs1t_14024b710h0000gn/T/tmp.aV87RTvQ
+ ./generate_config_yml.py
+ cp /var/folders/vw/ryb6j4d97xs1t_14024b710h0000gn/T/tmp.aV87RTvQ config.yml
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54373
Test Plan:
1.
```
$ echo "418 I'm a teapot" > .circleci/config.yml
$ .circleci/regenerate.sh
$ .circleci/regenerate.sh
```
Result:
```
$ .circleci/regenerate.sh
Uncommitted change detected in .circleci/config.yml
It has been backed up to /var/folders/89/brnr1wt970130lk0m52605mw0000gn/T/tmp.2VOp4BPo
New config generated in .circleci/config.yml
$ .circleci/regenerate.sh #-- second time there's no uncommitted changes
New config generated in .circleci/config.yml
```
2.
```
$ echo "418 I'm a teapot" > .circleci/config.yml
$ git add .circleci/config.yml
$ .circleci/regenerate.sh
$ .circleci/regenerate.sh
```
Result:
```
$ .circleci/regenerate.sh
Uncommitted change detected in .circleci/config.yml
It has been backed up to /var/folders/89/brnr1wt970130lk0m52605mw0000gn/T/tmp.2VOp4BPo
New config generated in .circleci/config.yml
$ .circleci/regenerate.sh #-- second time there's still uncommitted changes b/c git split staged vs unstaged changes
Uncommitted change detected in .circleci/config.yml
It has been backed up to /var/folders/89/brnr1wt970130lk0m52605mw0000gn/T/tmp.2ruMAynI
New config generated in .circleci/config.yml
```
Reviewed By: samestep
Differential Revision: D27234394
Pulled By: walterddr
fbshipit-source-id: 6364cc1f6f71a43424a63ca6fce9d2ba69437741
Summary:
Instructions for compiling PyTorch from source for ROCm were missing now that PyTorch 1.8 announced beta support for ROCm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53845
Reviewed By: heitorschueroff
Differential Revision: D27237916
Pulled By: malfet
fbshipit-source-id: c8be92fd76ea8df7e9f6944c0036568189f58808
Summary:
Since we no longer support CUDA 9.2, disable the scheduled CI jobs for those configurations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54421
Reviewed By: janeyx99
Differential Revision: D27234293
Pulled By: walterddr
fbshipit-source-id: 923e32c0229ea861bce6ff473501892bd4e5bec1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52917
Original commit changeset: f6ceef606994
Test Plan:
FB:
This was an attempt to fix ig crashes but we root caused it to pthreadpool changes. Thus this is not needed anymore.
Reviewed By: AshkanAliabadi
Differential Revision: D26485737
fbshipit-source-id: 5d689231cccd11d911b571f8486a19d646352698
Summary: more context in T86752810. Add info for the tensor lengths size to see if it fails on an incomplete batch.
Test Plan: manually created failed run: f258719092
Reviewed By: aartibasant
Differential Revision: D27181049
fbshipit-source-id: 341c020a3430c410f9726d92315efb80d36e9452
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54074
I don't see why this shouldn't work.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D27086594
Pulled By: ezyang
fbshipit-source-id: 1d5f1997017ec48c4140f43e44f0d8a3df28ac7f
Summary:
This PR:
- Updates the structure of the SampleInput class to require the "input" attribute be a tensor
- Limits unary ufuncs to test only the uint8, long, float16, bfloat16, float and cfloat dtypes by default
- Limits variant testing to the float dtype
- Removes test_variant_consistency from test_unary_ufuncs.py since it's now redundant with variant testing in test_ops.py
- Adds backwards supported testing to clarify failures that were coming from variant testing
This should decrease test e2e time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53255
Reviewed By: ngimel
Differential Revision: D27043643
Pulled By: mruberry
fbshipit-source-id: 91d6b483ad6e2cd1b9ade939d42082980ae14217
Summary:
As of ROCm version 4.0.1, the HIP compiler default for max threads per block is 256 but is subject to change in future releases. To protect against changes, hipMAGMA should be built with the previously-assumed default. This change is necessary here in PyTorch until upstream magma project utilizes `__launch_bounds__` or some other means of controlling launch bounds.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54161
Reviewed By: zou3519
Differential Revision: D27194829
Pulled By: malfet
fbshipit-source-id: 8be2cff3b38786526954b627ff6ab02b510040a1
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).
New submodule commit: 88ba128b7c
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54118
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: ejguan
Differential Revision: D27105781
fbshipit-source-id: 3f71299dcee11459efa3a14c051afc031a99ecea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54333
Pull Request resolved: https://github.com/pytorch/tensorpipe/pull/326
Pull Request resolved: https://github.com/pytorch/tensorpipe/pull/312
This is a first step towards cross-device type transfers: eventually,
channels will not connect devices of a given type between two hosts,
but possibly heterogeneous pairs of devices. Hence, the distinction
between CPU-to-CPU and GPU-to-GPU channels will not make much sense
anymore, and we can afford to simplify the Pipe's code quite a bit.
The main change here is that the `channel::Channel` and
`channel::Context` classes are not templated (on the buffer type)
anymore. Instead, a channel's `send`/`recv` methods act on generic
`Buffer`s and the actual unpacking is done in the
`ChannelBoilerplate`. The
`channel::CpuContext`/`channel::CudaContext` (respectively
`channel::CudaContext`/`channel::CudaChannel`) aliases now simply
resolve to `channel::Context` (respectively `channel::Channel`). A
subsequent diff will get rid of the aliases altogether.
The Pipe is being simplified: all the duplication due to having
separate hierarchies is gone, which gets rid of a lot of boilerplate
template code. Note that previously, two channels with the same name
could potentially coexist, provided one was a CPU channel and the
other a GPU channel. This is not the case anymore, though it should
not matter.
In its current state, the Pipe still needs to pick a channel based on
whether that channel acts on CPU or GPU buffers. This is solved by
introducing the temporary method
`bool channel::Context::supportsDeviceType(DeviceType t)`. When
iterating through available channels to select one for a given tensor,
the Pipe now discards channels that do not support the tensor's
`DeviceType`. This leads to having a single ordered list of channels,
which in practice is two separate lists (one for CPU, one for GPU)
merged together. This will change soon as we initialize only one
channel per `DeviceType`.
Test Plan: Imported from OSS
Reviewed By: lw
Differential Revision: D26958187
Pulled By: beauby
fbshipit-source-id: 3e3f7921166892d468fa78cfad3199277588021c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54353
The current implementation of reshape/flatten is problematic because the output is sometimes a tensor view and sometimes not; it depends entirely on the graph IR and input shapes. Replacing them with the copy version makes it deterministic and the output is always a tensor.
Reviewed By: ajyu, edvgha
Differential Revision: D26358525
fbshipit-source-id: ee7571317b061221a8d50083676cded388ce6f87
Summary:
This folder contains the DDP python interface as well as several misc. communication files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54221
Reviewed By: agolynski
Differential Revision: D27149068
Pulled By: rohan-varma
fbshipit-source-id: 0c23ea9a0d1dfc2719a2008e182ea75f2058d7dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54286
A generated code object was holding not just a function pointer but a
pre-allocated argument buffer. I assume this was a performance optimization to
avoid allocating a vector on each call?
This cached buffer makes it unsafe to call a generated function from multiple
threads, which is too severe a limitation. This diff fixes it by locally
allocating a SmallVector to hold the args.
A better fix will be to avoid creating CallArgs, so the function can be called
directly without this packing-and-unpacking nonsense, but that's a slightly
more involved fix, possibly involving changing the kernel codegen, and this bug
needs fixing now.
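The shape of the fix can be sketched generically: instead of packing arguments into a buffer cached on the code object (shared across all callers), each call packs into a buffer local to that call. A minimal Python sketch with hypothetical names (the real code uses a stack-allocated C++ SmallVector, not a Python list):

```python
import threading

class GeneratedCode:
    """Sketch of a compiled-function wrapper; `fn` stands in for the kernel."""
    def __init__(self, fn):
        self.fn = fn

    def call(self, *args):
        # The fix: pack args into a per-call local buffer, so concurrent
        # calls never share mutable state (no cached self.args member).
        call_args = list(args)
        return self.fn(call_args)

code = GeneratedCode(lambda buf: buf[0] + buf[1])
results = [None] * 8

def worker(i):
    results[i] = code.call(i, i)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [0, 2, 4, 6, 8, 10, 12, 14]
```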
ghstack-source-id: 124333028
Test Plan: `threads=64 scripts/bwasti/static_runtime/run.sh`
Reviewed By: asuhan
Differential Revision: D27175715
fbshipit-source-id: 44dafe77b95ede69c63ae6d64f39f0aa4877712f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52422
As mentioned in https://github.com/pytorch/pytorch/issues/52415,
`torch.utils.checkpoint` doesn't support checkpointing for functions which have
non-tensor inputs and outputs.
This PR resolves this issue by ensuring the autograd machinery ignores the
non-tensor inputs and outputs and processes the tensors accordingly.
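The core of such a fix is bookkeeping: separate tensor from non-tensor arguments so only tensors flow through autograd, then reassemble the argument list in the original order inside the wrapped function. A simplified standalone sketch of that bookkeeping (not the actual torch.utils.checkpoint code; a type predicate stands in for torch.is_tensor):

```python
def split_args(args, is_tensor):
    """Split args into tensors (to be tracked by autograd) and everything
    else, remembering positions so the original order can be restored."""
    tensors, others, layout = [], [], []
    for a in args:
        if is_tensor(a):
            layout.append(("tensor", len(tensors)))
            tensors.append(a)
        else:
            layout.append(("other", len(others)))
            others.append(a)
    return tensors, others, layout

def restore_args(tensors, others, layout):
    return [tensors[i] if kind == "tensor" else others[i]
            for kind, i in layout]

# Toy stand-in: treat floats as "tensors" and everything else as constants.
args = [1.5, "mode", 2.5, 7]
t, o, layout = split_args(args, lambda a: isinstance(a, float))
assert restore_args(t, o, layout) == args
print(t)  # [1.5, 2.5] -- only these would be passed through autograd
```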
ghstack-source-id: 124406867
Test Plan:
1) unit test
2) waitforbuildbot
Reviewed By: albanD
Differential Revision: D26507228
fbshipit-source-id: 0a5a1591570814176185362e83ad18dabd9c84b0
Summary:
Added the support for half / bfloat / bool for `index_select`, as suggested by ngimel in
https://github.com/pytorch/pytorch/issues/49707#issuecomment-788140578
For the tests to pass, I also added the support for `index_add`.
I added `OpInfo` tests for `index_add` and more thorough forward tests for `index_select` to test these changes.
While doing so, I found that the support for scalar types in the derivative of `index_add` was not correct, so I corrected it.
Resolves https://github.com/pytorch/pytorch/issues/49707
It should also resolve similar issues that I encountered when porting `index_copy`, `take` and `put`.
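For reference, `index_add` accumulates entries of a source into `self` at the given indices; a pure-Python model of the 1-D case (not the actual kernel) makes the semantics concrete:

```python
def index_add_(dest, index, source):
    """dest[index[i]] += source[i], modeling torch.Tensor.index_add_ in 1-D.
    Repeated indices accumulate."""
    assert len(index) == len(source)
    for i, idx in enumerate(index):
        dest[idx] += source[i]
    return dest

out = index_add_([0, 0, 0, 0], [3, 0, 3], [1, 2, 10])
print(out)  # [2, 0, 0, 11]: index 3 received both 1 and 10
```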
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53898
Reviewed By: mruberry
Differential Revision: D27193294
Pulled By: ngimel
fbshipit-source-id: 5a0af2c62a0cf24f3cc9c74f230ab4f3712bbb7a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54303
**Summary**
Creating temporary files can cause problems in fbcode. This commit
updates the packaging tests so that exporters write to a memory
buffer when tests run in fbcode.
**Test Plan**
Continuous integration.
Test Plan: Imported from OSS
Reviewed By: suo
Differential Revision: D27180839
Pulled By: SplitInfinity
fbshipit-source-id: 75689d59448de2cd1595ef0ecec69e1bbcf9a96f
Summary:
Since a.size() is (3, 4, 5), r.size() is (3, 4, 5), but q.size() is (3, 4, 4).
Also, reduce the tolerance from 1e-8 to 1e-5.
Fixes https://github.com/pytorch/pytorch/issues/54320
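Those shapes follow the reduced-QR rule: for an input of shape (..., m, n), q has shape (..., m, k) and r has shape (..., k, n) with k = min(m, n). A quick sketch of that rule:

```python
def reduced_qr_shapes(shape):
    """Shapes of (q, r) for a reduced QR of a batch of m x n matrices."""
    *batch, m, n = shape
    k = min(m, n)
    return tuple(batch) + (m, k), tuple(batch) + (k, n)

q_shape, r_shape = reduced_qr_shapes((3, 4, 5))
print(q_shape, r_shape)  # (3, 4, 4) (3, 4, 5)
```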
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54342
Reviewed By: walterddr
Differential Revision: D27193947
Pulled By: malfet
fbshipit-source-id: 362a0fdd90550888a4f0c6deaa49b9f72d379842
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54277
alltoall already supported in nccl backend, so update the doc to reflect it.
Test Plan: Imported from OSS
Reviewed By: divchenko
Differential Revision: D27172904
Pulled By: wanchaol
fbshipit-source-id: 9afa89583d56b247b2017ea2350936053eb30827
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53680
Porting `div` to structured.
One weird thing to call out with div: it has an overload, `div.Tensor_mode`, which uses different TensorIterator settings depending on its input (the "mode" argument that you pass to it). So I ended up switching on the mode inside of the meta function to determine which TensorIterator builder to use.
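The modes really do compute different things: no mode gives true division, while "trunc" and "floor" round toward zero and toward negative infinity respectively. A scalar Python model of the three modes (the real op works elementwise on tensors):

```python
import math

def div(a, b, rounding_mode=None):
    """Scalar model of torch.div's rounding_mode semantics."""
    q = a / b
    if rounding_mode is None:
        return q                  # true division
    if rounding_mode == "trunc":
        return math.trunc(q)      # round toward zero
    if rounding_mode == "floor":
        return math.floor(q)      # round toward -inf
    raise ValueError(rounding_mode)

print(div(-7, 2), div(-7, 2, "trunc"), div(-7, 2, "floor"))  # -3.5 -3 -4
```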
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D27029819
Pulled By: bdhirsh
fbshipit-source-id: 3f216f6c197a2321087b4c23202bc2fc561491ba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53679
This PR ports sub to be a structured kernel.
It also fixes a bug with `sub.Scalar`. `sub.Scalar` is currently listed as a `DefaultBackend` op, but it isn't actually backend-agnostic: it calls into `native::sub`, which is CPU/CUDA-specific. That can cause bugs like [this](https://github.com/pytorch/pytorch/pull/51758) for other backends like MKLDNN. `sub.Scalar` is now **really** backend-agnostic, since it performs a redispatch to call the overload.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D27029820
Pulled By: bdhirsh
fbshipit-source-id: d24b435a42f4c505bc763ea77672956f81ad3e26
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53669
This PR does two things:
* Ports `pow` to be structured
* Fixes a bug with how pow handles mixed cpu and cuda tensors
**bug fix**
Pow is a binary op, and all binary ops that use TensorIterator are currently written to handle the case when one of the inputs is a CUDA tensor, and the other is a zero-dimensional cpu tensor.
`pow` incidentally only handles one of the two cases: it fails when the CUDA tensor is passed as the exponent, e.g. `at::pow(torch.tensor(2.0, device='cpu'), torch.tensor([2, 2], device='cuda'))`. Porting `pow` to structured happened to change the error emitted from a `TORCH_CHECK` in TensorIterator to an `INTERNAL_ASSERT` in loop.cuh, so I ended up fixing the error and updating the tests. I added more details in a comment on the PR.
**notes on the structured port**
Pow is a little weird, so I wrote down a couple of issues I noticed during the port:
* Multiple independent overloads. `pow` has two overloads that have their own cpu/cuda kernels, meaning one doesn't call the other. I had to update the names of the kernel overloads to make the compiler happy, since the codegen would otherwise try to generate two classes with the same name. `pow` actually has 3 overloads that all have `out` variants, so I ported all 3 to structured; one of them just happens to redispatch to one of the others in most cases.
* Name propagation. Is name propagation implemented per operator? Or is it expected to work for most/all ops by default? Right now it looks like it happens for TensorIterator ops by default. For ops that don't use TensorIterator, we need to explicitly pass the names through to the `set_output()` call in the meta function. This happened to matter for `pow` because it has 3 overloads, but only two of them directly use TensorIterator. I had to pass names directly to `set_output` in the 3rd overload to make tests happy.
* Lack of `const Tensor &` in the C++ API. It's a goal to slowly make all `Tensor &` arguments const as part of the structured port, but in this case I needed to explicitly cast constness away because one structured kernel called back into the C++ API, which still has ordinary `Tensor &` arguments. This probably isn't something we'll fix soon, since we have boxing logic that actually relies on the `Tensor &` / `const Tensor &` distinction in some places.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D27029821
Pulled By: bdhirsh
fbshipit-source-id: c1786e770de6e6c2474b9a48210b88057ab1018e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54274
Some of the Python tests need to be aware of whether or not FBGEMM is
available, so expose this setting in the pybind extension.
ghstack-source-id: 124317732
Test Plan: Will use this variable in the tests on D26658205.
Reviewed By: mraway
Differential Revision: D27171780
fbshipit-source-id: 4c94144a959bf8bf0e1553b6e029e94a91794e29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54229
Because caffe2's add uses Eigen for add with broadcasting, which is not well supported by OSS PyTorch, it's easier to just keep `c2_add_out` internal for now. Caffe2 does use the mkl add when the input dims of A and B are the same and no broadcasting is needed.
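Broadcasting here means aligning shapes from the trailing dimension and expanding size-1 dims; when A and B have identical dims no expansion is needed, which is the case the mkl path handles. A minimal sketch of the shape rule (NumPy/PyTorch-style semantics, not the caffe2 code):

```python
from itertools import zip_longest

def broadcast_shape(a, b):
    """Broadcast two shapes, right-aligned; size-1 dims expand."""
    out = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError(f"incompatible shapes: {a} vs {b}")
    return tuple(reversed(out))

print(broadcast_shape((4, 1, 3), (2, 3)))  # (4, 2, 3)
print(broadcast_shape((5, 7), (5, 7)))     # (5, 7): identical dims, no expansion
```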
Reviewed By: bertmaher
Differential Revision: D27036279
fbshipit-source-id: 49f0ec5407ea1f641896f054cad2283faed81687
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52692
Porting `at::mul` to structured.
One other issue I hit with the port was the fact that there are a bunch of other places around the code base that used to call out to variants of `at::native::mul`, which no longer exists. *Technically*, `at::cpu::mul` does the equivalent thing now, so I patched most call-sites to use that. There were two other places where I did something slightly different (calling `at::cuda::mul` and `at::mul`, respectively), which I called out in the comments.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D27029822
Pulled By: bdhirsh
fbshipit-source-id: 6cc80de0dfccec304bf8e16a1823e733bed27bf4
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53511
torch.det depends on torch.prod, which in turn depends on several other functions, and they also depend on torch.prod, so there is a circular relationship; hence this PR will enable complex backward support for several functions at once.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48125
Reviewed By: pbelevich
Differential Revision: D27188589
Pulled By: anjali411
fbshipit-source-id: bbb80f8ecb83a0c3bea2b917627d3cd3b84eb09a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54127
During the meta tensor bringup, I found all of these operators
advertised that they worked on all backends (DefaultBackend/Math)
but actually they only worked on CPU/CUDA.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D27109508
Pulled By: ezyang
fbshipit-source-id: 0f474ecf4aba8b8207f2910bdc962bf581f53853
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54079
Fixes https://github.com/pytorch/pytorch/issues/53815
Instead of testing if something is CUDA, we instead test if something
is not CPU. This in the general theming of "Don't be so darn CUDA
centric".
Intriguingly, we didn't have an is_cpu() method on Tensor. Which seems
like a big oversight and one of the reasons how we ended up in this
mess. So in it goes. Maybe we should also get this for Python bindings
as well (but in that case, should probably look into redoing all of the
is_X bindings so they aren't done manually).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D27109507
Pulled By: ezyang
fbshipit-source-id: abbe72c2e688c452ffe098d206cb79938b5824b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54034
Fixes #53544
I had to touch a bunch of lines but the refactoring was fairly
mechanical. Here's how it works.
The basic concept behind this PR is that tensor_new.cpp was previously
abusing DispatchKey when it actually meant TensorOptions. The provided
DispatchKey argument to most of the constructor functions typically
comes from torch::tensors::get_default_dispatch_key(); it doesn't
really make sense for people to set the default dispatch key, but
this got grandfathered in due to the old API set_default_tensor_type
(where the "Type" concept got refactored into "DispatchKey" concept
over time). See also #53124. But the upshot is that, semantically,
what we refer to as the default dispatch key really is more like
torch.set_default_tensor_type(torch.Tensor) versus
torch.set_default_tensor_type(torch.cuda.Tensor): clearly the user
wants to do something about *construction* of the tensor, and
TensorOptions captures that exactly.
So, how exactly to translate from one to the other?
- Sources (things that used to PRODUCE DispatchKey)
- Most top level functions take a DispatchKey as their argument. I
use the new function dispatchKeyToTensorOptions to convert it into
a TensorOptions
- typeIdWithDefault now produces a TensorOptions (probably could do
with a rename, though I didn't)
- Sinks (things that used to CONSUME DispatchKey)
- Previously, the function options() was typically used to convert the
DispatchKey into a TensorOptions. Now its replacement build_options
just takes a TensorOptions and sets some extra fields on it.
Irritatingly, I can't just replace
`build_options(options, scalar_type, device)` with
`options.dtype(scalar_type).device(device)` because the semantics
are slightly different: if device is nullopt, we should preserve
the usage of the device specified in options (what options.device()
does is overwrite the device unconditionally; e.g., if device is
nullopt, unset device from options)
- The other major sink for DispatchKey was `internal_new_from_data`,
but it turns out it only really extracts the device type from
the dispatch key. Now it just pulls out the device from
TensorOptions.
- To actually do the translation of DispatchKey to TensorOptions, I
introduce new functions dispatchKeyToLayout (replicating
layout_from_backend--there are still a few uses of this function
so I couldn't delete it) and dispatchKeyToDeviceType (replacing
computeDeviceType)
- In all internal functions, whenever DispatchKey is taken as an argument,
I instead take TensorOptions as an argument, and pass it along.
- Anywhere `legacyExtractDispatchKey(other.key_set())` equality was
previously used, I now do `other.options().type_equal()`, which
is the intended BC for doing "backend to backend" comparisons
- There are a few places in the sparse constructors where we allocated
a tensor for values, and then read out the dispatch key from the
result to allocate the keys. As best as I can tell, this is totally
equivalent to just passing in the options to both values and indices
(the only difference is dtype, which is captured via a separate
argument)
This refactor doesn't really go far enough: for example, there are now
functions that take both TensorOptions and ScalarType, when really
the TensorOptions can capture this all. I kept it solely just
s/DispatchKey/TensorOptions/ to reduce the number of possible bugs;
also, a lot of this will be mooted by a proper fix to #53124.
Even with this limited refactor, the payoff is sweet. I can delete:
- backendToCPU
- backendToXPU
- backendToCUDA
- backendToHIP
- backendToBackendOfDeviceType
The reason I can do this is because I can simply overwrite layout in TensorOptions
to do the conversion, rather than having to type out each backend case
explicitly.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D27109509
Pulled By: ezyang
fbshipit-source-id: 91d16cfbc390127770362ac04fb43f7e070077e9
Summary:
Small changes to autograd to support optional Tensor values.
On MLC device, we use Autograd Custom Functions to override the autograd engine for a specific operation. We do something like:
```
at::Tensor AtenMLCAutogradTypeDefault::abs(const at::Tensor & self) {
  torch_mlc::mlclogger() << "MLC bridge autograd MLC : abs" << std::endl;
  torch_mlc::AutoNonAtenMLCAutogradTypeDefault guard(true);
  return MLCAbsFunction::apply(self);
}

TORCH_LIBRARY_IMPL(aten, AutogradMLC, m) {
  m.impl("abs", static_cast<at::Tensor (*)(const at::Tensor &)>(&AtenMLCAutogradTypeDefault::abs));
}
```
What I noticed is that the existing code does not always work for optional Tensor types. This PR fixes it. Let me know if you have a better way to deal with this issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54270
Reviewed By: ejguan
Differential Revision: D27171623
Pulled By: albanD
fbshipit-source-id: 3aa8d59ee8da3cc943ad5e73521c2755d1ff2341
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54251
Pull Request resolved: https://github.com/pytorch/tensorpipe/pull/324
In order to merge the channel hierarchies, we need a generic `Buffer` type, that can wrap either a `CpuBuffer` or a `CudaBuffer`.
The constraints are that, since this type is used by the channels, it cannot explicitly refer to `CudaBuffer`. We propose here a type-erasure based solution, with small-buffer optimization to avoid heap-allocating the wrapped concrete buffer.
This is a new version of D27001339 (c618dc13d2) which broke PyTorch OSS build.
Test Plan: CI
Reviewed By: lw, mrshenli
Differential Revision: D27156053
fbshipit-source-id: 4244302af33a3be91dcd06093c0d6045d081d3cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54230
The comments in the code explained why this change is needed.
Reviewed By: bwasti
Differential Revision: D27145406
fbshipit-source-id: 2a61a42f22dfadfad59ee6c3be3e9e9d19e90ac3
Summary:
This is a follow-up PR of https://github.com/pytorch/pytorch/issues/52408 and move/convert all files under `test/type_hint_tests/*.py` to use the new test style.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53167
Reviewed By: ejguan
Differential Revision: D27081041
Pulled By: walterddr
fbshipit-source-id: 56508083800a5e12a7af88d095ca26229f0df358
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45667
First part of #3867 (Pooling operators still to do)
This adds a `padding='same'` mode to the interface of `conv{n}d` and `nn.Conv{n}d`. This should match the behaviour of `tensorflow`. I couldn't find it explicitly documented, but through experimentation I found `tensorflow` returns the shape `ceil(len/stride)` and always adds any extra asymmetric padding onto the right side of the input.
Since the `native_functions.yaml` schema doesn't seem to support strings or enums, I've moved the function interface into python and it now dispatches between the numerically padded `conv{n}d` and the `_conv{n}d_same` variant. Underscores because I couldn't see any way to avoid exporting a function into the `torch` namespace.
A note on asymmetric padding. The total padding required can be odd if both the kernel-length is even and the dilation is odd. mkldnn has native support for asymmetric padding, so there is no overhead there, but for other backends I resort to padding the input tensor by 1 on the right hand side to make the remaining padding symmetrical. In these cases, I use `TORCH_WARN_ONCE` to notify the user of the performance implications.
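The arithmetic above can be written down directly: the output length is ceil(len/stride), the total padding follows from the convolution output-length formula, and any odd remainder goes on the right. A sketch of the 1-D case (hypothetical helper, not the torch internals):

```python
import math

def same_padding_1d(length, kernel, stride=1, dilation=1):
    """Left/right padding so the conv output length == ceil(length / stride)."""
    out_len = math.ceil(length / stride)
    effective_kernel = (kernel - 1) * dilation + 1
    total = max(0, (out_len - 1) * stride + effective_kernel - length)
    left = total // 2
    return left, total - left  # any extra (asymmetric) padding goes on the right

print(same_padding_1d(10, kernel=3))  # (1, 1): symmetric
print(same_padding_1d(10, kernel=4))  # (1, 2): even kernel, odd dilation -> asymmetric
```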
Test Plan: Imported from OSS
Reviewed By: ejguan
Differential Revision: D27170744
Pulled By: jbschlosser
fbshipit-source-id: b3d8a0380e0787ae781f2e5d8ee365a7bfd49f22
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53929
The local autograd engine performs appropriate stream synchronization
between autograd nodes in the graph to ensure a consumer's stream is
synchronized with the producer's stream before executing the consumer.
However in case of distributed autograd, the SendRpcBackward function receives
gradients over the wire and TensorPipe uses its own pool of streams for this
purpose. As a result, the tensors are received on TensorPipe's stream pool but
SendRpcBackward runs on a different stream during the backward pass and there
is no logic to synchronize these streams.
To fix this, I've enhanced DistEngine to synchronize these streams
appropriately when it receives grads over the wire.
ghstack-source-id: 124055277
(Note: this ignores all push blocking failures!)
Test Plan:
1) Added unit test which reproduced the issue.
2) waitforbuildbot.
Reviewed By: walterddr, wanchaol
Differential Revision: D27025307
fbshipit-source-id: 2944854e688e001cb3989d2741727b30d9278414
Summary:
Since both these files were deleted some time ago, we shouldn't be running them anymore, as this was the old sharding strategy (see https://github.com/pytorch/pytorch/issues/50660).
```
test_python_nn.bat
test_python_all_except_nn.bat
```
I believe we intend to run all the python files, so I added a call for that instead.
Note: I don't believe there is a single unsharded test build, though, so should I instead just assume that all windows tests will be sharded?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54276
Reviewed By: ejguan
Differential Revision: D27173045
Pulled By: janeyx99
fbshipit-source-id: a7562c1479e18bd63f192f02129a42911a73a70b
Summary:
This PR
- moves `torch/testing/_internal/mypy_wrapper.py` (and its accompanying tests from `test/test_testing.py`) to `tools`,
- removes the now-unused `test_run_mypy` from `test/test_type_hints.py`, and
- replaces the hardcoded list of `mypy` configs (previously duplicated across `mypy_wrapper.py` and `.github/workflows/lint.yml`) with a simpler glob
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54268
Test Plan:
Should also be run in the "Test tools" GHA workflow in CI:
```
python tools/test/test_mypy_wrapper.py
```
Reviewed By: janeyx99
Differential Revision: D27168095
Pulled By: samestep
fbshipit-source-id: a8dc18407b5e4c103ace23a636b0a8534951905a
Summary:
https://ccache.dev/ is a compiler cache that speeds up subsequent builds. Auto-detecting ccache ensures that it is used on systems where it is available, greatly improving build times for developers. There is no risk in enabling ccache in practice. Please refer to https://ccache.dev/ for a short summary / motivation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49389
Reviewed By: ejguan
Differential Revision: D27169957
Pulled By: malfet
fbshipit-source-id: 673b60bbceb0d323901c8a992a75792c6da9b805
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/54259
Test Plan:
The main point of this is to be run in our "Test tools" GitHub Actions workflow. To test locally:
```
mypy --config=mypy-strict.ini
python tools/test/test_test_history.py
```
Reviewed By: seemethere
Differential Revision: D27164519
Pulled By: samestep
fbshipit-source-id: 46f90e62e2d4d0c413b202419e509d471bad43de
Summary:
Step 2 to fixing https://github.com/pytorch/pytorch/issues/53882 :)
This changes TARGET_DET_LIST and sharding automation by checking if there's already cached data from the commit in `.pytorch-test-times`. If not, it pulls data from S3 and updates the file to have the stats. This way, S3 pulling does not need to happen more than once for the same commit.
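The caching pattern is: key the cached stats file by commit hash; on a hit, read locally; on a miss, fetch from S3 once and write the file. A standalone sketch with a stubbed fetch (the real logic lives in test/run_test.py and uses different names):

```python
import json
import os
import tempfile

def get_test_times(cache_path, commit, fetch_from_s3):
    """Return test-time stats for `commit`, fetching at most once per commit."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cached = json.load(f)
        if cached.get("commit") == commit:
            return cached["stats"]          # cache hit: no S3 round-trip
    stats = fetch_from_s3(commit)           # cache miss: pull once
    with open(cache_path, "w") as f:
        json.dump({"commit": commit, "stats": stats}, f)
    return stats

calls = []
def fake_fetch(commit):
    calls.append(commit)
    return {"test_nn": 42.0}

path = os.path.join(tempfile.mkdtemp(), ".pytorch-test-times")
get_test_times(path, "abc123", fake_fetch)
get_test_times(path, "abc123", fake_fetch)
print(len(calls))  # 1: the second call was served from the cache file
```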
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54210
Test Plan:
the following methods should run the same set of tests.
First `export CIRCLE_JOB=pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2` or your favorite CIRCLE JOB.
1. Pull data first and use it:
Download the data from S3 and write it to the cache file with `python test/run_test.py --export-historic-test-times .pytorch-test-times`
Now run `python test/run_test.py --shard 1 10`
2. Make the sharding job pull data:
Delete the file you just created: `rm .pytorch-test-times`
Now run `python test/run_test.py --shard 1 10`
Reviewed By: walterddr
Differential Revision: D27136849
Pulled By: janeyx99
fbshipit-source-id: 51a42c4e2fa3f8cf15e682679dd3eb6130aad927
Summary:
The size of the workspace arrays should not be less than 1. This PR fixes lstsq calls to LAPACK and MAGMA. Also `max(1, ...)` guards were added to a few other functions (symeig, svd).
ROCm testing is enabled for lstsq, pinv, pinverse.
Fixes https://github.com/pytorch/pytorch/issues/53976
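The guard itself is a one-line pattern: a workspace-size query can legitimately come back as 0 (e.g. for empty matrices), but the workspace array handed to the solver must still hold at least one element. A sketch of the pattern (hypothetical helper name):

```python
def safe_lwork(computed_lwork):
    """Clamp a LAPACK/MAGMA workspace size to the minimum legal value of 1.
    Queries can return 0 for degenerate inputs, but a zero-sized workspace
    array is invalid to pass to the solver."""
    return max(1, int(computed_lwork))

print(safe_lwork(0), safe_lwork(257.0))  # 1 257
```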
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54009
Reviewed By: ejguan
Differential Revision: D27155845
Pulled By: mruberry
fbshipit-source-id: 04439bfa82a5bdbe2297a6d62b6e68ba1c30e4a2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54103
The goal is to reduce the spread of static casts in the autograd code as per the comment in https://github.com/pytorch/pytorch/pull/49097#discussion_r543695091
I wasn't sure how to use a virtual method here, but a simple method in impl cleans it up quite nicely.
Test Plan: Imported from OSS
Reviewed By: agolynski
Differential Revision: D27117840
Pulled By: albanD
fbshipit-source-id: 5f277dde34ccf6bc20f76583b906ff3528cde5aa
Summary:
Also disable test_run_mypy from test_type_hints.py as it is running as part of GHA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54067
Reviewed By: ezyang
Differential Revision: D27091530
Pulled By: malfet
fbshipit-source-id: 9cfe397260aba34aeb055676855db383cd06f76d
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53864
This PR adds the following APIs that perform loop distribution to `LoopNest`:
```
static std::vector<For*> distributeLoop(For* loop, const std::unordered_set<Stmt*>& pivots);
static std::vector<For*> distributeLoop(For* loop);
static std::vector<For*> distributeLoopOverInnerLoops(For* loop);
```
* The first method distributes the given loop over its body by splitting after every given pivot stmt.
* The second method distributes the given loop over every stmt in its body.
* The last method distributes the given loop over its body by splitting after every `For` stmt in its body.
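Loop distribution splits one loop whose body contains several statements into multiple loops over the same range. A minimal Python sketch of the transformation (illustrative only, not the TE API):

```python
# Before distribution: one loop executes two statements per iteration.
def fused(n):
    a = [0] * n
    b = [0] * n
    for i in range(n):
        a[i] = i * 2
        b[i] = a[i] + 1
    return a, b

# After distributeLoop: each statement gets its own loop over the same range.
def distributed(n):
    a = [0] * n
    b = [0] * n
    for i in range(n):
        a[i] = i * 2
    for i in range(n):
        b[i] = a[i] + 1
    return a, b
```

Both versions compute the same result; distribution only changes the loop structure, which can then enable further transformations per loop.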
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53865
Reviewed By: mruberry
Differential Revision: D27075006
Pulled By: navahgar
fbshipit-source-id: 031746aad619fe84c109e78b53387535e7f77cef
Summary:
This PR adds autograd support for `torch.orgqr`.
Since `torch.orgqr` is one of few functions that expose LAPACK's naming and all other linear algebra routines were renamed a long time ago, I also added a new function with a new name and `torch.orgqr` now is an alias for it.
The new proposed name is `householder_product`. For a matrix `input` and a vector `tau`, LAPACK's orgqr operation takes the columns of `input` (called Householder vectors or elementary reflectors) and the scalars of `tau`, which together represent Householder matrices, and then computes the product of these matrices. See https://www.netlib.org/lapack/lug/node128.html.
Other linear algebra libraries that I'm aware of do not expose this LAPACK function, so there is some freedom in naming it. It is usually used internally only for QR decomposition, but can be useful for deep learning tasks now when it supports differentiation.
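As a sketch of the intended usage: `torch.geqrf` returns the Householder vectors and `tau` scalars of a QR decomposition, and `torch.orgqr` multiplies the corresponding Householder matrices back into the orthogonal factor Q:

```python
import torch

A = torch.randn(5, 3, dtype=torch.float64)
a, tau = torch.geqrf(A)   # compact QR: Householder vectors in `a`, scalars in `tau`
Q = torch.orgqr(a, tau)   # product of the Householder matrices, i.e. Q (5 x 3)
R = torch.triu(a[:3, :])  # upper-triangular factor

assert torch.allclose(Q @ R, A)  # Q R reconstructs A
assert torch.allclose(Q.T @ Q, torch.eye(3, dtype=torch.float64))  # orthonormal columns
```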
Resolves https://github.com/pytorch/pytorch/issues/50104
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52637
Reviewed By: agolynski
Differential Revision: D27114246
Pulled By: mruberry
fbshipit-source-id: 9ab51efe52aec7c137aa018c7bd486297e4111ce
Summary:
This PR adds cusolver potrf and potrfBatched to the backend of torch.cholesky and torch.linalg.cholesky.
Cholesky heuristics:
- Use cusolver potrf for batch_size 1
- Use magma_xpotrf_batched for batch_size >= 2
- if magma is not available, use loop of cusolver potrf for batch_size >= 2
cusolver potrfBatched currently has a NaN output issue; we will switch to it for the batched case once that is fixed.
See also https://github.com/pytorch/pytorch/issues/42666#47953
Todo:
- [x] benchmark and heuristic
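The dispatch heuristic above can be sketched as follows (the function and backend names are illustrative, not actual identifiers in the codebase):

```python
def choose_cholesky_backend(batch_size, magma_available):
    # Heuristic described above: cusolver potrf for a single matrix,
    # MAGMA's batched potrf for batches, and a loop of cusolver potrf
    # calls as the fallback when MAGMA is not available.
    if batch_size == 1:
        return "cusolver_potrf"
    if magma_available:
        return "magma_xpotrf_batched"
    return "cusolver_potrf_loop"
```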
Close https://github.com/pytorch/pytorch/pull/53992
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53104
Reviewed By: agolynski
Differential Revision: D27113963
Pulled By: mruberry
fbshipit-source-id: 1429f63891cfc6176f9d8fdeb5c3b0617d750803
Summary:
Pull Request resolved: https://github.com/pytorch/tensorpipe/pull/322
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54145
In order to merge the channel hierarchies, we need a generic `Buffer` type, that can wrap either a `CpuBuffer` or a `CudaBuffer`.
The constraints are that, since this type is used by the channels, it cannot explicitly refer to `CudaBuffer`. We propose here a type-erasure based solution, with small-buffer optimization to avoid heap-allocating the wrapped concrete buffer.
ghstack-source-id: 124131499
Test Plan: CI
Reviewed By: lw
Differential Revision: D27001339
fbshipit-source-id: 26d7dc19d69d7e3336df6fd4ff6ec118dc17c5b6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53663
This adds the process group options as an optional argument to new_group
and init_process_group, which allows users to pass in an initialized
process group option for Gloo and NCCL.
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D26968857
Pulled By: wanchaol
fbshipit-source-id: 2ff73a009120b85e83ecde7c69956b731902abc2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54208
It seems like it was added to suppress some errors in LazyModules, but I think we should solve those more directly with some type ignores in more surgical places.
Fixes #54087.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D27137363
Pulled By: ezyang
fbshipit-source-id: 017cafcc3350e73cd62436078835b97cd9b3b929
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53920
Fusing SigridTransforms + ListUnpack allows for enabling out variant for SigridTransforms so that the output tensors can be managed by the MemoryPlanner in Static Runtime.
The speedup comes from three parts: 1) getting rid of memory allocation inside SigridTransforms itself, 2) removing the memory deallocation cost (outside SigridTransforms, inside the MemoryPlanner), and 3) getting rid of ListUnpack. However, for 3) we still need to pay the cost of constructing a `vector<Tensor>` for outputs and a round of refcount bumps for all the output TensorImpls.
Reviewed By: ajyu
Differential Revision: D26220546
fbshipit-source-id: 651bdfb850225511c43b8f50083b13e8dec46bcc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54159
See https://github.com/pytorch/pytorch/issues/54059 for discussion.
In short, users might want to run evaluation on a single rank
in `torch.no_grad()` mode. When this happens, we need to make
sure that we skip all bucket-rebuild logic, as the forward pass only
runs on one rank and not all peers can join the bucket configuration
sync communication.
Test Plan: Imported from OSS
Reviewed By: zhaojuanmao
Differential Revision: D27119666
Pulled By: mrshenli
fbshipit-source-id: 4b2f8cce937cdd893e89d8d10c9267d255ba52ea
Summary:
This PR enables some failing fft unit tests in PyTorch on ROCm.
These tests were failing because hipfft was executed with the wrong configuration for different transform types with float inputs, causing mismatch errors when compared to baselines.
We solved the problem by calling hipfft with the right config for each transform type.
This PR does not enable all fft tests; there are still other issues that need to be resolved before that can happen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53411
Reviewed By: albanD
Differential Revision: D27008323
Pulled By: mruberry
fbshipit-source-id: 649c65d0f12a889a426ec475f7d8fcc6f1d81bd3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54090
This PR adds an options field to both ProcessGroupGloo/NCCL so that we
have a constant `options` field even after the initialization of
ProcessGroup, which gives us the ability to inspect the options during
construction of specific ProcessGroup. Also use options inside different
methods instead of separate fields.
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D27093670
Pulled By: wanchaol
fbshipit-source-id: b02d9394290e9be88b21bddb94d4de7993b4a2e3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53662
Add a base processgroup::options so that we can use inheritance and provide a universal option API in Python.
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D26968856
Pulled By: wanchaol
fbshipit-source-id: 858f4b61b27aecb1943959bba68f8c14114f67d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53295
A lot of the time spent in `collect_callgrind` is spinning up Valgrind and executing the initial `import torch`. In most cases the actual run loop is a much smaller fraction. As a result, we can reuse the same process to do multiple replicates and do a much better job amortizing that startup cost. This also tends to result in more stable measurements: the kth run is more repeatable than the first because everything has been given a chance to settle into a steady state. The instruction microbenchmarks lean heavily on this behavior. I found that in practice doing several `n=100` replicates to be more reliable than one monolithic 10,000+ iteration run. (Since rare cases like memory consolidation will just contaminate that one replicate, as opposed to getting mixed into the entire long run.)
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D26907093
Pulled By: robieta
fbshipit-source-id: 72e5b48896911f5dbde96c8387845d7f9882fdb2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53314
Introduction of an API for optimizing non-forward functions for mobile. As of this diff, all functions that you specify to optimize will be preserved, and those functions will be run through canonical optimization. The intention is to stack each further optimization onto separate diffs, since they touch multiple files and it seems like it'd be a nightmare to review.
ghstack-source-id: 123909414
Test Plan:
torch.utils.mobile_optimizer.optimize_for_mobile(net, methods_to_optimize=["forward", "foo"]) runs fine
torch.utils.mobile_optimizer.optimize_for_mobile(net, methods_to_optimize={"foo"}) optimizes just foo if the model doesn't define forward, otherwise optimizes foo and forward
torch.utils.mobile_optimizer.optimize_for_mobile(net, methods_to_optimize=["forward"]) runs fine
torch.utils.mobile_optimizer.optimize_for_mobile(net) runs fine if the model defines forward, Throws otherwise
Reviewed By: kimishpatel
Differential Revision: D26618689
fbshipit-source-id: 5bff1fb3f3f6085c4a649a8128af9c10f0fa9400
Summary:
This is an initial attempt at refactoring and consolidating our S3 read logic for print_test_stats.py, test_history.py, and run_test.py. This way, boto3 and botocore do not need to be imported in various places throughout the code base, and duplicated logic (such as the many type definitions) can exist in one place: `tools/stat_utils/s3_stat_parser.py`. walterddr contributed to this PR by moving print_test_stats.py to the tools folder and the corresponding tests to a subfolder within tools.
**NOTE: this removes those tests from CI, as the new `tools/test/test_stats.py` is not in the test/ directory like the other tests listed in TESTS in run_test.py.**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53755
Test Plan:
This refactoring change should not break anything, so running the files as before should work as they did previously.
To make sure that print_test_stats.py still functions: run `python tools/test/test_stats.py` and make sure all tests pass.
To make sure that test_history.py works, run the example commands from `tools/test_history.py --help` and check that their output matches that shown. Note that the script will continue printing for a while, so don't be alarmed.
Some next steps:
- Actually coming up with similarities among the three current use cases and further refactoring/consolidating of functions (e.g., combining simplify and get_cases)
- Moving more parsing logic to s3_stat_parser.py to have better abstraction between our files
- Adding tests for s3_stat_parser.py when there is more functionality in it
Reviewed By: agolynski, samestep
Differential Revision: D27030285
Pulled By: janeyx99
fbshipit-source-id: e664781324ef7c0c30943bfd7f17c895075ef7a7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53404
This refactors `TensorSerializer::Serialize()` so that we have a separate
helper function for each data type.
This should make it slightly easier in the future to add new serialization
formats for specific data types.
ghstack-source-id: 124085413
Test Plan:
Confirmed the existing tests pass. This diff is not expected to have any
behavior changes.
Reviewed By: mraway, glamtechie
Differential Revision: D26658204
fbshipit-source-id: 232776262db6486ba845a7ba223e3987053dac27
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54016
I managed to convince myself that typeIdWithDefault was sufficient for
the sparse constructor case. Here is the reasoning.
The surface reading of the use site of denseTypeIdWithDefault is
to convert what could be a sparse dispatch key into the dense version
so we can properly allocate underlying dense tensors for the sparse
constructor call. But WHERE does this dispatch key come from?
Inspection of call sites reveals that dispatch key is provided by
torch::tensors::get_default_dispatch_key(). This key is NEVER
sparse, as that would correspond to setting sparse tensors to be
the default tensor via torch.set_default_tensor_type() (which is
forbidden, and even if it worked most of everything in PyTorch would
break). That means that typeIdWithDefault is a sufficient replacement.
With denseTypeIdWithDefault removed, we can also delete toDense
as this was the sole use of that function.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D27109511
Pulled By: ezyang
fbshipit-source-id: c698eff0ab54c0c101fe9f55be3b7657584c4372
Summary:
This will allow for future work to use the test times file (which will save computation time and also allow for more consistency). (Step one to fixing https://github.com/pytorch/pytorch/issues/53882)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54083
Test Plan:
export CIRCLE_JOB=your-favorite-circleci-job e.g., pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2
`python test/run_test.py --export-historic-test-times` OR
`python test/run_test.py --export-historic-test-times .your-favorite-file`
When opening either .pytorch-test-times or .your-favorite-file, you should see something like:
```
{"commit": "2d559a09392aabb84dfb4a498010b2f01d99818c", "job_times": {"distributed/test_distributed_spawn": 583.5889999999973, "distributed/test_data_parallel": 4.866999999999997, "test_binary_ufuncs": 171.1569999999998, "test_numpy_interop": 2.5649999999999995, "test_public_bindings": 0.011,...}}
```
Note that no tests will be run when this option is specified.
Reviewed By: walterddr
Differential Revision: D27091351
Pulled By: janeyx99
fbshipit-source-id: e191d739268d86de0a0ba0eea0006969859d1940
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54121
It would be nice to do range analysis to determine if a condition
cannot be satisfied. These are some tests that we should be able to turn on
once we have this feature.
ghstack-source-id: 124116847
Test Plan: Simplify.*LoopBounds
Reviewed By: ZolotukhinM
Differential Revision: D27107956
fbshipit-source-id: bb27e3d3bc803f0101c416e4a351ba2278684980
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54094
We should be able to use 64-bit integers for loop boundaries and
buffer/tensor indexing.
ghstack-source-id: 124116846
Test Plan: New tests, disabled
Reviewed By: ZolotukhinM
Differential Revision: D27094934
fbshipit-source-id: a53de21a0ef523ea3560d5dd4707df50624896ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52894
Add two success cases and two failure cases for DDP with activation checkpoints when grad_as_bucket_view = true and false.
Test Plan: unit tests
Reviewed By: rohan-varma
Differential Revision: D26679895
fbshipit-source-id: a6f6cb22b4903ed8b1f7b8ed4fe8b13e102d8c21
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53159.
See comments for a description of the race condition. Thanks to ptrblck, xwang233, and especially zasdfgbnm for lots of help isolating the problem and discussing the fix.
PRing for discussion. We can try to concoct a dedicated test for the problem if you want. The ingredients are:
- DDP(..., find_unused_parameters=True)
- Use all the DDP-ed model's params in forward such that the "lazy local used work wait()" path will be taken in backward
- Queue up a lot of asynchronous dummy work just before backward(), so stream work gets pushed far into the future relative to CPU work
Benchmark:
- Bert model, when find_unused_parameters=true, latency (sec) per iteration, P50: trunk 1.265 sec, this PR 1.263 sec; with a blocking copy added before calling local_used_.fill(i), 1.236 sec
- Bert model, when find_unused_parameters=false, latency (sec) per iteration, P50: trunk 1.00 sec, this PR 1.026 sec
- Resnet50 model: accuracy also matches trunk when find_unused_parameters=true and false
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53160
Reviewed By: albanD
Differential Revision: D26916766
Pulled By: zhaojuanmao
fbshipit-source-id: 3e0ed91b7b5c42e2f2c82e12d4d2940fdc89e023
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54051
The problem was the application of the unary minus operator to an unsigned type. Positive indices are now used to build the permutation array for both `pixel_shuffle` and `pixel_unshuffle`.
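A quick way to check the permutation logic is the round trip between the two ops, which should be the identity:

```python
import torch
import torch.nn.functional as F

x = torch.arange(16.).reshape(1, 4, 2, 2)           # (N, C*r^2, H, W) with r = 2
y = F.pixel_shuffle(x, upscale_factor=2)            # -> (1, 1, 4, 4)
z = F.pixel_unshuffle(y, downscale_factor=2)        # -> back to (1, 4, 2, 2)

assert tuple(y.shape) == (1, 1, 4, 4)
assert torch.equal(x, z)                             # round trip is the identity
```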
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54086
Reviewed By: agolynski
Differential Revision: D27093435
Pulled By: jbschlosser
fbshipit-source-id: 4062f71277d037e91dc3cf5835b29b8ed4d16607
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54076
If we don't constrain ourselves to use `torch::jit::pop`, we can avoid copying a string or moving IValues around.
ghstack-source-id: 124040891
Test Plan:
existing tests
spot-checked regular interpreter assembly; seems better
Reviewed By: dhruvbird, walterddr
Differential Revision: D27087204
fbshipit-source-id: 7cf355dbcec31409bdb37afa09d7df85cf2a7e4b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54105
This is preparing XNNPACK to be enabled on Windows. For some reason Windows clang doesn't consider functions taking `float` and `const float` to have the same signature, and thus throws link errors like:
```
lld-link: error: undefined symbol: bool __cdecl at::native::xnnpack::use_max_pool2d(class at::Tensor const &, class c10::ArrayRef<__int64>, class c10::ArrayRef<__int64>, class c10::ArrayRef<__int64>, class c10::ArrayRef<__int64>, bool, float, float)
>>> referenced by C:\open\fbsource\buck-out\gen\f84e6a81\xplat\caffe2\pt_ops_full_template_registration\aten\src\ATen\native\Pooling.cpp:127
>>> libpt_ops_fullWindows.lib(out.obj):(class at::Tensor __cdecl at::native::max_pool2d(class at::Tensor const &, class c10::ArrayRef<__int64>, class c10::ArrayRef<__int64>, class c10::ArrayRef<__int64>, class c10::ArrayRef<__int64>, bool))
lld-link: error: undefined symbol: class at::Tensor __cdecl at::native::xnnpack::max_pool2d(class at::Tensor const &, class c10::ArrayRef<__int64>, class c10::ArrayRef<__int64>, class c10::ArrayRef<__int64>, class c10::ArrayRef<__int64>, bool, float, float)
>>> referenced by C:\open\fbsource\buck-out\gen\f84e6a81\xplat\caffe2\pt_ops_full_template_registration\aten\src\ATen\native\Pooling.cpp:129
>>> libpt_ops_fullWindows.lib(out.obj):(class at::Tensor __cdecl at::native::max_pool2d(class at::Tensor const &, class c10::ArrayRef<__int64>, class c10::ArrayRef<__int64>, class c10::ArrayRef<__int64>, class c10::ArrayRef<__int64>, bool))
```
Declaration: `src/ATen/native/xnnpack/Engine.h`
Definition: `src/ATen/native/xnnpack/MaxPooling.cpp`
Reference: `src/ATen/native/Pooling.cpp`
Test Plan: build succeeded
Reviewed By: kimishpatel
Differential Revision: D27097201
fbshipit-source-id: ab557f608713840ee0a65b252fa875624ddd502f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54107
The current implementation doesn't change the underlying texture's shape. This diff converts an MPSImage from one shape to the other. We implement this as an elementwise kernel: we have a thread grid of size (N2, C2, H2, W2) with a thread for each output element, and we compute the "linear index" of the output element and convert it to the equivalent "linear index" of the input element. This is known as sub2ind/ind2sub conversion in MATLAB, ravel_multi_index in numpy, etc. a08841a8e1/cupy/indexing/generate.py (L301-L304) is a clean generic version of ind2sub.
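The sub2ind/ind2sub conversion mentioned above fits in a few lines of Python (row-major layout assumed):

```python
def sub2ind(shape, coords):
    # Multi-dimensional coordinate -> row-major linear index.
    idx = 0
    for dim, c in zip(shape, coords):
        idx = idx * dim + c
    return idx

def ind2sub(shape, idx):
    # Row-major linear index -> multi-dimensional coordinate.
    coords = []
    for dim in reversed(shape):
        coords.append(idx % dim)
        idx //= dim
    return tuple(reversed(coords))
```

An elementwise shape-conversion kernel then maps each output coordinate to `ind2sub(input_shape, sub2ind(output_shape, output_coords))` to find the source element.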
ghstack-source-id: 124113407
Test Plan:
```
2021-03-16 00:27:31.280761-0700 PyTorchPlayground[16024:6249832] [bool test_view()],[1 10 2 2 ],[SUCCEED]
2021-03-16 00:27:31.282833-0700 PyTorchPlayground[16024:6249832] [bool test_view2()],[1 10 2 2 ],[SUCCEED]
2021-03-16 00:27:31.285320-0700 PyTorchPlayground[16024:6249832] [bool test_view3()],[5 8 ],[SUCCEED]
2021-03-16 00:27:31.286929-0700 PyTorchPlayground[16024:6249832] [bool test_view4()],[5 8 ],[SUCCEED]
```
- Sandcastle CI
- CircleCI
Reviewed By: SS-JIA
Differential Revision: D27074719
fbshipit-source-id: 445f55fefeb9cc7b3eeab106b6d567facef58343
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53303
The old code did a heap allocation unnecessarily and was a
little convoluted. I think that it was structured that way to avoid
double-evaluating arguments; I just forced them to be evaluated once
as though they were passed to a function by binding const references
to them.
ghstack-source-id: 123918262
Test Plan:
1) `buck run mode/opt-clang //caffe2/caffe2/fb/tests:logging_bench`
Before:
```
============================================================================
caffe2/caffe2/fb/tests/logging_bench.cpp relative time/iter iters/s
============================================================================
glog_CHECK 2.01ns 498.63M
caffe2_ENFORCE_GE 50.00% 4.01ns 249.31M
glog_CHECK_GE 17.39% 11.53ns 86.73M
fbcode_ENFORCE 100.00% 2.01ns 498.65M
caffe2_ENFORCE 100.00% 2.01ns 498.63M
caffe2_ENFORCE_THAT 50.00% 4.01ns 249.33M
============================================================================
```
After:
```
============================================================================
caffe2/caffe2/fb/tests/logging_bench.cpp relative time/iter iters/s
============================================================================
glog_CHECK 2.01ns 498.63M
caffe2_ENFORCE_GE 97.44% 2.06ns 485.88M
glog_CHECK_GE 17.39% 11.53ns 86.73M
fbcode_ENFORCE 100.00% 2.01ns 498.65M
caffe2_ENFORCE 100.00% 2.01ns 498.65M
caffe2_ENFORCE_THAT 97.28% 2.06ns 485.06M
============================================================================
```
Looks like about a 1.94x speedup!
2) Inspect generated assembly for logging_bench.cpp before & after by:
```
$ compile-commands caffe2/caffe2/fb/tests/logging_bench.cpp -f "mode/opt-clang"
$ jq -r '.[0].arguments | sh' < compile_commands.json | sed -e "s/'-c'/'-S'/g" | sed -E -e "s/'-g[12]'/'-g0'/g" > out.sh
$ sh out.sh
```
Then diff logging_bench.s as you like.
Before: P255408666
After: P277883307
Net about 1500 lines deleted from the assembly. We can see that the
happy path (which the benchmark tests) no longer contains string
creation.
Reviewed By: dzhulgakov
Differential Revision: D26829714
fbshipit-source-id: 6e11f8ea29292ae3d9f2cc89d08afcb06f7d39c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54111
If we only run the ReplaceWithCopy pass when enable_out_variant is true, there is no need to register a default op implementation.
Reviewed By: edvgha
Differential Revision: D27036077
fbshipit-source-id: f615f5d8b84629044af1c554421ea5e505e93239
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).
New submodule commit: a7fd8fba11
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53947
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: jspark1105
Differential Revision: D27031755
fbshipit-source-id: d4cc9a791d4b9908f993a950c539bcbd988bde8b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53080
As described in https://github.com/pytorch/pytorch/issues/51619,
ProcessGroupShareTensorTest was failing due to segfaults in CudaIPCTypes.cpp.
There were two issues that had to be fixed for this:
1. The ref_counter_files_ map was looked up and the result was used without
checking whether or not the appropriate key existed in the map. This would
result in default construction in the map if the key didn't exist resulting in
a nullptr being stored in the map.
2. ~CudaIPCSentData uses the global cuda_ipc_global_entities variable. But as
part of destroying cuda_ipc_global_entities, ~CudaIPCSentData is called which
accesses an already destroyed cuda_ipc_global_entities. This is now avoided by
clearing all shared blocks in ~CudaIPCGlobalEntities to ensure they are all
cleaned up before the destructor exits.
Closes: https://github.com/pytorch/pytorch/issues/51619
ghstack-source-id: 122812319
Test Plan: Run `python test/distributed/test_c10d_spawn.py -v ProcessGroupShareTensorTest`
Reviewed By: VitalyFedyunin
Differential Revision: D26742332
fbshipit-source-id: 6de4c4533f5bca673e6e171af32d034bd6ade5bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54052
Introduce `fp16_compress_wrapper`, which can give some speedup on top of some gradient compression algorithms like PowerSGD.
ghstack-source-id: 124001805
Test Plan: {F509205173}
Reviewed By: iseessel
Differential Revision: D27076064
fbshipit-source-id: 4845a14854cafe2112c0caefc1e2532efe9d3ed8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53677
When serializing bytecode, we serialize it based on methods. It may happen that there are multiple instances of a class. In such a case, the methods inside the class may be serialized multiple times.
To reduce the duplication, we cache the qualified name of the methods, so that one method is serialized only once.
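The deduplication can be sketched as caching on the qualified name (an illustrative sketch, not the actual serializer code):

```python
def serialize_methods(methods, emit):
    # `methods` is a sequence of (qualified_name, method) pairs, possibly with
    # duplicates from multiple instances of the same class; each unique
    # qualified name is serialized exactly once.
    seen = set()
    out = []
    for qualname, method in methods:
        if qualname not in seen:
            seen.add(qualname)
            out.append(emit(method))
    return out
```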
Test Plan: existing unittests and CI
Reviewed By: dhruvbird, raziel
Differential Revision: D26933945
Pulled By: iseeyuan
fbshipit-source-id: 8a9833949fa18f7103a5a0be19e2028040dc7717
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53823
Argument for correctness: type_equal previously compared whether backends
are equal. Backend is computed by translation from the dispatch key.
I verified that computeDispatchKey never computed a weird
dispatch key (e.g., AutogradXLA), so that dispatchKeyToBackend
was effectively injective. Then it is always valid to compare
the arguments of an injective function for equality, rather than
the output of the injective function.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D27036575
Pulled By: ezyang
fbshipit-source-id: 6aeafc89f287da0bc0065bd21c1adb5e272dbb81
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54029
I found what appear to be some missed moves and/or extra copies in the JIT interpreter.
ghstack-source-id: 123958682
Test Plan:
Existing CI for correctness
Ran AdIndexer inline_cvr local_ro model benchmark with static_runtime off via
`env bin=/tmp/ptvsc2_predictor_bench.StaticDispatchModeFile static_runtime=0 caffe2=0 scripts/swolchok/static_runtime/inline_cvr/run_local_ro.sh`
before:
```
I0315 14:25:23.916893 3075680 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.01635. Iters per second: 983.914
I0315 14:26:05.536207 3080560 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.01689. Iters per second: 983.395
I0315 14:26:47.510561 3083335 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.02697. Iters per second: 973.737
I0315 14:27:29.024830 3086767 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.01326. Iters per second: 986.918
I0315 14:28:10.849496 3091323 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.023. Iters per second: 977.517
```
after:
```
I0315 14:17:43.280469 3046242 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 0.997838. Iters per second: 1002.17
I0315 14:18:24.244606 3046861 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.00173. Iters per second: 998.269
I0315 14:19:05.208899 3051998 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.00187. Iters per second: 998.136
I0315 14:19:46.103854 3055392 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 1.00073. Iters per second: 999.27
I0315 14:20:27.011411 3056062 PyTorchPredictorBenchLib.cpp:215] PyTorch run finished. Milliseconds per iter: 0.999121. Iters per second: 1000.88
```
(This was just a convenient workload I had handy; the plan of record is to use static runtime for inline_cvr inference AIUI.)
Reviewed By: dhruvbird, walterddr
Differential Revision: D27060762
fbshipit-source-id: 5567206d7c2d9ae99776ce5524caf09ec2035e87
Summary:
brianjo
- Add a javascript snippet to close the expandable left navbar sections 'Notes', 'Language Bindings', 'Libraries', 'Community'
- Fix two latex bugs that were causing output in the log that might have been misleading when looking for true doc build problems
- Change the way release versions interact with sphinx. I tested these via building docs twice: once with `export RELEASE=1` and once without.
- Remove perl scripting to turn the static version text into a link to the versions.html document. Instead, put this where it belongs in the layout.html template. This is the way the domain libraries (text, vision, audio) do it.
- There were two separate templates for master and release, with the only difference being that the master template has an admonition "You are viewing unstable developer preview docs....". Instead toggle that with the value of `release`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53851
Reviewed By: mruberry
Differential Revision: D27085875
Pulled By: ngimel
fbshipit-source-id: c2d674deb924162f17131d895cb53cef08a1f1cb
Summary:
This PR disables the bulk of the output for test time regression reporting, since it's obscuring more important signal (especially in cases where shards are shifting around).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54078
Test Plan:
```
python test/test_testing.py
```
Reviewed By: ezyang, walterddr
Differential Revision: D27088987
Pulled By: samestep
fbshipit-source-id: 06a4eeb75641552bad2ab4b9154a8c70c57b0d68
Summary:
Provides a faster formula for `cumprod` in the case when the input has zeros. This formula is non-differentiable, so we leave the previous formula for the cases when `at::GradMode::is_enabled()`.
This new formula gives up to x10 and x30 speed-ups in CPU and GPU (see the benchmarks below).
The `cumsum` backward formula was rewritten so that no copies are necessary. We also removed a double negation in its formula. This gives a significant speed-up in CPU, while being almost as efficient as the formula with copies in GPU. We can see this speed-up when comparing the "No zeros" part of the benchmark.
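For instance, the backward of `cumprod` through an input containing a zero still has well-defined (and partly nonzero) entries, which the new non-differentiable formula has to reproduce:

```python
import torch

x = torch.tensor([2., 0., 3.], requires_grad=True)
y = x.cumprod(0)                 # [2., 0., 0.]
y.backward(torch.ones_like(y))
# d(sum y)/dx = [1 + x1 + x1*x2, x0 + x0*x2, x0*x1] = [1., 8., 0.]
assert torch.allclose(y, torch.tensor([2., 0., 0.]))
assert torch.allclose(x.grad, torch.tensor([1., 8., 0.]))
```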
Benchmarks:
nb. It is worth noting that the script tests the forward and the backward for `cumprod`, so the speed-ups should be even larger than those announced here.
<details>
<summary>Script</summary>
```python
from IPython import get_ipython
import torch
from itertools import product
torch.manual_seed(13)
torch.set_num_threads(1)
ipython = get_ipython()
cpu = torch.device('cpu')
cuda = torch.device('cuda')
def run_test(ndims, size, size_prod, zeros, device):
    print(f"ndims: {ndims}, tensor_size: {size}, size_prod: {size_prod}, zeros: {zeros}, device: {device}")
    for dim in range(ndims):
        sizes = ndims * [size]
        sizes[dim] = size_prod
        tensor = torch.rand(*sizes, device=device)
        with torch.no_grad():
            if zeros:
                # Set 0.1 of them to zero
                p_drop = 0.1
                mask = torch.full_like(tensor, 1.0 - p_drop)
                tensor = tensor * torch.bernoulli(mask)
            else:
                tensor = tensor + 1e-3
        tensor.requires_grad_()
        grad = torch.ones_like(tensor)
        # We test both forward + backward, meaning that the speed-up is actually greater than reported
        # That being said, this is more realistic than doing `retain_graph=True`
        command = "torch.autograd.grad([tensor.cumprod(dim)], [tensor], grad_outputs=[grad])"
        if device == cuda:
            command += "; torch.cuda.synchronize()"
        ipython.magic(f"timeit {command}")
    print()

for device, zeros in product([cuda, cpu], [True, False]):
    run_test(3, 300, 10, zeros, device)
    run_test(3, 300, 100, zeros, device)
    if device == cuda:
        run_test(3, 300, 300, zeros, device)
```
</details>
<details>
<summary>CPU This PR (Some regression small tensors, x4 speed-up large tensors)</summary>
```
Zeros:
ndims: 3, tensor_size: 300, size_prod: 10, zeros: True, device: cpu
28.2 ms ± 12.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
29.8 ms ± 78.9 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
24.5 ms ± 29.1 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
ndims: 3, tensor_size: 300, size_prod: 100, zeros: True, device: cpu
414 ms ± 3.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
428 ms ± 4.12 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
382 ms ± 3.18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
No Zeros:
ndims: 3, tensor_size: 300, size_prod: 10, zeros: False, device: cpu
3.11 ms ± 9.72 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.83 ms ± 3.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.08 ms ± 10.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
ndims: 3, tensor_size: 300, size_prod: 100, zeros: False, device: cpu
92.2 ms ± 113 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
101 ms ± 101 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
87 ms ± 170 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
</details>
<details>
<summary>CUDA This PR (7-30x speed-up)</summary>
```
Zeros:
ndims: 3, tensor_size: 300, size_prod: 10, zeros: True, device: cuda
1.46 ms ± 2.07 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.48 ms ± 3.51 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.93 ms ± 8.07 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
ndims: 3, tensor_size: 300, size_prod: 100, zeros: True, device: cuda
10.5 ms ± 914 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.6 ms ± 509 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
11.7 ms ± 864 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
ndims: 3, tensor_size: 300, size_prod: 300, zeros: True, device: cuda
30.3 ms ± 5.16 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
30.6 ms ± 6.44 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
32.2 ms ± 2.34 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
No Zeros:
ndims: 3, tensor_size: 300, size_prod: 10, zeros: False, device: cuda
248 µs ± 335 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
252 µs ± 186 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
438 µs ± 254 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
ndims: 3, tensor_size: 300, size_prod: 100, zeros: False, device: cuda
2.1 ms ± 193 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.16 ms ± 380 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.59 ms ± 398 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
ndims: 3, tensor_size: 300, size_prod: 300, zeros: False, device: cuda
6.3 ms ± 857 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.39 ms ± 288 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.15 ms ± 233 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
</details>
<details>
<summary>CPU master</summary>
```
Zeros:
ndims: 3, tensor_size: 300, size_prod: 10, zeros: True, device: cpu
8.27 ms ± 12.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.8 ms ± 13.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
28.2 ms ± 74.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
ndims: 3, tensor_size: 300, size_prod: 100, zeros: True, device: cpu
1.53 s ± 116 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.95 s ± 4.38 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.86 s ± 3.58 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
No Zeros:
ndims: 3, tensor_size: 300, size_prod: 10, zeros: False, device: cpu
3.42 ms ± 20 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.25 ms ± 3.65 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.34 ms ± 3.04 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
ndims: 3, tensor_size: 300, size_prod: 100, zeros: False, device: cpu
104 ms ± 148 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
117 ms ± 99.5 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
94.8 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
</details>
<details>
<summary>CUDA master</summary>
```
Zeros:
ndims: 3, tensor_size: 300, size_prod: 10, zeros: True, device: cuda
912 µs ± 431 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.05 ms ± 2.46 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.74 ms ± 381 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
ndims: 3, tensor_size: 300, size_prod: 100, zeros: True, device: cuda
71.3 ms ± 7.91 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
85.4 ms ± 9.82 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
119 ms ± 6.21 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
ndims: 3, tensor_size: 300, size_prod: 300, zeros: True, device: cuda
646 ms ± 103 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
776 ms ± 81.7 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
917 ms ± 160 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
No Zeros:
ndims: 3, tensor_size: 300, size_prod: 10, zeros: False, device: cuda
301 µs ± 893 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
308 µs ± 236 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
592 µs ± 140 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
ndims: 3, tensor_size: 300, size_prod: 100, zeros: False, device: cuda
2.61 ms ± 375 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.68 ms ± 524 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.38 ms ± 736 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
ndims: 3, tensor_size: 300, size_prod: 300, zeros: False, device: cuda
7.89 ms ± 848 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
8.03 ms ± 517 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.24 ms ± 405 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
</details>
cc nikitaved
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53711
Reviewed By: jbschlosser
Differential Revision: D27059662
Pulled By: anjali411
fbshipit-source-id: be610d5590c0199b4412dff66fac47666faaff9d
Summary:
SC1090/1091 are important to prevent accidental delete/move of utility shell scripts
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54069
Test Plan: CI
Reviewed By: samestep
Differential Revision: D27084094
Pulled By: walterddr
fbshipit-source-id: 16deb83fce691eba0263978374564d172bc8d371
Summary:
There are two patterns for calling add in-place:
```python
torch.add(a, b, out=a) # (1) a in-placed
torch.add(a, b, out=b) # (2) b in-placed
```
If `a` and `b` are MKL-DNN tensors, the result differs from the expected value in case (2).
**Sample code to reproduce the behavior:**
```python
import torch
torch.manual_seed(4)
a = torch.randn(4, 4)
b = torch.randn(4, 4)
b.fill_(1.0)
a_mkl = a.to_mkldnn()
b_mkl = b.to_mkldnn()
torch.add(b, a, alpha=1.0, out=a)
torch.add(b_mkl, a_mkl, alpha=1.0, out=a_mkl)
print(a)
print(a_mkl)
```
**Results:**
Actual:
```python
tensor([[ 0.0586, 2.2632, 0.8162, 1.1505],
[ 1.1075, 0.7220, -1.6021, 1.6245],
[ 0.1316, 0.7949, 1.3976, 1.6699],
[ 0.9463, 1.0467, -0.7671, -1.1205]])
tensor([[2., 2., 2., 2.],
[2., 2., 2., 2.],
[2., 2., 2., 2.],
[2., 2., 2., 2.]], layout=torch._mkldnn)
```
Expected:
```python
tensor([[ 0.0586, 2.2632, 0.8162, 1.1505],
[ 1.1075, 0.7220, -1.6021, 1.6245],
[ 0.1316, 0.7949, 1.3976, 1.6699],
[ 0.9463, 1.0467, -0.7671, -1.1205]])
tensor([[ 0.0586, 2.2632, 0.8162, 1.1505],
[ 1.1075, 0.7220, -1.6021, 1.6245],
[ 0.1316, 0.7949, 1.3976, 1.6699],
[ 0.9463, 1.0467, -0.7671, -1.1205]], layout=torch._mkldnn)
```
This is because `dnnl::sum` called in `mkldnn_add` has the following specifications:
[oneDNN doc : Sum](https://oneapi-src.github.io/oneDNN/dev_guide_sum.html)
> The sum primitive supports in-place operation, meaning that the src0 tensor can be used as both input and output.
> In-place operation overwrites the original data. Using in-place operation requires the memory footprint of the
> output tensor to be either bigger than or equal to the size of the dst memory descriptor used for primitive creation.
However, in case (2) the in-placed tensor is passed as the second argument, not the first.
So we modified the code to swap `a` and `b` before passing them to `sum` in case (2).
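The fix can be sketched in plain Python (assuming alpha == 1, where add is commutative; `add_out` is a hypothetical stand-in for `mkldnn_add`, operating on lists instead of tensors):

```python
# Sketch of the fix: dnnl::sum can only update its *first* source in place,
# so when the output aliases the second operand we swap the operands first.
# This relies on addition being commutative (alpha == 1 here).
def add_out(x, y, out):
    if out is y:          # case (2): out aliases the second operand
        x, y = y, x       # make the in-placed buffer the first source
    for i in range(len(out)):
        out[i] = x[i] + y[i]
    return out
```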
**Environment**
- CPU: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
- Built with USE_MKLDNN=1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51687
Reviewed By: jbschlosser
Differential Revision: D27062172
Pulled By: VitalyFedyunin
fbshipit-source-id: bf76d36f9fdb1b4337d71d87bcdbaf4edb11f12f
Summary:
Resubmit of https://github.com/pytorch/pytorch/pull/51436.
Apparently some non-public windows builds run cuda tests on the default stream, so I changed a few capture tests to manually ensure all captures happen on non-default streams.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54038
Reviewed By: mruberry
Differential Revision: D27068649
Pulled By: ngimel
fbshipit-source-id: 4284475fa40ee38c0f8faff05a2faa310cf8a207
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53583
`Scalar` takes 32 bytes due to `c10::complex<double>`
requires aligning to 16 bytes. Passing Scalar by reference
shows about 1% improvements on instruction count.
All the changes in this commit are codemoded except for
the following 4 files (which code-gen signatures):
```
tools/codegen/api/cpp.py
tools/codegen/api/native.py
tools/codegen/api/structured.py
caffe2/contrib/aten/gen_op.py
```
# Codemode
## Main Step
For the codemod part, here is the main command used:
```
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)optional<Scalar> (\w+)' '${1}const optional<Scalar>& ${2}'
```
As you can tell, it codemods both `Scalar` and `optional<Scalar>`. Apply these commands iteratively until reaching a fixed point (since one method signature might contain multiple `Scalar` parameters).
In retrospect, excluding `third_party` and `torch/csrc/jit` would have been a good idea (I reverted those manually later; see https://github.com/pytorch/pytorch/pull/53479 as a reference).
## Pre-Step
Prior to applying the main command, since some `Scalar`s appear as `at::Scalar` or `c10::Scalar`, I codemodded some of them in advance. Here is an incomplete list:
```
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)at::Scalar (\w+)' '${1}const at::Scalar& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)at::Scalar (\w+)' '${1}const at::Scalar& ${2}'
fastmod --extensions h '([a-zA-Z_+]\([^)]*,?\s*)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}'
fastmod --extensions cpp '([a-zA-Z_+]\([^)]*,?\s*)c10::optional<Scalar> (\w+)' '${1}const c10::optional<Scalar>& ${2}'
```
## Fixup
There are a couple of post-codemod fixups. For example, `const Scalar` gets codemodded into `const const Scalar&`, and `at::Scalar` into `at::const Scalar&` (if the pre-step is not done comprehensively). Here is an incomplete list:
```
fastmod --extensions cpp 'const const Scalar' 'const Scalar'
fastmod --extensions h 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>'
fastmod --extensions cpp 'const const c10::optional<Scalar>' 'const c10::optional<Scalar>'
fastmod 'at::const Scalar&' 'const at::Scalar&'
```
## Supplementary
`cu` and `mm` files also need to be codemoded, for example:
```
fastmod --extensions cu 'at::const Scalar&' 'const at::Scalar&'
fastmod --extensions mm '([a-zA-Z_+]\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
```
Function pointers are not codemoded. Here is an incomplete list:
```
# Cover case: using index_fill_fn = void(*)(TensorIterator & iter, int64_t dim, int64_t self_dim_size, int64_t self_dim_stride, Scalar source);
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar (\w+)' '${1}const Scalar& ${2}'
# Cover case: using softplus_fn = void (*)(TensorIterator&, Scalar, Scalar);
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar([, \)])' '${1}const Scalar&${2}'
fastmod --extensions cpp '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)Scalar([, \)])' '${1}const Scalar&${2}'
fastmod --extensions h '(void\s*\(\s*\*\s*\)\([^)]*,?\s*)optional<Scalar>([, \)])' '${1}const optional<Scalar>&${2}'
```
Some corner cases need to be fixed manually.
ghstack-source-id: 123970306
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D26904445
fbshipit-source-id: 8d8a002af4b5125f153a32f03c6956be7ae5671d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53582
We will pass `Scalar` by reference in the following commit,
i.e. `const Scalar&`.
ghstack-source-id: 123965970
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D26904444
fbshipit-source-id: 7f58ee4e38dcd860f0d1120cab4e82f35ca3770f
Summary:
For OneDNN max pooling, training saves the indices as a workspace for the backward pass, but for inference the indices are not necessary. This PR adds a check to avoid saving indices, reducing memory use on the inference path.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52728
Reviewed By: jbschlosser
Differential Revision: D27062435
Pulled By: VitalyFedyunin
fbshipit-source-id: 9e70268a8ba491a7914b980079c0945d753cd4f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53578
We want to be able to log the loaded module size to the scuba table `qpl_metrics/pytorch`. Hence, adding the `model_size` field to the logged metadata when logging a module load success event.
ghstack-source-id: 123980964
Test Plan: xcheng16 How should this be tested?
Reviewed By: xcheng16, raziel
Differential Revision: D26902971
fbshipit-source-id: a7c2e9120706bd31f76f6572c8503d4acf8a89e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53783
Use isort + black on torch/package/
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D26969020
Pulled By: suo
fbshipit-source-id: e2c0738e79bf41b6342355eb7025998178c35dc9
Summary:
This PR:
1. moves sharding algorithm from run_test.py to framework_utils.py (let me know if you have a better place for it)
2. adds tests for the algorithm in test_testing.py
3. fixes the algorithm so that it doesn't tack all the unknown jobs onto the shard with the minimum time, but instead distributes them across the shards.
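The behavior above amounts to something like the following greedy scheme (a simplified sketch, not the exact code in framework_utils.py):

```python
# Greedy sharding sketch: jobs with known times go to the currently-lightest
# shard (longest job first); jobs with unknown times are then round-robined
# across shards instead of all landing on the minimum-time shard.
def shard_jobs(timed, unknown, num_shards):
    shards = [{"jobs": [], "time": 0.0} for _ in range(num_shards)]
    for name, t in sorted(timed, key=lambda kv: kv[1], reverse=True):
        lightest = min(shards, key=lambda s: s["time"])
        lightest["jobs"].append(name)
        lightest["time"] += t
    for i, name in enumerate(unknown):
        shards[i % num_shards]["jobs"].append(name)
    return [s["jobs"] for s in shards]
```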
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53942
Test Plan: python test/test_testing.py -k TestFrameworkUtils
Reviewed By: samestep
Differential Revision: D27047223
Pulled By: janeyx99
fbshipit-source-id: 824b20009c0bb707aa5361de445cdec795d5e3f1
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).
New submodule commit: 17008b1be8
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53999
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: lw
Differential Revision: D27046211
fbshipit-source-id: 72d7eb3814d30afb7956e0e0b43b0b320fbf009a
Summary:
Promotion to PyPI should be more flexible to allow any package to be
promoted to PyPI.
After we re-added a version suffix to cuda 10.2 it means that this
script needs to have the flexibility to designate which platform and
which version suffix will actually be uploaded to PyPI
Should coincide with https://github.com/pytorch/builder/pull/678
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53774
Reviewed By: jbschlosser
Differential Revision: D27052347
Pulled By: seemethere
fbshipit-source-id: 71129cc5afbd7de448c970ef721bc979c3420586
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53928
HashStoreTest was taking forever to run. Turns out it was because a default timeout is set when creating Store() and setTimeout for prefixStore is not actually able to change the timeout of the underlying store.
After removing the default timeout and updating setTimeout, this will save ~10 minutes for all of the gcc_test CI runs.
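A minimal sketch of the delegation fix (hypothetical simplified classes, not the real c10d code):

```python
# The bug: PrefixStore.set_timeout did not change the timeout of the
# underlying store, so the store's default kept applying. The fix is to
# delegate the call to the wrapped store.
class Store:
    def __init__(self, timeout=300):
        self.timeout = timeout

    def set_timeout(self, timeout):
        self.timeout = timeout

class PrefixStore:
    def __init__(self, prefix, store):
        self.prefix = prefix
        self.store = store

    def set_timeout(self, timeout):
        # forward to the wrapped store instead of ignoring the call
        self.store.set_timeout(timeout)
```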
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D27025275
Pulled By: H-Huang
fbshipit-source-id: 650c8c1eb8b166da1d412ed88e765747a2ca2069
Summary:
This PR fixes a typo in the explanation of `dims` for `linalg.tensorsolve`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53320
Reviewed By: jbschlosser
Differential Revision: D27048736
Pulled By: anjali411
fbshipit-source-id: db230b21191cc9cfb73b967cd15305fe74178c2b
Summary:
Close https://github.com/pytorch/pytorch/issues/51108
Related https://github.com/pytorch/pytorch/issues/38349
This PR implements the `cpu_kernel_multiple_outputs` to support returning multiple values in a CPU kernel.
```c++
auto iter = at::TensorIteratorConfig()
    .add_output(out1)
    .add_output(out2)
    .add_input(in1)
    .add_input(in2)
    .build();

at::native::cpu_kernel_multiple_outputs(iter,
    [=](float a, float b) -> std::tuple<float, float> {
      float add = a + b;
      float mul = a * b;
      return std::tuple<float, float>(add, mul);
    }
);
```
`out1` will equal `torch.add(in1, in2)`, while `out2` will equal `torch.mul(in1, in2)`.
It helps developers implement new torch functions that return two tensors more conveniently, such as NumPy-like functions [divmod](https://numpy.org/doc/1.18/reference/generated/numpy.divmod.html?highlight=divmod#numpy.divmod) and [frexp](https://numpy.org/doc/stable/reference/generated/numpy.frexp.html#numpy.frexp).
This PR adds `torch.frexp` function to exercise the new functionality provided by `cpu_kernel_multiple_outputs`.
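The semantics `torch.frexp` mirrors are those of C's `frexp`, also available in Python's standard library:

```python
# frexp decomposes x into mantissa * 2**exponent with 0.5 <= |mantissa| < 1
# for nonzero x; torch.frexp applies the same decomposition elementwise.
import math

mantissa, exponent = math.frexp(12.0)  # 12.0 == 0.75 * 2**4
```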
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51097
Reviewed By: albanD
Differential Revision: D26982619
Pulled By: heitorschueroff
fbshipit-source-id: cb61c7f2c79873ab72ab5a61cbdb9203531ad469
Summary:
The TCPStore delete-key implementation inadvertently set "moreData" when sending the key, even though it was in fact the last message.
Thank you, PetrochukM, for the reproducing example which was instrumental in developing the fix (and is the blueprint for the test case).
Fixes https://github.com/pytorch/pytorch/issues/53872
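The framing rule behind the fix, sketched with a hypothetical helper (not TCPStore's actual wire code):

```python
# Every segment of a request carries a "more data" flag; the fix is that the
# final segment must send the flag as False, otherwise the receiver keeps
# waiting for another segment that never arrives.
def frame_segments(segments):
    return [(seg, i < len(segments) - 1) for i, seg in enumerate(segments)]
```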
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53886
Reviewed By: jbschlosser
Differential Revision: D27011846
Pulled By: H-Huang
fbshipit-source-id: 5c460d1e4d095a8bc267bf63613b556856ced3e8
Summary:
When building the libtorch static library, these three static libraries are generated but are not installed to CMAKE_INSTALL_LIBDIR:
- libCaffe2_perfkernels_avx2.a
- libCaffe2_perfkernels_avx512.a
- libCaffe2_perfkernels_avx.a
This PR will fix this issue.
Please note that after this fix there are still static libraries missing from CMAKE_INSTALL_LIBDIR, but they belong to third_party repos, and we need to fix those in the corresponding repos:
- libfoxi_loader.a
- libonnx.a
- libonnx_proto.a
- libfmt.a
- libnccl_static.a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53825
Reviewed By: ngimel
Differential Revision: D27013844
Pulled By: malfet
fbshipit-source-id: 8a84cc72b6ae87393ca26c4e474f5526a7b18ab2
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).
New submodule commit: cd0eb12c1f
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53892
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: lw
Differential Revision: D27009398
fbshipit-source-id: af46edd701cde94c6175d3058fd15487d8b0b8c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53759
Fixes #53587, see issue for in-depth explanation of the bug.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D26971342
Pulled By: ezyang
fbshipit-source-id: 805983fed2658e27fb033f36a71fd30950a29328
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53682
With this, under the meta device, 101 tests passed and 16953 skipped.
It ain't much, but it's a start.
Some various bits and bobs:
- NotImplementedError suppression at test level is implemented
in the same way as CUDA memory leak check, i.e., by wrapping
test methods and monkeypatching them back in.
- I had to reimplement assertRaises/assertRaisesRegex from scratch to
ignore NotImplementedError when _ignore_not_implemented_error is True.
The implementation relies on a small amount of private API that hasn't
changed since 2010.
- expectedAlertNondeterministic doesn't really work so I skipped them
all; there's probably a way to do it better
I tested this using `pytest --disable-warnings --tb=native -k meta --sw
test/*.py` and a pile of extra patches to make collection actually work
(lol).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D26955539
Pulled By: ezyang
fbshipit-source-id: ac21c8734562497fdcca3b614a28010bc4c03d74
Summary:
The size of the workspace array should be max(1, lwork), according to the LAPACK documentation. We got away with this previously because we tested only MKL, which is nice enough to return lwork >= 1.
Fixes https://github.com/pytorch/pytorch/issues/53454
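The convention can be captured in a one-line helper (a sketch of the rule, not the PR's exact code):

```python
# LAPACK workspace-query convention: after the lwork == -1 size query,
# allocate max(1, lwork) elements, since an implementation may legally
# report lwork == 0 for degenerate inputs.
def workspace_size(lwork):
    return max(1, int(lwork))
```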
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53909
Reviewed By: heitorschueroff
Differential Revision: D27017025
Pulled By: mruberry
fbshipit-source-id: 040a8cfb4bfb98db47d0b117938856d9483b20fb
Summary:
Added OpInfo-based testing of the following linear algebra functions:
* cholesky, linalg.cholesky
* linalg.eigh
* inverse, linalg.inv
* qr, linalg.qr
* solve
The output of `torch.linalg.pinv` for empty inputs was not differentiable, now it's fixed.
In some cases, batched grad checks are disabled because it doesn't work well with 0x0 matrices (see https://github.com/pytorch/pytorch/issues/50743#issuecomment-767376085).
Ref. https://github.com/pytorch/pytorch/issues/50006
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51107
Reviewed By: albanD
Differential Revision: D27006115
Pulled By: mruberry
fbshipit-source-id: 3c1d00e3d506948da25d612fb114e6d4a478c5b1
Summary:
https://github.com/pytorch/pytorch/pull/51348 added CUDA support for orgqr but only a cuSOLVER path; the orgqr tests, however, were marked to run on builds with either MAGMA or cuSOLVER.
This PR addresses the issue by creating a skipCUDAIfNoCusolver decorator and applying it to the orgqr tests. It triggers ci-all because our CI build with MAGMA but no cuSOLVER is CUDA 9.2, which does not run in the typical PR CI.
cc IvanYashchuk
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53975
Reviewed By: ngimel
Differential Revision: D27036683
Pulled By: mruberry
fbshipit-source-id: f6c0a3e526bde08c44b119ed2ae5d51fee27e283
Summary: As title. Otherwise we get flakiness when running on devices in dev mode.
Reviewed By: jfix71
Differential Revision: D27035924
fbshipit-source-id: 4946a90bd341be63d74b7052cace3fabdefdc0c4
Summary:
When compiled with OpenMP support, `ideep`'s computational_cache would cache the max number of OpenMP workers.
This number could be wrong after a `torch.set_num_threads` call, so clear the cache after the call.
Fixes https://github.com/pytorch/pytorch/issues/53565
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53871
Reviewed By: albanD
Differential Revision: D27003265
Pulled By: malfet
fbshipit-source-id: 1d84c23070eafb3d444e09590d64f97f99ae9d36
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53092
This PR adds the following APIs to NNC.
```
// In For:
static For* getParentLoop(const Stmt* st);
static std::vector<For*> getEnclosingLoopNest(const Stmt* st);
// In LoopNest:
std::vector<const Stmt*> getAllWritesToBuf(const Buf*) const;
std::vector<For*> getAllInnermostLoopsWritingToBuf(const Buf*) const;
std::vector<std::vector<For*>> getAllLoopNestsWritingToBuf(const Buf*) const;
```
These APIs are required for some usecases that involve multiple transformations like `splitWithTail` followed by `reorder` as shown in https://github.com/pytorch/pytorch/issues/53092
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53778
Reviewed By: albanD
Differential Revision: D26987013
Pulled By: navahgar
fbshipit-source-id: 491459eddfff045132d2358631ad069bbcc520df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52429
Implemented an out variant of embedding_bag to support Static Runtime.
Before: Milliseconds per iter: 1.15443. Iters per second: 866.226
After: Milliseconds per iter: 1.14791. Iters per second: 871.149
Test Plan:
buck test caffe2/test:nn
buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest
Reviewed By: hlu1
Differential Revision: D26089498
fbshipit-source-id: c9ba7068d5aa696c8f37a4846d8e80c6379538d2
Summary:
This PR adds the cuBLAS based path for `torch.triangular_solve`
The device dispatching helper function was removed from native_functions.yml, it is replaced with DECLARE/DEFINE_DISPATCH.
`magmaTriangularSolve` is removed and replaced with cuBLAS calls; this is not a BC-breaking change because internally MAGMA just calls the same cuBLAS function and doesn't do anything else.
Batched cuBLAS is faster than batched MAGMA for matrices of size up to 512x512; after that MAGMA is faster. For batches smaller than ~8 and matrix sizes larger than 64x64, a for-loop of cuBLAS calls is faster than the batched version.
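The dispatch heuristic described above could be sketched as follows (thresholds taken from this description; the helper name is hypothetical):

```python
# Rough sketch of the backend choice for triangular_solve on CUDA:
# a for-loop of single cuBLAS calls for small batches of larger matrices,
# batched cuBLAS up to 512x512, and batched MAGMA beyond that.
def pick_triangular_solve_backend(batch, n):
    if batch < 8 and n > 64:
        return "cublas_loop"       # for-loop of single cuBLAS calls
    if n <= 512:
        return "cublas_batched"    # batched cuBLAS wins up to 512x512
    return "magma_batched"         # batched MAGMA wins for larger matrices
```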
Ref. https://github.com/pytorch/pytorch/issues/47953
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53147
Reviewed By: heitorschueroff
Differential Revision: D27007416
Pulled By: mruberry
fbshipit-source-id: ddfc190346e6a56b84145ed0a9af67ca9cde3506
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44378 by providing a wider range of drivers similar to what SciPy is doing.
The supported CPU drivers are `gels, gelsy, gelsd, gelss`.
The CUDA interface has only `gels` implemented but only for overdetermined systems.
The current state of this PR:
- [x] CPU interface
- [x] CUDA interface
- [x] CPU tests
- [x] CUDA tests
- [x] Memory-efficient batch-wise iteration with broadcasting which fixes https://github.com/pytorch/pytorch/issues/49252
- [x] docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49093
Reviewed By: albanD
Differential Revision: D26991788
Pulled By: mruberry
fbshipit-source-id: 8af9ada979240b255402f55210c0af1cba6a0a3c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53799
Fix two issues with ClipRangesGatherRangesX2SigridHash and ClipRangesGatherRangesX2SigridHashPrecompute:
- The first issue is with the two step graph rewrite process. If step 2 doesn't happen after step 1, then we're stuck with a graph with a `fb::placeholder` op that can't run. Step 3 is added to revert step 1 so we restore the original graph if there's any `fb::placeholder` op left.
- The second issue is with `SigridHashPrecompute`. The coupling with `freeze_module` is not ideal and limits its use to Static Runtime only. By running `ConstantPropagation` and `ConstantPooling` after splitting SigridHash, we can move all the Constant ops to the front of the graph and fusion can happen right afterwards.
Reviewed By: ajyu
Differential Revision: D26920008
fbshipit-source-id: e4bc67c7a15181bac5dbbfbb95d861849652bddf
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52044 (`stack` dispatches to `cat`)
The way dispatcher works, currently this case happens only in CUDA kernel (CPU kernel is chosen if all inputs and out are on CPU). That is why the check is added only on the CUDA side.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53004
Reviewed By: albanD
Differential Revision: D27003956
Pulled By: mruberry
fbshipit-source-id: 818ea0f76153f4fa281740f30705e5ef018413f6
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53456
I'm confused why this wasn't picked up in CI. There's definitely at least one CI job that builds without MKL. Are spectral_ops not being run at all on that job?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53736
Reviewed By: albanD
Differential Revision: D27007901
Pulled By: mruberry
fbshipit-source-id: cd93a2c48f4ccb2fd2e0e35768ee059039868a1b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53403
This updates the `TensorProto` field to independently track the data type of
the in-memory (deserialized) data from the serialized data format.
This will allow us to support multiple different serialization formats in the
future. For instance, we could choose to perform quantization of floating
point data types, or varint encoding for integer fields.
For now this diff does not actually change the serialization code path yet,
and does not introduce any new serialization formats, but only refactors the
deserialization code path to make it easier to introduce new formats.
I'm not really that thrilled with the heavy use of macros and templates here,
but I didn't really see better alternatives that made it as simple to specify
new deserialization function implementations.
ghstack-source-id: 123594220
Test Plan:
Confirmed that the existing unit tests pass. This diff only touches the
deserialization code path and not the serialization code to help ensure that
the deserialization code works with the existing serialization logic, and that
there are no changes to the current serialization format.
Reviewed By: mraway
Differential Revision: D26658206
fbshipit-source-id: d7297d600aee28b92fd9f4ece437b7f519060942
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53754
Some of the PyTorch CircleCI builds still use gcc 5.4, and compile with
`-Werror=attributes` causing this old compiler to fail because it does not
understand the `[[nodiscard]]` attribute.
Let's define a `CAFFE2_NODISCARD` macro to work around this.
ghstack-source-id: 123594084
Test Plan: I'm using this macro in subsequent diffs in the stack.
Reviewed By: mraway
Differential Revision: D26959584
fbshipit-source-id: c7ba94f7ea944b6340e9fe20949ba41931e11d41
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/53316
Test Plan:
Nightly Docker build CI
This is a follow-up PR after docker moved default CUDA => 11.1. Only merge this after https://github.com/pytorch/pytorch/issues/53299 is committed.
Reviewed By: albanD
Differential Revision: D26996287
Pulled By: xuzhao9
fbshipit-source-id: 0c2e03da41d036d7aada3e07d479a3dede219f58
Summary:
Implements https://github.com/pytorch/pytorch/issues/51075#issuecomment-768884685 and additions discussed offline with ezyang ngimel . (Calling it "simple" is charitable but it's not too bad).
[High level strategy](https://github.com/pytorch/pytorch/pull/51436/files#diff-acc6337586bf9cdcf0a684380779300ec171897d05b8569bf439820dc8c93bd5R57-R82)
The current design aggregates stats from private pools with the ordinary pools, which may or may not be what we want.
Instead of adding PrivatePools as an internal feature of DeviceAllocator, I could inherit from DeviceAllocator (eg `DevicePrivateAllocator : public DeviceAllocator`) and create separate per-graph instances of the inherited class. I'm not sure if that would be better.
Graph bindings in Python are almost unchanged from https://github.com/pytorch/pytorch/pull/48875:
```python
# Same bindings as 48875, but now implicitly grabs a private mempool
graph1.capture_begin()
graph1.capture_end()
# pool=... is new. It hints that allocations during graph2's capture may share graph1's mempool
graph2.capture_begin(pool=graph1.pool())
graph2.capture_end()
# graph3 also implicitly creates its own mempool
graph3.capture_begin()
graph3.capture_end()
```
Test plan (other suggestions appreciated):
- [x] Stop maintaining manual references for all the tensors in my existing graphs+RNG tests. If private pools somehow give bad allocations, they should start failing intermittently. They run eager ops and eager allocations mixed with graph replays, so they may expose if eager ops and replays corrupt each other.
- [x] `test_graph_two_successive`: Capture successive graphs, with the second graph using the first graph's result. Try with and without sharing a pool. Check results, also check memory stats to confirm sharing a pool saves memory.
- [x] `test_graph_concurrent_replay`: Capture some graphs in separate private pools, replay them concurrently in different streams, check the results to make sure they don't corrupt each other's memory. Capture some graphs with a shared pool, replay them concurrently in different streams, check results, confirm they DO corrupt each other's memory.
- [x] `test_graph_three_successive`: A three-graph case, checking the safe and unsafe replay patterns in [Restrictions of the Strawman API](https://github.com/pytorch/pytorch/issues/51075).
- [x] `test_graph_memory_stats_and_use_result_after_destroy_graph`: Comprehensively check torch.cuda.memory_stats() changes that result from graph capture and delete. Check that a tensor ref created during capture and held after graph delete stays valid until the tensor itself is deleted.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51436
Reviewed By: mruberry
Differential Revision: D26993790
Pulled By: ngimel
fbshipit-source-id: a992eaee1b8c23628e7b388a5a3c26e0f80e54da
Summary:
Benchmark of
```python
%timeit torch.randperm(100000, device='cuda'); torch.cuda.synchronize()
```
thrust:
```
5.76 ms ± 42.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
cub:
```
3.02 ms ± 32.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
sync in thrust sort is removed
Warning:
Thrust supports 64-bit indexing, but cub doesn't, so this is a functional regression. However, `torch.randperm(2**31, device='cuda')` fails with OOM on a 40GB A100, and `torch.randperm(2**32, device='cuda')` fails with OOM on an 80GB A100, so I think this functional regression has low impact and is acceptable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53841
Reviewed By: albanD
Differential Revision: D26993453
Pulled By: ngimel
fbshipit-source-id: 39dd128559d53dbb01cab1585e5462cb5f3cceca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53777
Moves linear activation test case to new NS API
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_compare_activations_linear
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D26967107
fbshipit-source-id: 83c4401b2bf79d15227b7fb3e59c54276ec5626b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53776
Moves the test for comparing activations for conv to new API.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_compare_activations_conv
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D26967106
fbshipit-source-id: 2eb986ff19761a1e2408cb7780ac0b282cdcc523
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53772
Moves the test case for extracting LSTM dynamic weights to new NS API.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_compare_weights_lstm_dynamic
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D26967104
fbshipit-source-id: 0d17e7735ec361167dcf72bcb373bfc1aad84668
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53765
Moves linear dynamic weight test case to new NS API.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_compare_weights_linear
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D26967109
fbshipit-source-id: 2096a88a3005270696d536f2e1bbc87e70c07230
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53764
Moving the linear weight test case to new FX NS APIs.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels.test_compare_weights_linear
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D26967111
fbshipit-source-id: f0a90d7863d5d866e391729ec28e0e0dea339900
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53748
Extracts common testing patterns for FX numeric suite into
util functions. No logic change.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D26967105
fbshipit-source-id: 9f6cbe75bb6d2ede142929e0c9e40812006c159d
Summary:
This PR implements the option to log inputs for FX Numeric Suite. The user facing api looks like
```
def prepare_model_outputs(..., should_log_inputs : bool = False)
def prepare_model_with_stubs(..., should_log_inputs : bool = False)
```
The output data now looks like
```
{
"layer1": {
"node_inputs": {
"model1": [{
"values": ...,
...,
}],
},
"node_outputs": {
...,
}
},
... // other layers
}
```
One key design decision taken here is that an input logger logs the output of previous nodes, instead of logging the input of the current node. This matters for a signature such as `cat([x1, x2, x3])`. We are inserting three input loggers here (for x1, x2, and x3), instead of a single input logger for `[x1, x2, x3]`. This was chosen in order to preserve the structure of the original graph as much as possible and keep flexibility for future optimizations.
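The per-input logger placement described above can be sketched roughly as follows. This is an illustrative stand-in only: `Node` and `insert_input_loggers` are hypothetical names, not the actual FX Numeric Suite internals.

```python
# Illustrative sketch: Node and insert_input_loggers are hypothetical
# stand-ins for the FX Numeric Suite internals described above.

class Node:
    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = list(inputs)

def insert_input_loggers(node):
    # One logger per producing node: each logger observes the *output*
    # of a previous node, rather than one logger for the whole input list.
    return ["logger_of_" + inp.name for inp in node.inputs]

x1, x2, x3 = Node("x1"), Node("x2"), Node("x3")
cat = Node("cat", inputs=[x1, x2, x3])
print(insert_input_loggers(cat))
```

For `cat([x1, x2, x3])` this yields three loggers, one per input, which is what preserves the original graph structure.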
Test Plan:
TODO: fill out
Imported from OSS
Differential Revision: D26931225
Reviewed By: hx89
Pulled By: vkuzo
fbshipit-source-id: dd692bfb5ddaaf5554f80c25e2f40b21762e4fc3
Summary:
This PR ensures that when we do a dtype cast for a shadow module,
we insert N dtype casts for N nodes, instead of combining N nodes
into a single dtype cast.
An example where this occurs is `cat([x, y], dim=0)`
```
// original graph
[x, y] -> cat_b -> output
// shadow graph with a single dtype cast, before this PR
dtype_cast -> cat_a_shadow -> output_a_shadow
/
[x, y] -> cat_b -> output_b
// shadow graph with multiple dtype casts, after this PR
[dtype_cast_x, dtype_cast_y] -> cat_a_shadow -> output_a_shadow
/
[x, y] -> cat_b -> output_b
```
The reason things worked before this PR is because `torch.dequantize`
can take either a single tensor or a list of tensors. We are changing
this to make an upcoming addition of input loggers easier.
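The old-versus-new behavior can be sketched with toy helpers; these are hypothetical names for illustration, not the actual shadow-copy rewrite code.

```python
# Hypothetical sketch contrasting the behavior before and after this PR;
# the helper names are illustrative, not the actual shadow-module code.

def dequantize_whole_list(values):
    # Before this PR: a single dtype cast over the whole list, which only
    # worked because torch.dequantize happens to accept a list of tensors.
    return [("dequantize", tuple(values))]

def dequantize_per_input(values):
    # After this PR: N dtype casts for N inputs.
    return [("dequantize", v) for v in values]

before = dequantize_whole_list(["x", "y"])
after = dequantize_per_input(["x", "y"])
print(len(before), len(after))  # one cast node before, two after
```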
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_prepare_model_with_stubs_multiple_dtype_casts
```
Imported from OSS
Differential Revision: D26931226
Reviewed By: hx89
Pulled By: vkuzo
fbshipit-source-id: e9c7d4c7942e0f59c952094d2e446b1e2c838396
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53075
The input and output types should be `nn.Module`, to hide
the implementation detail that the pass is using FX.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D26740548
fbshipit-source-id: d5ed445379355bebdd90d377c95fcd7e671371a3
Summary:
The first argument is either a file name or a test module name, but the key to `CUSTOM_HANDLERS` is the test module name.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53884
Test Plan: Run `python3 run_test.py -i distributed/test_distributed_spawn.py`
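A minimal sketch of the normalization this fix implies; `CUSTOM_HANDLERS` contents and `lookup_handler` below are simplified stand-ins, not the actual `run_test.py` code.

```python
# Illustrative sketch of the run_test.py lookup described above; the dict
# contents and function name are hypothetical.

CUSTOM_HANDLERS = {"distributed/test_distributed_spawn": "spawn_handler"}

def lookup_handler(arg):
    # The CLI accepts either a file name or a module name; the dict is
    # keyed by module name, so strip a trailing ".py" before the lookup.
    module = arg[:-3] if arg.endswith(".py") else arg
    return CUSTOM_HANDLERS.get(module)

print(lookup_handler("distributed/test_distributed_spawn.py"))
```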
Reviewed By: janeyx99
Differential Revision: D27006164
Pulled By: malfet
fbshipit-source-id: f30b42856cd2754e5981c1c69618f84e392c986a
Summary:
Do not compute shards if the whole test suite needs to be run anyway.
This helps avoid occasional test duplication/gaps when access to the test time database is not available while one of the shards is computed.
Fixes https://github.com/pytorch/pytorch/issues/53882
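The short-circuit can be sketched as below; `calculate_shards` is an illustrative name, and the round-robin fallback stands in for the real time-based sharding.

```python
# Simplified sketch of the shard short-circuit described above; not the
# actual test-infra function.

def calculate_shards(tests, shard_id, num_shards):
    if num_shards <= 1:
        # The whole suite runs anyway: skip shard computation entirely, so
        # a missing test-time database cannot cause duplication or gaps.
        return list(tests)
    # Fallback: naive round-robin sharding (stand-in for the timed version).
    return [t for i, t in enumerate(tests) if i % num_shards == shard_id - 1]

print(calculate_shards(["a", "b", "c"], 1, 1))
```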
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53883
Reviewed By: janeyx99
Differential Revision: D27005910
Pulled By: malfet
fbshipit-source-id: f9603db0523a3a2539118e3fec1c6874c54f8d6d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53859
The redispatch API wasn't linking properly when static dispatch is enabled. I'm still not sure why this wasn't caught by the static dispatch test in CI; maybe, as swolchok pointed out, we have a flag set somewhere that defers undefined symbols until runtime.
Before, building with static dispatch enabled locally + running `import torch` gave me this error:
```
>>> import torch
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/raid/hirsheybar/pytorch/torch/__init__.py", line 197, in <module>
from torch._C import *
ImportError: /raid/hirsheybar/pytorch/torch/lib/libtorch_cpu.so: undefined symbol: _ZN2at10redispatch11logical_or_EN3c1014DispatchKeySetERNS_6TensorERKS3_
>>>
```
Printing the symbol:
```
(pytorch) hirsheybar@devfair017:/scratch/hirsheybar/pytorch$ c++filt _ZN2at10redispatch11logical_or_EN3c1014DispatchKeySetERNS_6TensorERKS3_
at::redispatch::logical_or_(c10::DispatchKeySet, at::Tensor&, at::Tensor const&)
```
Sure enough, the functions defined in `RedispatchFunctions.cpp` don't have the DispatchKeySet argument included. Adding them in this PR.
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D26998735
Pulled By: bdhirsh
fbshipit-source-id: c6c1104e42d13b7ec9d964b7e08d2adc8b344b78
Summary:
This PR proposes to improve the distributed doc:
* [x] putting the init functions together
* [x] moving post-init functions into their own sub-section as they are only available after init and moving that group to after all init sub-sections
If this is too much, could we at least put these 2 functions together:
```
.. autofunction:: init_process_group
.. autofunction:: is_initialized
```
as they are interconnected. and the other functions are not alphabetically sorted in the first place.
Thank you.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52976
Reviewed By: albanD
Differential Revision: D26993933
Pulled By: mrshenli
fbshipit-source-id: 7cacbe28172ebb5849135567b1d734870b49de77
Summary:
Also updates the doc so that the language matches the type. For example, previously the `tensors` argument was specified as `(sequence of tensor)` but had a type annotation of `_TensorOrTensors`. Now it's correctly updated to `Sequence[Tensor] or Tensor`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53827
Reviewed By: albanD
Differential Revision: D26997541
Pulled By: soulitzer
fbshipit-source-id: e1e609a4e9525139d0fe96f6157175481c90d6f8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53614
Ensures that every subclass of `QuantizeHandler` has a clear name. This
prevents ambiguous names like `Cat`, which look like a module but are
really a quantize handler.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D26914784
fbshipit-source-id: 6dca7e27975c09f422f8e36f1d2b709bf3eaaadf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53196
Before this PR, code patterns like this did not work:
```
x = some_quant_layer(x)
x = torch.stack([x, ...])
x = torch.sum(x, ...)
```
The reason this did not work is because `torch.sum` is treated as
"quantized" because of the newly added fp16 support, even though it is
not actually "quantized" for models where fp16 is not used. We may
need to adjust the concept of "quantized vs non-quantized" into a
"dtype" for the longer term fix.
The current PR is a hacky fix to unblock. We need to clean things
up before this is landable
Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_quant_sum
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D26783960
fbshipit-source-id: 3be7c3c1eaa2b8fcb99a105e1b0004c9ffd3a1c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53187
Before this diff, if we had code like
```
x = any_quant_layer(...)
x_size0 = x.size(0)
torch._assert(x_size0 == 1)
```
The convert code would try to insert a dequantize after `x_size0`,
because it was a descendant of a quantized node and it was needed
for a non-quantized operation. Since the actual type of the `size`
function output is an integer, this does not make sense.
For now, this is fixed as a one-off to unblock a customer. In the
future, we may need to think more deeply about all the functions which
can return non-quantized types from quantized tensors and make sure
they are all covered.
Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_assert_on_size_after_quant_layer
```
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D26780690
fbshipit-source-id: 44cc25c9179d460efb3f110d40b73d854d676af5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53120
Currently there is a pattern which is not handled correctly by
FX graph mode quantization:
```
def forward(self, x):
ndim = x.ndim
# or add, mul, div, etc
x = torch.sub(x, ndim)
return x
```
The reason this does not work is as follows:
1. x.ndim becomes a getattr node
2. the real world type of x.ndim is an integer, but this is not known from the graph (yet)
3. binary ops such as `torch.sub` require quantization of inputs
4. the framework inserts an observer to observe the output of `ndim`
5. the observer fails because `ndim` is not a Tensor
For now, we hack a bandaid to unblock some teams, none of this is for
land. We will have to think of a better fix which is landable (TBD).
Test Plan:
```
python test/test_quantization.py TestQuantizeFx.test_getattr_with_nontensor_result
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D26756180
fbshipit-source-id: c0e498766b22c23df74fbb5aaeaa237c4c944263
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53860
Fixes [#53840](https://github.com/pytorch/pytorch/issues/53840)
Right now [TCPStore wait([LIST_OF_KEYS_TO_AWAIT])](https://pytorch.org/docs/master/distributed.html#torch.distributed.Store.wait) will hang if any of the keys in [LIST_OF_KEYS_TO_AWAIT] has been previously set. This change ensures that wait() only waits for the keys that have not been set yet.
Before change:
```
# Case 1: HANG
store.set("1", "1")
store.wait(["1", "2"])
store.set("2", "2")
# Case 2: SUCCEED
store.wait(["1", "2"])
store.set("1", "1")
store.set("2", "2")
```
After change:
Both cases work
TODO: working on adding a test for wait()
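A toy in-memory store illustrating the fixed semantics; `Store` here is a plain Python sketch using a condition variable, not the real TCPStore.

```python
import threading

class Store:
    """Toy stand-in for TCPStore, illustrating the fixed wait() semantics."""

    def __init__(self):
        self._data = {}
        self._cond = threading.Condition()

    def set(self, key, value):
        with self._cond:
            self._data[key] = value
            self._cond.notify_all()

    def wait(self, keys, timeout=5.0):
        with self._cond:
            # Block only until every key is present; keys set *before* the
            # call count immediately, which is the behavior after the fix.
            ok = self._cond.wait_for(
                lambda: all(k in self._data for k in keys), timeout)
            if not ok:
                raise TimeoutError(f"timed out waiting for {keys}")

store = Store()
store.set("1", "1")                        # set before wait: must not hang
threading.Timer(0.1, store.set, ("2", "2")).start()
store.wait(["1", "2"])                     # returns once "2" arrives
```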
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D26999929
Pulled By: H-Huang
fbshipit-source-id: 8931749923c98b520366538f785af82ef37cca8e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53529
Supported for ONNX export starting from opset 10.
This is not exportable to opsets < 10 due to
1. onnx::IsInf is introduced in opset 10
2. onnx::Equal does not accept float tensor prior to opset 11
Test Plan: Imported from OSS
Reviewed By: pbelevich, malfet
Differential Revision: D26922418
Pulled By: SplitInfinity
fbshipit-source-id: 69bcba50520fa3d69db4bd4c2b9f88c00146fca7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53312
- Add support for aten::repeat_interleave
- NOTE: Also adds a fix for cases with the split op where input tensor sizes are not known but `_outputs` is provided
Test Plan: Imported from OSS
Reviewed By: pbelevich, malfet
Differential Revision: D26922422
Pulled By: SplitInfinity
fbshipit-source-id: 5362d0d8ccfdc14c15e1ae73fd70c4c113f823e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53308
* Update tests for test_gru_* at this moment.
* Update flake8 error.
* Update test_gru_* test cases only.
* Fix flake8 issue.
* Fix flake8 issue on test.
* Still disable test cases created by make_test.
* Update code to fix issue 'AttributeError: 'RecursiveScriptModule' object has no attribute 'forward'' for test_elman_* test cases.
* Add script model support for test_lstm_* test cases.
Test Plan: Imported from OSS
Reviewed By: pbelevich, malfet
Differential Revision: D26922419
Pulled By: SplitInfinity
fbshipit-source-id: a96432b2e7da9b142a38f87fbaf56737117462c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53307
This PR did symbolic shape inference, in the onnx pass _jit_pass_onnx_graph_shape_type_inference.
It creates a singleton ConstantValueMap.
It leverages constant folding technique and did a per-op based handling for ConstantValueMap.
As a byproduct, it enables fold_if pass for dynamic axes cases, typically for faster-rcnn etc.
The core change is in `torch/csrc/jit/passes/onnx/shape_type_inference.cpp` and `torch/csrc/jit/passes/onnx/constant_map.cpp`.
We usually need to copy the tensor before storing it in the ConstantValueMap; otherwise the underlying value may change. I saw this issue with (1) from_blob and (2) getting the value from a Constant node.
Test Plan: Imported from OSS
Reviewed By: pbelevich, malfet
Differential Revision: D26922414
Pulled By: SplitInfinity
fbshipit-source-id: 7654dc13d1de8d9496ad4be89f1454260d7bdeb0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53306
* [ONNX] Fix for sequence of mutations in blocks (#51577)
Fixes consecutive mutations in a tensor inside blocks.
Also, support append and pop in blocks.
* Support inplace operations + indexing
* Clean up old pass for remove mutations
* Add loop test
* Fixes for set attr in loops
* Removing the new jit API flag
* [ONNX] Redesign onnx pass to enable shape type dependent pattern conversion - cont (#51795)
With the introduction of ONNX shape inference, shape and type are inferred on the fly as operators get converted from ATen to ONNX when running symbolic function. This resolves the shape/type requirement for the symbolic functions. The pre-onnx passes however, can not be supported by shape inference, since at that stage the operators in the graph are still ATen operators.
This PR is to update the design of ONNX pass, to enable a mechanism of capturing subgraphs of ATen operators of certain patterns, and convert them later, when shape/type information of upstream operators are available.
The new design will require pre-onnx passes that need shape/type to be written in two parts, encapsulation and conversion.
The encapsulation part will find the nodes of patterns, like how pre-onnx passes were written previously. But instead of converting the nodes, it will encapsulate them into a sub-block of a new placeholder node. This part is called before onnx pass, so it runs before calling symbolic functions.
The conversion part will be called inside the onnx pass. In onnx pass, run_symbolic_func will be called for each node in topological order. When it reaches the placeholder node, the conversion part will be invoked. It will convert the nodes inside the sub-block based on pattern. By that time, it will have shape/type of upstream operators available. After the conversion is complete, the placeholder node will be removed, and nodes inside its sub-block converted. Run_symbolic_func will be called for these nodes, and they will be converted from ATen operator to ONNX operator.
This PR includes several other fixes, listed below.
* ~~replace helper.cpp with onnx_utils.cpp for holding utility functions.~~
* fix EraseNumberTypes on the Bool type; the code predated the existence of the Bool type.
* ~~enable onnx shape inference in export with parameter/initializer data.~~
* other code clean ups.
* fix insertion of identity nodes for loop opset 13 sequence output.
~~PR depends on #51603~~
* Fix after merge
* clang
* Fix clang
* Fix clang
* Fix warning message.
* Fixes for non-model param attributes
* Fix for caffe2
* Additional test
* clang
* Skip test for lower opsets
* fix clang-tidy
* Update init.cpp
* Update remove_inplace_ops_for_onnx.cpp
* Update remove_inplace_ops_for_onnx.cpp
* Update remove_inplace_ops_for_onnx.cpp
* Fix for clang formatting
Test Plan: Imported from OSS
Reviewed By: pbelevich, malfet
Differential Revision: D26922416
Pulled By: SplitInfinity
fbshipit-source-id: e7108620b39b6404c594910786c4d275fee59d84
Co-authored-by: Bowen Bao <bowbao@microsoft.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53305
Fixes #52436
For opset 9 of ONNX Pow, if X is int32 and Y is float, we cast the result back to int32 to stay consistent with X's type.
However, PyTorch's result is still float. The ATen graph sometimes does not bind a type to operators;
we are fine with the float type and don't want to cast back.
Even if X and Y are int32, whether the result is float32 or int32 makes no difference here.
Test Plan: Imported from OSS
Reviewed By: pbelevich, malfet
Differential Revision: D26922425
Pulled By: SplitInfinity
fbshipit-source-id: f8c09524acee0de615df10a14310ca1dd583831e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53786
To generate `selected_mobile_ops.h` in OSS, move the header file codegen functions to `tools/lite_interpreter/gen_selected_mobile_ops_header.py` file, so OSS can reuse these functions.
ghstack-source-id: 123754437
Test Plan:
```
buck test //xplat/caffe2:supported_mobile_models_test
```
```
buck run //xplat/caffe2:gen_oplist -- --model_file_list_path @/data/users/chenlai/data/pytorch/oplist_folder/file_list_path.macro --allow_include_all_overloads --output_dir /home/chenlai/local/data/pytorch/oplist_folder
```
`file_list_path.macro` content is:
```
chenlai@devvm2090:~/fbsource(45a9b7888)$ cat /data/users/chenlai/data/pytorch/oplist_folder/file_list_path.macro
/data/users/chenlai/fbsource/buck-out/gen/aab7ed39/xplat/caffe2/supported_mobile_models_test_op_list/model_operators.yaml
```
In output folder `/home/chenlai/local/data/pytorch/oplist_folder`, these files are generated:
```
selected_mobile_ops.h selected_operators.yaml SupportedMobileModelsRegistration.cpp
```
the generated files are the same as before.
{P282056731}
{P282055046}
Reviewed By: dhruvbird, iseeyuan
Differential Revision: D26907868
fbshipit-source-id: 9ba786f9c5674a72cad237ae7baadbe4642c51d5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53861
Replaced the iterators in the for-loops with integer index variables due to
overflow when handling empty vectors.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D26998894
Pulled By: huiguoo
fbshipit-source-id: a1f6475c8ba123968ef7247b4f6f38edbf24b9ef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53795
There are 4 calls in ddp implementation to dist.get_rank(), move these
to a helper property to ensure that users don't actually call `dist.get_rank()`
instead of `dist.get_rank(self.process_group)`.
Keeping API private for now because not sure if there is a user need to call `model.distributed_rank`, but can make it public if we think it's a useful api.
ghstack-source-id: 123640713
Test Plan: Ci
Reviewed By: mrshenli
Differential Revision: D26972368
fbshipit-source-id: a5f1cac243bca5c6f90a44f74d39cfffcc2b9a5a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53793
This call should pass in the process group so that it works appropriately
for subgroups, instead of assuming the whole world was passed into DDP.
Aside: This wasn't caught by tests since we don't have good testing around
passing subgroups into DDP, I believe nearly all tests use the entire world.
Should we add better testing for subgroups which may potentially bring up more
subtle bugs?
ghstack-source-id: 123640712
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D26972367
fbshipit-source-id: 8330bd51e2ad66841e4c12e96b67d3e78581ec74
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52581
The git diff is absolutely atrocious since I also refactored the code to share stuff between `Load` and `FunctionCall`.
Biggest questions I have about this diff are:
1. The asserts I added. From my understanding it's not possible to have a constant index in `Store` that's non-zero, since `Store` always creates a new buffer. The user could perhaps write this kind of incorrect code, though, so maybe I should just check for it instead of asserting?
2. I don't think(?) I need to do any special handling for `index_vars`, but wasn't totally able to track the logic there.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53254
Reviewed By: albanD
Differential Revision: D26991064
Pulled By: Chillee
fbshipit-source-id: 0bcd612d5f4b031c0b34e68a72d9c8d12d118be8
Summary: When libkineto is initialized from the PyTorch Profiler, if it fails we will not know why because errors are not reported. Reporting errors is not always safe, e.g. if init happens from static initialization or a dlopen library constructor function, so add a flag to specify whether to log.
Test Plan: Testing in PyTorch OSS build.
Reviewed By: chaekit
Differential Revision: D26927500
fbshipit-source-id: 2a78005239a5fcbe7e1de82e5405f04e07000fa8
Summary:
When a system has an Ampere and a non-Ampere card, lots of tests will fail, because results on different cards are different.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52941
Reviewed By: albanD
Differential Revision: D26994287
Pulled By: mrshenli
fbshipit-source-id: 287537495fc13361104a4460f5bcd79a208b5d8d
Summary:
Enabling the test cases because they are passing for ROCm.
Signed-off-by: Kyle Chen <kylechen@amd.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52708
Reviewed By: albanD
Differential Revision: D26994458
Pulled By: mrshenli
fbshipit-source-id: f0b3797c7889287a0154b1d5397df715ffb1c605
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53562
On Windows when we try to build //xplat/caffe2/c10:c10Windows, it failed with an error like
```
stderr: buck-out\gen\83497cbb\xplat\caffe2\c10\c10Windows#header-mode-symlink-tree-only,headers\c10/macros/Macros.h(189): error C2220: warning treated as error - no 'object' file generated
buck-out\gen\83497cbb\xplat\caffe2\c10\c10Windows#header-mode-symlink-tree-only,headers\c10/macros/Macros.h(189): warning C4067: unexpected tokens following preprocessor directive - expected a newline
```
See log here: https://www.internalfb.com/intern/buck/build/6eaea1f8-e237-4860-9f3b-3a8edd2207c6/
This is because Windows doesn't support the `__has_attribute` keyword. Here I'm changing the ordering of `if` and `elif` so that we don't hit that line when building on Windows.
Test Plan: buck build //xplat/caffe2/c10:c10Windows xplat/mode/windows
Reviewed By: kimishpatel, swolchok
Differential Revision: D26896510
fbshipit-source-id: d52438a3df7bf742e467a919f6ab4fed14484f22
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53855
Remove "noindex" here:
{F492926346}
ghstack-source-id: 123724419
Test Plan:
waitforbuildbot
The failure on doctest does not seem to be relevant.
Reviewed By: rohan-varma
Differential Revision: D26967086
fbshipit-source-id: adf9db1144fa1475573f617402fdbca8177b7c08
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53433
As described in https://github.com/pytorch/pytorch/issues/53413, the
pipeline destructor ends up hanging sometimes. The reason for this is that Pipe
uses daemon threads and as a result these threads could be destroyed before the
Pipe destructor is done. The Pipe destructor then calls `join_workers` which
waits on signals from the worker threads, which might be already dead and
results in the main thread blocking forever.
To resolve this issue, in this PR we remove `join_workers` completely since it
is not necessary to wait for daemon threads.
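The shutdown pattern above can be sketched with a toy class; `Pipe` here is a minimal threading stand-in, not the actual torch.distributed pipeline implementation.

```python
import threading

# Minimal sketch of the fix described above; this Pipe is a hypothetical
# stand-in, not the real pipeline parallelism code.

class Pipe:
    def __init__(self):
        self._shutdown = threading.Event()
        # Daemon worker threads may already be torn down by the time this
        # object's destructor runs at interpreter exit.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        self._shutdown.wait()

    def close(self):
        # Signal the workers, but do NOT join them: blocking on daemon
        # threads that may already be dead is what caused the hang.
        self._shutdown.set()

p = Pipe()
p.close()
```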
Closes: https://github.com/pytorch/pytorch/issues/53413
ghstack-source-id: 123641509
Test Plan:
1) Tested with repro in
https://github.com/pytorch/pytorch/issues/53413.
2) Hard to add a unit test for this since the bug really depends on order of
objects being destroyed.
Reviewed By: rohan-varma
Differential Revision: D26863321
fbshipit-source-id: 18fff072cabacfb10390e971eac789859d3dcc81
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).
New submodule commit: d12fc485d5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53722
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: jspark1105
Differential Revision: D26949768
fbshipit-source-id: 718796736c0641b7cf6c5b0617fc744a090c78c4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53780
Update the comment, because the input data type of `fp16_compress_hook` does not have to be FP32. For example, the input dtype can also be FP64, as long as it can be casted into FP16.
ghstack-source-id: 123680621
Test Plan: N/A
Reviewed By: iseessel
Differential Revision: D26967224
fbshipit-source-id: 26d79a3629a597e6335b6f59c97d25a764a8ed80
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52949
Enables distributed profiling which we have for gloo and nccl for the MPI backend
ghstack-source-id: 123610105
Test Plan: CI
Reviewed By: wanchaol
Differential Revision: D26591590
fbshipit-source-id: a20ec9d104faa26bc62c727dd01319c3ea230f5d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53402
Add an `options` field to the `Save` operator which accepts options for how to
serialize different blobs. At the moment this simply allows controlling the
existing `chunk_size` behavior, but in the future we can add other options,
such as the ability to control compression settings or other serialization
formats.
ghstack-source-id: 123567034
Test Plan:
Added a new test to `load_save_test.py` that passes in options and verifies
that blobs were serialized with the expected number of chunks.
buck test caffe2/caffe2:caffe2_test_cpu \
caffe2/caffe2/core:serialization_test \
caffe2/caffe2/python/operator_test:load_save_test
Reviewed By: mraway
Differential Revision: D26502577
fbshipit-source-id: 6e302e530bb96990517c2e35c505db7f14a56284
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).
New submodule commit: 2719d7e0b7
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53810
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: lw
Differential Revision: D26979037
fbshipit-source-id: d0cc7c25b764d5f207431a839f396fb8e22b2a22
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53833.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53834
Test Plan: The CI logs for flake8-py3 and clang-tidy on this PR should show `commit_sha` being set to the PR tip in their respective "Add annotations" steps.
Reviewed By: malfet
Differential Revision: D26983201
Pulled By: samestep
fbshipit-source-id: e5d1fbbaf2a2611fec583b430c6353e778bc77a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53304
With the introduction of ONNX shape inference, shape and type are inferred on the fly as operators get converted from ATen to ONNX when running symbolic function. This resolves the shape/type requirement for the symbolic functions. The pre-onnx passes however, can not be supported by shape inference, since at that stage the operators in the graph are still ATen operators.
This PR is to update the design of ONNX pass, to enable a mechanism of capturing subgraphs of ATen operators of certain patterns, and convert them later, when shape/type information of upstream operators are available.
The new design will require pre-onnx passes that need shape/type to be written in two parts, encapsulation and conversion.
The encapsulation part will find the nodes of patterns, like how pre-onnx passes were written previously. But instead of converting the nodes, it will encapsulate them into a sub-block of a new placeholder node. This part is called before onnx pass, so it runs before calling symbolic functions.
The conversion part will be called inside the onnx pass. In onnx pass, run_symbolic_func will be called for each node in topological order. When it reaches the placeholder node, the conversion part will be invoked. It will convert the nodes inside the sub-block based on pattern. By that time, it will have shape/type of upstream operators available. After the conversion is complete, the placeholder node will be removed, and nodes inside its sub-block converted. Run_symbolic_func will be called for these nodes, and they will be converted from ATen operator to ONNX operator.
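The two-phase flow above can be sketched in plain Python. This is a toy model with hypothetical names; the real passes operate on the TorchScript IR, not on lists of strings:

```python
# Phase 1 runs before the ONNX pass: fold each matched run of ATen ops into
# a single placeholder node carrying the pattern in a sub-block.
# Phase 2 runs inside the ONNX pass, once shape/type of upstream nodes is
# known, and expands the placeholder into the converted ONNX nodes.

def encapsulate(graph, pattern):
    """Replace each matched run of ATen ops with one placeholder node."""
    out = []
    i = 0
    while i < len(graph):
        if graph[i:i + len(pattern)] == pattern:
            out.append({"kind": "placeholder", "subblock": pattern})
            i += len(pattern)
        else:
            out.append({"kind": graph[i], "subblock": None})
            i += 1
    return out

def convert(node, shapes_known):
    """Phase 2: expand placeholders only once shape info is available."""
    if node["kind"] != "placeholder":
        return ["onnx::" + node["kind"].split("::")[1]]
    assert shapes_known, "placeholder must be converted inside the ONNX pass"
    return ["onnx::" + op.split("::")[1] for op in node["subblock"]]

graph = ["aten::relu", "aten::mul", "aten::add", "aten::relu"]
staged = encapsulate(graph, ["aten::mul", "aten::add"])
onnx = [op for node in staged for op in convert(node, shapes_known=True)]
```

The placeholder survives topological traversal like any other node, which is why `run_symbolic_func` can defer its conversion until upstream shapes are known.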
This PR includes several other fixes, listed below.
* ~~replace helper.cpp with onnx_utils.cpp for holding utility functions.~~
* fix EraseNumberTypes on Bool type; the code predated the existence of the Bool type.
* ~~enable onnx shape inference in export with parameter/initializer data.~~
* other code clean ups.
* fix insertion of identity nodes for loop opset 13 sequence output.
~~PR depends on #51603~~
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D26922417
Pulled By: malfet
fbshipit-source-id: 14ed06158d539e2451c2e5e63ba1b32fb0f75095
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53681
Without throwing, we can easily segfault trying to access nullptr
storage.
To do this I made set_storage_access_should_throw public so that you
don't have to subclass TensorImpl to do it. An alternative is
to just bite the bullet and add a MetaTensorImpl subclass. Let
me know what is preferred.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D26955540
Pulled By: ezyang
fbshipit-source-id: 8ce22dd07ef1beb042f1d91de981954d59c2f84a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53726
In quantized linear layers, during deserialization we create scales and zero
points which are later used for qnnpack kernels.
Scale and zero point extraction for per-channel quantized tensors is slow.
This is due to the fact that we index directly into zero point and scales
tensor and this indexing creates a tensor slice of 1 element which is then cast
to int32 or float.
This is super slow and increases model loading time.
This diff fixes that.
Test Plan: CI
Reviewed By: raziel
Differential Revision: D26922138
fbshipit-source-id: b78e8548f736e8fa2f6636324ab1a2239b94a27c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53585
Previously fp16_static CopyNode would be marked as unquantized because of
an incorrect condition check of whether a Node is statically quantized or not.
This PR fixes that.
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D26912677
fbshipit-source-id: 4ddb538714c5ba2db28430de5e1cf2931baf1993
Summary:
This PR makes changes to how hipfft is loaded in PyTorch. hipfft is packaged in a library separate from rocfft starting after ROCm 4.1.
We check the ROCm version, and if it is newer than ROCm 4.1 we load hipfft in addition to rocfft.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53408
Reviewed By: albanD
Differential Revision: D26952702
Pulled By: malfet
fbshipit-source-id: f42be304b587c060816e39d36f5c1a2cdc37bfab
Summary:
This PR replaces our current "Checkout PR tip" step (which is duplicated across many places) using a [scenario](https://github.com/actions/checkout#checkout-pull-request-head-commit-instead-of-merge-commit) from the `actions/checkout` README. We previously tried something similar in https://github.com/pytorch/pytorch/issues/49578, but using `github.head_ref` didn't work.
The reason this PR works is because, for events besides `pull_request`, the value of `github.event.pull_request.head.sha` defaults to the empty string, so it's as if we didn't set the `ref` option for `actions/checkout` at all, so it just uses its default behavior (e.g. for `push` events).
Incidentally, this PR also upgrades our use of `actions/checkout` from `v1` to `v2`, which introduces shallow clones by default. A couple of our jobs require deep clones, so we use `fetch-depth: 0` in those cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53719
Test Plan: CI.
Reviewed By: albanD
Differential Revision: D26949121
Pulled By: samestep
fbshipit-source-id: e06f8066682ae0557fb5a055a10ea33b6bd320db
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53769
The local autograd engine performs appropriate stream synchronization
between autograd nodes in the graph to ensure a consumer's stream is
synchronized with the producer's stream before executing the consumer.
However in case of distributed autograd, the SendRpcBackward function receives
gradients over the wire and TensorPipe uses its own pool of streams for this
purpose. As a result, the tensors are received on TensorPipe's stream pool but
SendRpcBackward runs on a different stream during the backward pass and there
is no logic to synchronize these streams.
To fix this, I've enhanced DistEngine to synchronize these streams
appropriately when it receives grads over the wire.
ghstack-source-id: 123607221
Test Plan:
1) Added unit test which reproduced the issue.
2) waitforbuildbot.
Reviewed By: wanchaol, mrshenli
Differential Revision: D26955317
fbshipit-source-id: eace6d4f91d4006c9c16ede5ac16362ada052406
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50577
Learning rate schedulers had not yet been implemented for the C++ API.
This pull request introduces the learning rate scheduler base class and the StepLR subclass. Furthermore, it modifies the existing OptimizerOptions such that the learning rate scheduler can modify the learning rate.
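For reference, the decay rule that StepLR implements (this sketch mirrors the Python `torch.optim.lr_scheduler.StepLR` semantics, which the new C++ class follows; the helper name here is hypothetical):

```python
# Every `step_size` epochs, the learning rate is multiplied by `gamma`.
def step_lr(initial_lr, epoch, step_size, gamma):
    return initial_lr * gamma ** (epoch // step_size)

lrs = [step_lr(0.1, e, step_size=3, gamma=0.5) for e in range(7)]
# epochs 0-2 keep 0.1, epochs 3-5 use 0.05, epoch 6 drops to 0.025
```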
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52268
Reviewed By: mrshenli
Differential Revision: D26818387
Pulled By: glaringlee
fbshipit-source-id: 2b28024a8ea7081947c77374d6d643fdaa7174c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53781
needed for running noise suppression model in lite interpreter
Test Plan: run model
Reviewed By: linbinyu
Differential Revision: D26967227
fbshipit-source-id: 19677fc796f1fb4423ebb11b5ffd9df5870a39cf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53752
This test doesn't work today because we don't properly vectorize
"FunctionCall" (which is the way one accesses an intermediate tensor).
ghstack-source-id: 123592860
Test Plan: `buck test //caffe2/test/cpp/tensorexpr -- LoopNest.VectorizeUse`
Reviewed By: ZolotukhinM
Differential Revision: D26895550
fbshipit-source-id: 0798ebf3e6a834bd70181732c81528455d5329fa
Summary:
* Replacing vector of Tensors with a set of output buffers in `TensorExprKernel`.
* Creating a block statement while compiling in `TensorExprKernel`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53688
Reviewed By: mrshenli
Differential Revision: D26941222
Pulled By: navahgar
fbshipit-source-id: 9eb81ec2effcdeafbeaa67d1e12475166054f80f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53670
This puts deploy into the torch::deploy namespace. It also renames some
objects to better match their behavior:
PythonObject -> Obj, in the future it will refer to either a python object or a handle to a script obj, so rename it torch::deploy::Obj to be generic
MovableObject -> ReplicatedObj, to prevent confusion with "std::move" which is unrelated, and to note that we are replicating this object across interpreters.
Test Plan: Imported from OSS
Reviewed By: wconstab
Differential Revision: D26932131
Pulled By: zdevito
fbshipit-source-id: 8041d6c5b2041a7c3192c1a17d2edb38112a89f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53751
Sometimes the initial value of a reduction expression needs to be
computed with reference to the loop axes; for example, adding bias can be
efficiently represented by initializing the accumulator from the bias tensor:
```
C[n, c, h, w] = bias[c]
for (...)
C[n, c, h, w] += ...
```
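The snippet above can be modeled in plain Python (toy shapes and lists, purely illustrative): the reduction accumulator is initialized from the bias tensor rather than from a constant zero, so the bias-add fuses into the reduction.

```python
# Toy 1-D channel axis plus a reduction axis of length K.
C_channels, K = 3, 4
bias = [10.0, 20.0, 30.0]
contrib = [[float(c + k) for k in range(K)] for c in range(C_channels)]

# Initialize the accumulator from bias[c], then accumulate over the
# reduction axis -- no separate bias-add pass is needed afterwards.
acc = [bias[c] for c in range(C_channels)]
for c in range(C_channels):
    for k in range(K):
        acc[c] += contrib[c][k]
```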
ghstack-source-id: 123592861
Test Plan: `buck test //caffe2/test/cpp/tensorexpr -- Reductions.InitFunction`
Reviewed By: navahgar
Differential Revision: D26940321
fbshipit-source-id: 8a08e19e5d0b9ad453a07fab8b61e75dcd3d626b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53330
Fixed a condition check for fixed qparam ops, previously we were including CopyNodes as well
Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_fixed_qparams_ops_fp16
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D26836867
fbshipit-source-id: 8c486155244f852e675a938c3f4237f26505671c
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50002
The last commit adds tests for 3d conv with the `SubModelFusion` and `SubModelWithoutFusion` classes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50003
Reviewed By: mrshenli
Differential Revision: D26325953
Pulled By: jerryzh168
fbshipit-source-id: 7406dd2721c0c4df477044d1b54a6c5e128a9034
Summary:
When calling `TensorIterator::for_each` with a 1d loop, it creates a `function_ref` for the 1D iteration, then wraps it with `LOOP_WRAPPER` to transform it into a 2d loop. That 2d loop then gets wrapped in another `function_ref`. This can result in significant overhead if the 1d inner loop is over a small number of elements.
Instead, this wraps the 1d loop before type-erasure so only one level of `function_ref` is introduced. A simple benchmark demonstrates this is a win:
```python
import torch
a = torch.rand((10000, 2))[::2]
%timeit a + a
```
Note the 2D tensor cannot be coalesced into 1D and both `cpu_kernel` and `cpu_kernel_vec` use 1D for_each. On master, this takes 42 us but with this change it's down to 32us.
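A plain-Python analogy of the indirection being removed (names hypothetical; in C++ the cost is an extra `function_ref` indirect call per inner-loop invocation, which Python cannot reproduce, so this only illustrates the structure):

```python
# Before: the 1d loop is type-erased, wrapped into a 2d loop, and the
# wrapper is erased again -- two indirections per inner call.
def make_2d_loop_before(loop_1d):
    erased_1d = loop_1d            # first "function_ref"
    def loop_2d(data, n_inner, n_outer):
        for _ in range(n_outer):
            erased_1d(data, n_inner)
    return loop_2d                  # erased again when stored

# After: wrap the 1d loop into 2d form *before* erasure, so only one
# level of indirection remains.
def make_2d_loop_after(loop_1d):
    def loop_2d(data, n_inner, n_outer):
        for _ in range(n_outer):
            loop_1d(data, n_inner)  # direct call
    return loop_2d

def add_one_1d(data, n):
    for i in range(n):
        data[i] += 1

data = [0, 0, 0]
make_2d_loop_before(add_one_1d)(data, n_inner=3, n_outer=2)
make_2d_loop_after(add_one_1d)(data, n_inner=3, n_outer=2)
```

Both forms compute the same result; the change is purely about how many indirect hops each inner iteration pays.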
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53613
Reviewed By: VitalyFedyunin
Differential Revision: D26947143
Pulled By: ezyang
fbshipit-source-id: 5189ada0d82bbf74170fb446763753f02478abf6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53253
Since GradBucket class becomes public, mention this class in ddp_comm_hooks.rst.
ghstack-source-id: 123596842
Test Plan: viewed generated html file
Reviewed By: rohan-varma
Differential Revision: D26812210
fbshipit-source-id: 65b70a45096b39f7d41a195e65b365b722645000
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53596
This description will be used in ddp_comm_hook docstrings.
ghstack-source-id: 123590360
Test Plan: waitforbuildbot
Reviewed By: rohan-varma
Differential Revision: D26908160
fbshipit-source-id: 824dea9203ca583676bddf0161c9edca52c9d20e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53749
Split up tests into cases that cover specific functionality. Goals:
1. Avoid the omnibus test file mess (see: test_jit.py) by imposing early
structure and deliberately avoiding a generic TestPackage test case.
2. Encourage testing of individual APIs and components by example.
3. Hide the fake modules we created for these tests in their own folder.
You can either run the test files individually, or still use
test/test_package.py like before.
Also this isort + black formats all the tests.
Test Plan: Imported from OSS
Reviewed By: SplitInfinity
Differential Revision: D26958535
Pulled By: suo
fbshipit-source-id: 8a63048b95ca71f4f1aa94e53c48442686076034
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53401
This is a reland of D26641599 (cd9ac54ea7) after rebasing onto D26802576 (f595ba1bae).
Add some small utility functions to read the blob names back from the minidb
file so that we can verify how many chunks were written for each blob.
ghstack-source-id: 123567033
Test Plan: buck test caffe2/caffe2/python/operator_test:load_save_test
Reviewed By: mraway
Differential Revision: D26853942
fbshipit-source-id: 0b45078fdd279f547752c8fdb771e296374a00da
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53400
This is a reland of D26617038 (b4a8d98247) after rebasing onto D26802576 (f595ba1bae).
Optimize the blob serialization code by using `AddNAlreadyReserved()` when
serializing tensor data, rather than making N separate `Add()` calls.
`AddNAlreadyReserved()` is a simple addition operation, while each `Add()`
call checks to see if it needs to reserve new space, and then updates the
element data, which is unnecessary in this case.
ghstack-source-id: 123567030
Test Plan:
This appears to improve raw serialization performance by 30 to 35% for float,
double, and int64_t types which use this function. This improvement appears
relatively consistent across large and small tensor sizes.
Reviewed By: mraway
Differential Revision: D26853941
fbshipit-source-id: 4ccaa5bc1dd7f7864068d71a0cde210c699cbdba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53665
ngimel pointed out to me where we already test the behavior of the `Upsample` ops in `test_nn.py`. This PR deletes my bespoke tests in `test_torch.py` and updates those in `test_nn.py` to test memory format properly.
There were two reasons the original test didn't pick up on a memory format regression:
- They didn't test the memory format of the output tensor explicitly, i.e. `output.is_contiguous(memory_format=...)`
- Even with that change, the test tensors were too simple to fail the tests. From some trial and error, it looks like one of the first two dimensions in the inputs needs to be > 1 in order for the `channels_last` memory format to actually re-order the strides.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D26929683
Pulled By: bdhirsh
fbshipit-source-id: d17bc660ff031e9b3e2c93c60a9e9308e56ea612
Summary:
A number of derived distributions use base distributions in their
implementation.
We add what we hope is a comprehensive test of whether all distributions
actually honor skipping validation of arguments in log_prob, and then
fix the bugs we found. These bugs are particularly cumbersome in
PyTorch 1.8 and master, where validate_args is turned on by default.
In addition one might argue that validate_args is not performing
as well as it should when the default is not to validate but the
validation is turned on in instantiation.
Arguably, there is another set of bugs or at least inconsistencies
when validation of inputs does not prevent invalid indices in
sample validation (when with validation an IndexError is raised
in the test). We would encourage the implementors to be more
ambitious when validation is turned on and amend sample validation
to throw a ValueError for consistency.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53600
Reviewed By: mrshenli
Differential Revision: D26928088
Pulled By: neerajprad
fbshipit-source-id: 52784a754da2faee1a922976e2142957c6c02e28
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51676
We offer the ability to access the importer from within packaged modules by doing
`import resources`. This behavior is nice (and more powerful than the
importlib resources API), but I think `resources` is too common a name
(pip has a package for it)
Change to `import torch_package_importer` but open to bikeshedding
Test Plan: Imported from OSS
Reviewed By: jamesr66a
Differential Revision: D26620314
Pulled By: suo
fbshipit-source-id: 0942c99f02c0f55f5f3a1b2566961018b796bdd4
Summary:
Meant to make tasks like https://github.com/pytorch/pytorch/issues/53728 easier. The `-n` flag enables line numbers, and the `-o` flag reduces noise by only showing the part of the line that matched (which in this case is just the trailing whitespace).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53733
Test Plan:
```
$ git checkout e937db5dbaeaeae1134b02b3b78c43db3f6a91cd
```
Before:
```
$ (! git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' || (echo "The above files have trailing spaces; please remove them"; false))
aten/src/ATen/native/cuda/BatchLinearAlgebra.cu
The above files have trailing spaces; please remove them
```
After:
```
$ (! git grep -I -no ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' || (echo "The above files have trailing spaces; please remove them"; false))
aten/src/ATen/native/cuda/BatchLinearAlgebra.cu:1972:
The above files have trailing spaces; please remove them
```
Reviewed By: mruberry
Differential Revision: D26953538
Pulled By: samestep
fbshipit-source-id: 5f7d48b79f1a02e5e5a09fe00316ec350cfc340e
Summary:
This uses the shape of the tensor instead of directly indexing it. This is useful when extending PyTorch's tensor class, e.g. for lazy access. Since the `init` sub-module doesn't check for `__torch_function__`, it is not possible to override its functions. Explicitly indexing the tensor would force a call to tensor() and reconstruct the full tensor / explicitly access the elements. Simply using the shape avoids that.
Fixes https://github.com/pytorch/pytorch/issues/53540
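A self-contained sketch of why this matters (the `LazyTensor` class and names here are hypothetical stand-ins for a lazy tensor subclass; the fan computation mirrors the spirit of `torch.nn.init._calculate_fan_in_and_fan_out`):

```python
class LazyTensor:
    """Toy lazy tensor: .shape is metadata-only; indexing materializes."""
    def __init__(self, shape):
        self.shape = shape
        self.materialized = False
    def __getitem__(self, idx):
        self.materialized = True   # simulating an expensive realization
        return 0.0

def fan_in_and_fan_out(t):
    # Only the sizes are needed, never the element values.
    num_input_fmaps, num_output_fmaps = t.shape[1], t.shape[0]
    receptive_field = 1
    for s in t.shape[2:]:
        receptive_field *= s
    return num_input_fmaps * receptive_field, num_output_fmaps * receptive_field

t = LazyTensor((8, 4, 3, 3))
fan_in, fan_out = fan_in_and_fan_out(t)
shape_only = not t.materialized    # shape-based init never realized the data
_ = t[0]                           # direct indexing, by contrast, would have
indexed = t.materialized
```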
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53522
Reviewed By: anjali411
Differential Revision: D26947794
Pulled By: jbschlosser
fbshipit-source-id: 80cd65efed16383f21363cee2eb404c9bc05971c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53148
clang format reducer and logger files
ghstack-source-id: 123453983
Test Plan: unit test
Reviewed By: SciPioneer
Differential Revision: D26764509
fbshipit-source-id: 711efcfd77420f912861cfd20c69e3af5086f4b9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53162
It is possible that there are multiple data types in mixed-precision training, so log the data types as a list of data type names.
ghstack-source-id: 123452626
Test Plan: unit test
Reviewed By: SciPioneer
Differential Revision: D26769256
fbshipit-source-id: 8f7d73821e89864fedbbce723f301fe8fbad5685
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53145
add a new API to allow users to set the sample rate for runtime stats, and add per-iteration latency breakdowns to the DDPLoggingData struct. E.g.,
if users set the sample rate to 1, they can analyze per-iteration latency change over time (not averaged)
ghstack-source-id: 123443369
Test Plan: unit test
Reviewed By: SciPioneer
Differential Revision: D26763957
fbshipit-source-id: baff6a09c2a590e6eb91362ca6f47ae8fa6ddb0e
Summary:
As per title. Compared to the previous version, it is lighter on the usage of `at::solve` and `at::matmul` methods.
Fixes https://github.com/pytorch/pytorch/issues/51621
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52875
Reviewed By: mrshenli
Differential Revision: D26768653
Pulled By: anjali411
fbshipit-source-id: aab141968d02587440128003203fed4b94c4c655
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53567
Updating Gradle to version 6.8.3.
The proper zip was uploaded to AWS.
Successful CI check: https://github.com/pytorch/pytorch/pull/53619
Test Plan: Imported from OSS
Reviewed By: dreiss
Differential Revision: D26928885
Pulled By: IvanKobzarev
fbshipit-source-id: b1081052967d9080cd6934fd48c4dbe933630e49
Summary:
**Update:** MAGMA support was dropped from this PR. Only the cuSOLVER path is implemented and it's used exclusively.
**Original PR message:**
This PR adds support for CUDA inputs for `torch.orgqr`.
CUDA implementation is based on both [cuSOLVER](https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverDN-lt-t-gt-orgqr) and MAGMA. cuSOLVER doesn't have a specialized routine for the batched case. While MAGMA doesn't have a specialized GPU native (without CPU sync) `orgqr`. But MAGMA has implemented (and not documented) the batched GPU native version of `larft` function (for small inputs of size <= 32), which together with `larfb` operation form `orgqr` (see the call graph [here at the end of the page](http://www.netlib.org/lapack/explore-html/da/dba/group__double_o_t_h_e_rcomputational_ga14b45f7374dc8654073aa06879c1c459.html)).
So now there are two main codepaths for CUDA inputs (if both MAGMA and cuSOLVER are available):
* if `batchsize > 1` and `tau.shape[-1] <= 32` then MAGMA based function is called
* else [cuSOLVER's `orgqr`](https://docs.nvidia.com/cuda/cusolver/index.html#cuSolverDN-lt-t-gt-orgqr) is used.
If MAGMA is not available then only cuSOLVER is used and vice versa.
Documentation updates and possibly a new name for this function will be in a follow-up PR.
Ref. https://github.com/pytorch/pytorch/issues/50104
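The dispatch rule from the original PR message can be sketched as follows (hypothetical helper, plain Python; note the update above -- the landed version uses cuSOLVER exclusively, so the MAGMA branch reflects the original design, not the final code):

```python
def choose_orgqr_backend(batch_size, tau_last_dim, has_magma, has_cusolver):
    # Both available: small batched inputs go to MAGMA's batched larft path,
    # everything else to cuSOLVER's orgqr.
    if has_magma and has_cusolver:
        if batch_size > 1 and tau_last_dim <= 32:
            return "magma"
        return "cusolver"
    # Only one library available: use it for all inputs.
    if has_cusolver:
        return "cusolver"
    if has_magma:
        return "magma"
    raise RuntimeError("orgqr on CUDA requires MAGMA or cuSOLVER")
```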
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51348
Reviewed By: heitorschueroff
Differential Revision: D26882415
Pulled By: mruberry
fbshipit-source-id: 9f91ff962921932777ff108bedc133b55fe22842
Summary:
This PR:
1. refactors the logic for S3 stats gathering.
2. renames SLOW_TESTS to TARGET_DET_LIST to disambiguate and remove confusion with slowTest.
3. detects slow tests (tests with time > 5min) to add to the TARGET_DET_LIST based on results in S3 from the previous nightly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53549
Test Plan:
Set CIRCLE_JOB to your favorite CI job (like `pytorch_linux_bionic_py3_8_gcc9_coverage_test1`).
Run `python test/run_test.py --determine-from=<your fave pytorch files>`
e.g., `python test/run_test.py --determine-from=test/run_test.py`
Reviewed By: mrshenli
Differential Revision: D26904478
Pulled By: janeyx99
fbshipit-source-id: 9576b34f4fee09291d60e36ff2631753a3925094
Summary:
Removing a tiny bit of unneeded reference to cuda92 for the Windows binary. Note that the config.yml did not change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53716
Reviewed By: VitalyFedyunin
Differential Revision: D26947029
Pulled By: janeyx99
fbshipit-source-id: 3bbf1faa513756eda182d2d80033257f0c629309
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53617
I'm trying to make `pytest test/*.py` work--right now, it fails during
test collection. This removes a few of the easier-to-fix pytest
collection problems one way or another. I have two remaining problems
which is that the default dtype is trashed on entry to test_torch.py and
test_cuda.py, I'll try to fix those in a follow up.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D26918377
Pulled By: ezyang
fbshipit-source-id: 42069786882657e1e3ee974acb3ec48115f16210
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53611
fill_ now uses DispatchStub which means it only works for
CPU/CUDA.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: jbschlosser
Differential Revision: D26918374
Pulled By: ezyang
fbshipit-source-id: fc899c28f02121e7719b596235cc47a0f3da3aea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53610
I noticed these because I was running the test suite under
meta device and triggered these error checks without getting
a NotImplementedError. Well, now they raise.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: glaringlee
Differential Revision: D26918376
Pulled By: ezyang
fbshipit-source-id: 20d57417aa64875d43460fce58af11dd33eb4a23
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48841 for half datatype (it was fixed for other datatypes before).
The reason for https://github.com/pytorch/pytorch/issues/48841 happening for half was that `exponential_` for half was producing 0s.
Exponential distribution implementation on cuda is here e08aae2613/aten/src/ATen/native/cuda/DistributionTemplates.h (L535-L545)
with `transformation::exponential` defined here
e08aae2613/aten/src/ATen/core/TransformationHelper.h (L113-L123)
It takes a uniformly distributed random number and takes `log` of it. If necessary, the result is then converted to low precision datatype (half). To avoid 0's, before applying `log`, ones are replaced with std::nextafter(1,0). This seems fine, because log(1-eps) is still representable in half precision (`torch.tensor([1.], device="cuda").nextafter(torch.tensor([0.], device="cuda")).log().half()` produces 5.96e-8) , so casting to `scalar_t` should work. However, since fast log approximation is used (`__logf`), the log result is ~3e-9 instead of more accurate 5.96e-8, and underflows when casting to half. Using `::log` instead of fast approximation fixes it, however, it comes with ~20% perf penalty on exponential kernel for fp32 datatype, probably more for half.
Edit: alternative approach used now is to filter all small values returned by transformation. The result is equivalent to squashing of 1's to 1-eps that was used before, and computing correct log of 1-eps (which is -eps, exactly equal even for doubles). This doesn't incur noticeable performance hit.
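The numbers from the description can be checked with stdlib Python (the 3e-9 figure for `__logf` is taken from the text above; `2**-24` is both the float32 gap below 1.0 and the smallest half-precision subnormal):

```python
import math

half_min_subnormal = 2.0 ** -24            # ~5.96e-8, smallest half subnormal

# Accurate log of nextafter(1, 0) in float32: log(1 - 2**-24) ~ -5.96e-8,
# whose magnitude is exactly representable in half (as a subnormal).
accurate = -math.log(1.0 - 2.0 ** -24)

# The fast __logf approximation reportedly returned ~3e-9 instead, which
# is below the smallest half subnormal and flushes to zero on the cast.
fast_approx = 3e-9

accurate_survives = accurate >= half_min_subnormal
fast_underflows = fast_approx < half_min_subnormal
```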
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53480
Reviewed By: mruberry
Differential Revision: D26924622
Pulled By: ngimel
fbshipit-source-id: dc1329e4773bf91f26af23c8afa0ae845cfb0937
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53068
Adds a `bool is_available()` method to the backend contract: it returns `true` if `compile()` and `execute()` can be called; `false` otherwise.
It is used to implement the following changes in the `LoweredModule`:
* `compile()` in `__setstate__` will run if `is_available()`, else `__setstate__` throws an exception ("Backend not available.").
* `compile()` at `LoweredModule` creation will run if `is_available()`, else a WARNING will be thrown.
* `execute()` will only be executed if `is_available()` returns true; else throws an exception ("Backend not available.").
The goal of these changes is to ensure we have a well defined behaviour for the different combinations of backend availability on-host and on-target.
More specifically, backends may have different capabilities to compile and/or execute the Module, depending whether this happens on-host (i.e. where the program is being written) or on-target (where the program is being executed).
First of all, we know that "preprocess" always takes place, and that it only happens on-host at creation time. So we can assume that if any compilation is needed and possible on-host, then all of it could be pushed there.
Overall, we want to ensure the following:
**On host**
| compile | execute | Outcome |
| -- | -- | -- |
| No | No | On module creation, LoweredModule is generated, with a warning (since compilation and execution can still take place on-target). On module load, throws an exception (since execution is not possible). |
| No | Yes | This configuration should not be possible. This assumes the full compiler is not available, even if some work was done in preprocess the program cannot be finalized for execution. |
| Yes | No | In this case, the expectation would be for is_available() to return false, and compilation logic to move into preprocess. |
| Yes | Yes | All good. This is the only case that is_available() should return true. |
**On target**
| compile | execute | Outcome |
| -- | -- | -- |
| No | No | Loading the LoweredModule throws an exception. Since execution is not possible. |
| No | Yes | Basically this is another instance of Yes/Yes: compilation per se may not be possible on device, which means compile() can be called without issue but it is a no-op, and thus is_available should return true. Consequently, loading the LoweredModule: Succeeds, if the preprocessed module is ready for execution. Fails with exception otherwise. |
| Yes | No | This configuration should not be possible. Just putting here for completeness. |
| Yes | Yes | All good. This, along with No/Yes case (because compilation is assumed to have happened on-host, so it's just another instance of Yes/Yes), are the cases where is_available() should return true. |
**Refactoring existing code**
This change also updates other backends (Glow) code, to implement the is_available() method to have the same behaviour as before this change (i.e. always available).
This should not cause backward incompatibilities with already saved models since we're adding a new method to the PyTorchBackendInterface.
Models saved with the old interface that didn't have is_available() will still find the other 2 methods in the bound object (i.e. compile and execute), and the saved LoweredModule logic will be the old one.
**Future**
We plan to use is_available() to implement support for fallback to the PyTorch interpreter.
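A minimal Python model of the contract described above (hypothetical classes; the real implementation lives in the C++/TorchScript backend interface):

```python
class Backend:
    def __init__(self, available):
        self._available = available
    def is_available(self):
        return self._available
    def compile(self, module):
        assert self.is_available()
        return ("compiled", module)

class LoweredModule:
    def __init__(self, backend, module):
        self.backend, self.module, self.handle = backend, module, None
        if backend.is_available():
            self.handle = backend.compile(module)
        # else: creation succeeds (with a warning in the real code), since
        # compilation may still happen on-target.
    def __setstate__(self, state):
        # Loading on a target where the backend cannot run must fail loudly.
        if not self.backend.is_available():
            raise RuntimeError("Backend not available.")
        self.handle = self.backend.compile(state)

m = LoweredModule(Backend(available=False), "preprocessed")
try:
    m.__setstate__("preprocessed")
    load_failed = False
except RuntimeError:
    load_failed = True

ok = LoweredModule(Backend(available=True), "module")
```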
ghstack-source-id: 123498571
Test Plan: Added C++ (test_backend.cpp) and Python (test_backends.py) tests to validate the exceptions.
Reviewed By: jackm321, spaugh, iseeyuan
Differential Revision: D26615833
fbshipit-source-id: 562e8b11db25784348b5f86bbc4179aedf15e0d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53683
**Summary**
This commit fixes the BC test broken by #53410. There are no promises
about operator-level BC with the operators added and modified by that
PR, so this test failure does not represent a real backward
compatibility issue.
**Test Plan**
Ran the BC test locally by running `dump_all_schemas.py` and then
`check_backward_compatibility.py`.
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D26936505
Pulled By: SplitInfinity
fbshipit-source-id: 829d5d78e4cba44feea382d0fbd66e77dee7eed2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53444
GraphModule construction has two options when constructing the base nn.Module: a dict of names to attrs to assign to the GraphModule, or another nn.Module to copy attrs from.
- For the dict case, add logic to explicitly register `torch.Tensor`s that are not `nn.Parameter`s as buffers on the GraphModule, else fall back to `__setattr__`.
- For the other `nn.Module` case, update so that it checks in the other module whether the attr to copy in is a buffer, and register it as such, else fall back to `__setattr__`.
Test Plan: Added tests for fetching params and buffers from a GraphModule using both dict and module `__init__`s
Reviewed By: jamesr66a
Differential Revision: D26860055
fbshipit-source-id: 8d9999f91fef20aaa10969558006fc356247591f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53638
Mostly slight edits, and deleting some outdated sections.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: heitorschueroff
Differential Revision: D26920600
Pulled By: ezyang
fbshipit-source-id: e3bda80ecb622a1fcfde64e4752ba89a71056340
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53634
Make the op signature of `static_runtime::to_copy` consistent with that of native_functions.yaml so it works with 2-5 args:
```
- func: to.dtype(Tensor self, ScalarType dtype, bool non_blocking=False, bool copy=False, MemoryFormat? memory_format=None) -> Tensor
variants: method
device_guard: False
```
(Note: this ignores all push blocking failures!)
Reviewed By: ajyu
Differential Revision: D26906726
fbshipit-source-id: b9203eb23619aba42b1bfed1a077401f9fe2ddf0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53410
**Summary**
This commit enables indexing into `ModuleList` using a non-literal
index if the LHS of the assignment statement of which the indexing is
the RHS is annotated with an interface type.
This feature already exists for `ModuleDict`, and this commit builds on
top of that implementation. A `prim::ModuleContainerIndex` operator is
emitted for any statement of the form `lhs: InterfaceType =
module_container[idx]`. The same operator has to be used for both
`ModuleDict` and `ModuleList` because serialization does not preserve
the metadata that indicates whether a `Module` is a `ModuleDict` or
`ModuleList`.
**Testing**
This commit extends the existing unit tests for non-literal `ModuleDict`
indexing to test non-literal `ModuleList` indexing.
**Fixes**
This commit fixes #47496.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D26857597
Pulled By: SplitInfinity
fbshipit-source-id: d56678700a264d79aae3de37ad6b08b080175f7c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53535
During the port to structured kernels for upsample kernels, I missed that a subset of them explicitly pass `memory_format` information from the input to the output tensors.
Note 1:
I added the logic into the `meta` function of each op, which feels morally correct since this logic affects the output shape/metadata. One consequence is that all backend implementations will get the logic. I synced with fmassa that this seems reasonable.
Note 2:
This logic used to happen in the following operators, which this PR fixes:
- upsample_nearest3d
- upsample_trilinear3d
- upsample_nearest2d
- upsample_bilinear2d
I explicitly didn't patch the other upsample kernels, which look like they never forwarded memory_format information:
- `upsample_bicubic2d` (maybe this should though? `UpSampleBicubic2d.cpp` isn't currently written to do anything different for `channels_last` tensors)
- All of the `upsample_{mode}1d` operators. Probably because, afaik, channels_last isn't supported for 3d tensors
- The corresponding backwards operator for every upsample op.
Note 3:
I'm also wondering why memory_format isn't just directly a part of the `tensor::options()` method, which would cause all ops to universally forward memory_format information from input to output tensors, rather than just the upsample ops. My guess is:
- BC-breakage. I'm not sure whether this would really *break* people, but it's an API change
- performance. `tensor::options()` is called everywhere, and adding a call to `suggest_memory_format()` would probably noticeably hit microbenchmarks. We could probably deal with that by making `memory_format` a precomputed field on the tensor?
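As a rough illustration of Note 1, here is a toy Python sketch (plain dataclasses standing in for tensors; `TensorMeta` and `upsample_nearest2d_meta` are invented names, not PyTorch's actual API) of a meta function that computes the output shape and also forwards the input's memory format:

```python
from dataclasses import dataclass

@dataclass
class TensorMeta:
    shape: tuple
    memory_format: str = "contiguous"

def upsample_nearest2d_meta(inp: TensorMeta, output_size) -> TensorMeta:
    # The meta step determines the output's metadata, so this is where the
    # input's (suggested) memory format gets carried over to the output.
    n, c = inp.shape[:2]
    return TensorMeta((n, c) + tuple(output_size),
                      memory_format=inp.memory_format)
```

Because every backend goes through the meta function, each backend implementation picks up the forwarding for free.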
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D26891540
Pulled By: bdhirsh
fbshipit-source-id: b3845f4dd5646b88bf738b9e41fe829be6b0e5cf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53317
This seems like it might help in cases where we have to call
`Tensor::contiguous`, but we expect that the tensor in question will
be contiguous a good portion of the time.
ghstack-source-id: 123203771
Test Plan:
Profiled AdIndexer on inline_cvr; time spent in
clip_ranges_gather_sigrid_hash_each_feature<int> was cut in half from
1.37% to 0.66%
Reviewed By: smessmer
Differential Revision: D26738036
fbshipit-source-id: b5db10783ccd103dae0ab3e79338a83b5e507ebb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53588
Remove `SRViewOperatorRegistry` and related code now that it's no longer needed.
Reviewed By: swolchok
Differential Revision: D26901367
fbshipit-source-id: fa73501cd785d4b89466cda81481aea892f8241f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53133
In light of some issues where users were having trouble installing CUDA
specific versions of pytorch we should no longer have special privileges
for CUDA 10.2.
Recently I added scripts/release/promote/prep_binary_for_pypi.sh (https://github.com/pytorch/pytorch/pull/53056) to make
it so that we could theoretically promote any wheel we publish to
download.pytorch.org to pypi
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: walterddr
Differential Revision: D26759823
Pulled By: seemethere
fbshipit-source-id: 2d2b29e7fef0f48c23f3c853bdca6144b7c61f22
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53366
gchanan albanD
Thanks for the feedback. Did a first pass trying to address the concerns in the original issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53495
Reviewed By: mrshenli
Differential Revision: D26914768
Pulled By: albanD
fbshipit-source-id: fa049f1952ef05598f0da2abead9a5a5d3602f75
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).
New submodule commit: 4b88f40a0e
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53632
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: jspark1105
Differential Revision: D26919594
fbshipit-source-id: 4ac25bbe883b3c2cd4c02bc75a6e2c6f41d2beb7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53434
Use `snprintf()` to avoid buffer overflows.
Also only throw an exception on error, instead of crashing the entire
application. A failure can occur if the caller supplies an invalid format
string.
ghstack-source-id: 123401582
Test Plan:
Ran the checkpoint tests:
buck test caffe2/caffe2/python/operator_test:checkpoint_test
Verified that the checkpoint file names logged in the output are the same
before and after this change.
I also manually changed the initial buffer size to 1 to confirm that the
code works when the initial buffer size is too small. I considered updating
the checkpoint_test.py code to test using long db names that would exceed
this limit, but I figured that long filenames were likely to cause other
problems on some platforms (Windows had a maximum path length of 260
characters up until fairly recent releases).
Differential Revision: D26863355
fbshipit-source-id: 8fc24faa2a8dd145471067718d323fdc8ce055d6
Summary: Build failed when `PYTORCH_QNNPACK_RUNTIME_QUANTIZATION` is unset. According to D21339044 (622f5b68f0) it seems like a typo.
Test Plan: buck build //xplat/caffe2/aten/src/ATen/native/quantized/cpu/qnnpack:pytorch_qnnpackWindows xplat/mode/windows-msvc-15.9
Reviewed By: kimishpatel
Differential Revision: D26907439
fbshipit-source-id: ac52eeef4ee70726f2a97b22ae65921b39aa0c0b
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).
New submodule commit: a11ddfdf99
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53599
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: lw
Differential Revision: D26910634
fbshipit-source-id: a2bf808536e42b9208e5d9f88198ce64061385fa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53424
Fixes https://github.com/pytorch/pytorch/issues/24807 and supersedes the stale https://github.com/pytorch/pytorch/issues/25093 (Cc Microsheep). If you now run the reproduction
```python
import torch
if __name__ == "__main__":
t = torch.tensor([1, 2, 3], dtype=torch.float64)
```
with `pylint==2.6.0`, you get the following output
```
test_pylint.py:1:0: C0114: Missing module docstring (missing-module-docstring)
test_pylint.py:4:8: E1101: Module 'torch' has no 'tensor' member; maybe 'Tensor'? (no-member)
test_pylint.py:4:38: E1101: Module 'torch' has no 'float64' member (no-member)
```
Now `pylint` doesn't recognize `torch.tensor` at all, but it is promoted in the stub. Given that it also doesn't recognize `torch.float64`, I think fixing this is out of scope of this PR.
---
## TL;DR
This is BC-breaking only for users that rely on unintended behavior. Since `torch/__init__.py` loaded `torch/tensor.py`, it was populated in `sys.modules`. `torch/__init__.py` then overwrote `torch.tensor` with the actual function. With this, `import torch.tensor as tensor` does not fail, but returns the function rather than the module. Users that rely on this import need to change it to `from torch import tensor`.
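The shadowing mechanism described above can be reproduced with a toy package (the names `pkg` and `tensor_fn` are invented for illustration; this is generic Python, not torch code):

```python
import sys
import types

# Fabricate a package `pkg` with a submodule `pkg.tensor`, mimicking the
# state after `torch/__init__.py` executed `import torch.tensor`.
pkg = types.ModuleType("pkg")
sub = types.ModuleType("pkg.tensor")
sys.modules["pkg"] = pkg
sys.modules["pkg.tensor"] = sub
pkg.tensor = sub                  # attribute set by the submodule import

def tensor_fn(data):
    return list(data)

pkg.tensor = tensor_fn            # __init__ overwrites the attribute

# `import a.b as x` binds x via attribute access on `a`, so this yields
# the *function*, not the submodule -- the unintended behavior:
import pkg.tensor as tensor
assert callable(tensor)

# The recommended spelling is explicit about wanting the attribute:
from pkg import tensor as t2
assert t2 is tensor_fn
```

This is why code that previously did `import torch.tensor as tensor` silently got the function instead of a module.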
Reviewed By: zou3519
Differential Revision: D26223815
Pulled By: bdhirsh
fbshipit-source-id: 125b9ff3d276e84a645cd7521e8d6160b1ca1c21
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53300
Float scale and bias are packed as per-row parameters at the end of each row.
This takes 8 bytes. However, if the number of elements in a row is such that
the end-of-row address is not float-aligned (not a multiple of 4 bytes), we
get unaligned memory accesses.
The current solution is inefficient, so this should really be fixed at weight
packing time.
It seems that longer term there will be a prepack function that packs weights.
So this fallback path should eventually match that and not store scale and
bias inline.
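A back-of-the-envelope sketch of the alignment constraint (a hypothetical helper, not the actual packing code): pad each row's quantized data so the 8-byte float scale/bias trailer starts on a 4-byte boundary.

```python
def padded_row_bytes(num_elems, elem_size=1, align=4, trailer=8):
    # Each row holds `num_elems` quantized values plus an 8-byte trailer
    # (float scale + float bias). Pad the quantized data so the trailer
    # lands on a float-aligned (4-byte) boundary.
    raw = num_elems * elem_size
    pad = (-raw) % align
    return raw + pad + trailer
```

For example, a row of 5 uint8 values needs 3 bytes of padding before the trailer, giving a 16-byte row.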
Test Plan: python test/test_quantization.py
Reviewed By: pengtxiafb
Differential Revision: D26828077
fbshipit-source-id: 8512cd95f3ac3ca53e1048139a9f6e19aa8af298
Summary:
In setup.py add logic to:
- Get list of submodules from .gitmodules file
- Auto-fetch submodules if none of them has been fetched
In CI:
- Test this on non-docker capable OSes (Windows and Mac)
- Use shallow submodule checkouts whenever possible
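The submodule-list step can be sketched like this (a minimal stand-in for the actual setup.py logic; `submodule_paths` is an invented name):

```python
import re

def submodule_paths(gitmodules_text):
    # Collect every `path = ...` entry from a .gitmodules file.
    return re.findall(r"^\s*path\s*=\s*(\S+)", gitmodules_text, flags=re.M)
```

If none of the returned paths contains a checked-out tree, setup.py can then shell out to `git submodule update --init --recursive` before building.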
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53461
Reviewed By: ezyang
Differential Revision: D26871119
Pulled By: malfet
fbshipit-source-id: 8b23d6a4fcf04446eac11446e0113819476ef6ea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53397
It turns out once you remove all the indirection from the
empty_cpu_strided implementation, this implementation is pretty
simple. We should see if we can simplify empty_cpu this way too.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D26891870
Pulled By: ezyang
fbshipit-source-id: 9bddd332d32d8bf32fa3175e3bb0ac3a8954ac91
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53377
My underlying goal is I want to make the test suite ignore
NotImplementedError without failing when bringing up a backend (meta)
that doesn't have very many functions implemented.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D26850766
Pulled By: ezyang
fbshipit-source-id: ffbdecd22b06b5ac23e1997723a6e2a71dfcd14a
Summary:
Addresses several of the challenges described in https://github.com/pytorch/pytorch/issues/49468.
This PR builds on https://github.com/pytorch/pytorch/pull/50741 and https://github.com/pytorch/pytorch/issues/53105 to extend OpInfo out= testing. It covers the following cases for ops that produce a single tensor:
- out= values don't affect computation
- out= noncontiguous produces the correct output and preserves strides
- out= with the wrong shape throws a warning
- out= with an empty tensor throws no warning
- out= with the wrong device throws an error
- out= with a dtype the computation's result can't be "safely" cast to throws an error
It works with operations that produce a single tensor and operations that produce an iterable of tensors (the latter is tested with operations like torch.svd).
In addition to the new out= test, the OpInfos have been updated. "supports_tensor_out" is replaced with the more general and straightforward "supports_out" metadata, and many operations which previously had to skip out= testing with an explicit SkipInfo no longer need to. A couple redundant tests in test_unary_ufuncs.py have been removed, too.
One other perk of these tests is that once all operations have OpInfos this will allow us to validate that we've universally deprecated incorrectly sized tensors passed to out=, and give us the option to actually disable the behavior.
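The resize/warning rules above can be modeled with a toy function (Python lists stand in for tensors; device and dtype checks are omitted; this is not the real TensorIterator logic):

```python
import warnings

def write_to_out(result, out):
    # Toy model of the out= contract exercised by the new test:
    # - an empty out= is resized silently (no warning)
    # - a wrongly sized out= warns, then is resized
    # - a correctly sized out= is written in place
    if len(out) != len(result) and len(out) != 0:
        warnings.warn("An output with one or more elements was resized")
    out[:] = result
    return out
```

The test asserts both the warning behavior and that the values land in `out` regardless.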
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53259
Reviewed By: mrshenli
Differential Revision: D26894723
Pulled By: mruberry
fbshipit-source-id: 2b536e9baf126f36386a35f2f806dd88c58690b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53294
Just a bunch of little things, none of which are big enough to need a full PR.
1) C++ wall time should release the GIL
2) Add option to retain `callgrind.out` contents. This will allow processing with KCachegrind for more detailed analysis.
3) Stop subtracting the baseline instruction counts. (People just found it confusing when they saw negative instruction counts.) There is a finesse in #53295 that drops the baseline to ~800 instructions for `number=100`, and at that level it's not worth correcting.
4) Add a `__mul__` overload to function counts. e.g. suppose `c0` was run with `number=100`, and `c1` was run with `number=200`, then `c0 * 2 - c1` is needed to properly diff them. (Obviously there are correctness concerns, but I think it's fine as a caveat emptor convenience method.)
5) Tweak the `callgrind_annotate` call, since by default it filters very small counts.
6) Move some args to kwargs only since types could be ambiguous otherwise.
7) Don't omit rows from slices. It was annoying to print something like `stats[:25]` and have `__repr__` hide the lines in the middle.
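Item 4 can be sketched with a toy counts container (invented names, not the real `torch.utils.benchmark` types):

```python
from collections import Counter

class FunctionCounts:
    # Toy stand-in for a per-function instruction-count container.
    def __init__(self, counts):
        self.counts = Counter(counts)

    def __mul__(self, factor):
        return FunctionCounts({fn: c * factor
                               for fn, c in self.counts.items()})

    def __sub__(self, other):
        # Keep zero and negative entries: a diff may legitimately go
        # negative, which is exactly what we want to see.
        keys = set(self.counts) | set(other.counts)
        return FunctionCounts({k: self.counts[k] - other.counts[k]
                               for k in keys})

# c0 was collected with number=100, c1 with number=200, so scale c0
# before diffing:
c0 = FunctionCounts({"malloc": 10_000})
c1 = FunctionCounts({"malloc": 19_000})
diff = c0 * 2 - c1
```

As the summary notes, scaling is a caveat-emptor convenience: counts don't always scale perfectly linearly with `number`.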
Test Plan: Imported from OSS
Reviewed By: Chillee
Differential Revision: D26906715
Pulled By: robieta
fbshipit-source-id: 53d5cd92cd17212ec013f89d48ac8678ba6e6228
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53293
Instruction count benchmarks need some includes for IValues, but this is also just generally useful. (Unlike Python where you can just drop imports anywhere, C++ will get very upset if you `#include` in a function body...)
Test Plan: Imported from OSS
Reviewed By: Chillee
Differential Revision: D26906684
Pulled By: robieta
fbshipit-source-id: cbdfd79d3b8383100ff2e6857b6f309c387cbe2a
Summary:
The code uses `torch::jit::jit_log_prefix` for handling recursive
indenting in most places in this function. There was one place that was
using "level", but it was buggy -- it would result in a compounding
superlinear indent. Note that changing it to "level+1" doesn't fix the
bug.
Before/after:
https://gist.github.com/silvasean/8ee3ef115a48de6c9c54fbc40838d8d7
The new code establishes a recursive invariant for
`Module::dump_to_str`: the function returns the module printed at the
base indent level (i.e. no indent). `torch::jit:log_prefix` is used
to prefix recursive calls. The code was already nearly there, except for
this spurious use of "level".
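The invariant can be sketched in Python (toy names; `log_prefix` mimics `torch::jit::jit_log_prefix`, and dicts stand in for modules):

```python
def log_prefix(prefix, text):
    # Prepend `prefix` to every line, like torch::jit::jit_log_prefix.
    return "".join(prefix + line for line in text.splitlines(True))

def dump_module(mod):
    # Invariant: return the module printed at base indent (no indent);
    # the *caller* indents recursive results via log_prefix. Indentation
    # therefore grows linearly with depth, never superlinearly.
    out = ["module %s {\n" % mod["name"]]
    for child in mod.get("children", []):
        out.append(log_prefix("  ", dump_module(child)))
    out.append("}\n")
    return "".join(out)
```

A grandchild ends up indented by exactly two levels (4 spaces), whereas passing a `level` down recursively and also prefixing compounds the indent.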
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52539
Reviewed By: navahgar
Differential Revision: D26773657
Pulled By: gmagogsfm
fbshipit-source-id: ab476f0738bf07de9f40d168dd038dbf62a9a79e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53432
1. Creating individual .mm files for each op under the ops/ folder; each op has its own function. The op is registered at the end of the file.
2. Remove the indirection calls from MetalAten.mm to MPSCNNOps.mm
3. Delete MPSCNNOps.mm
ghstack-source-id: 123205443
Test Plan:
1. Sandcastle
2. CircleCI
3. Mobilelab
Reviewed By: SS-JIA
Differential Revision: D26840953
fbshipit-source-id: e1664c8d7445fdbd3b016c4dd51de0a6294af3a5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53431
Objective-C’s dynamism comes at the cost of code size, perf, and safety. At Facebook, we tend not to use Objective-C primitives, or to keep their use to a minimum, unless they are needed.
ghstack-source-id: 123063340
Test Plan:
1. CircleCI
2. SandCastleCI
3. Mobilelab
Reviewed By: SS-JIA
Differential Revision: D26800753
fbshipit-source-id: b5a752a700d72ca3654f6826537aa3af47e87ecd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53430
The definition of Metal tensor is confusing, as we're using it to initialize the MetalTensorImpl. It acts more like a TensorImplStorage.
ghstack-source-id: 123038073
Test Plan:
1. Sandcastle CI
2. Circle CI
3. AIBench/Mobilelab
Reviewed By: SS-JIA
Differential Revision: D26685439
fbshipit-source-id: e0487d0884e4efc3044d627ed0e4af454eca9d67
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52658
DCE will reverse iterate over the graph looking for nodes without users and delete them. It will skip over unused placeholders (since this affects the signature of the method) and outputs (which never have users but we want to keep them :) )
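The pass above can be sketched on a toy node list (dicts stand in for FX nodes; this is an illustrative sketch, not the real implementation):

```python
def eliminate_dead_code(nodes):
    # nodes: topologically ordered list of
    #   {"name": str, "op": str, "inputs": [names]}
    # Reverse-iterate: a node survives if it is a placeholder (keeps the
    # method signature intact), an output, or used by a surviving node.
    used = set()
    kept = []
    for node in reversed(nodes):
        if node["op"] in ("placeholder", "output") or node["name"] in used:
            kept.append(node)
            used.update(node["inputs"])
    kept.reverse()
    return kept
```

A node whose result is never consumed is dropped, while unused placeholders and the output node are always retained.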
Test Plan: Added unit tests
Reviewed By: jamesr66a, khabinov, chenccfb
Differential Revision: D26602212
fbshipit-source-id: f4f196973e40546076636090bb0008c24f33795e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50217
If we fuse small groups, things are slow
Test Plan: buck test //caffe2/test:static_runtime
Reviewed By: bertmaher
Differential Revision: D25643460
fbshipit-source-id: d2f39a4d612df3e1e29362abb23c2d997202f6ea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52918
Freeze_module seems to operate under the assumption that forward always exists. This isn't true, so the change first checks for existence and then retrieves the function.
ghstack-source-id: 123215242
Test Plan: Try freezing something with and without forward.
Reviewed By: dhruvbird
Differential Revision: D26671815
fbshipit-source-id: d4140dad3c59d3d20012143175f9b9268bf23050
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53537
Fixes #53526
This fixes the issue of one of the environment variables being tested having been set by a previous test. For example:
`WORLD_SIZE=1 python test/distributed/test_c10d.py RendezvousEnvTest.test_common_errors` would have previously failed but now passes
Test Plan: Imported from OSS
Reviewed By: samestep
Differential Revision: D26891207
Pulled By: H-Huang
fbshipit-source-id: 1c23f6fba60ca01085a634afbafbb31ad693d3ce
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53460
We have code to ignore this category of warnings and found this one is incorrect.
Use `stacklevel=2`, otherwise the warning is always filtered by TracerWarning.ignore_lib_warnings()
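A minimal illustration of why the stacklevel matters (generic Python, not the tracer code): with `stacklevel=2` the warning is attributed to the caller's frame, so a filter keyed on the library's own filename no longer swallows it.

```python
import warnings

def library_helper():
    # stacklevel=2 reports the *caller's* file/line as the warning
    # location, so module-based warning filters that target this
    # library's own file don't suppress it.
    warnings.warn("tracer warning", stacklevel=2)
```

Calling `library_helper()` from user code shows the user's line in the warning message, not the library internals.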
Test Plan: sandcastle
Reviewed By: wanchaol
Differential Revision: D26867290
fbshipit-source-id: cda1bc74a28d5965d52387d5ea2c4dcd1a2b1e86
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53560
If an op like Fused8BitRowwiseQuantizedToFloat ends up on CPU and Tile ends up on an accelerator, and only FP16 is supported, then we want to make sure the conversion from FP32 to FP16 is done on CPU to save cycles on the accelerator.
Reviewed By: ChunliF
Differential Revision: D26862322
fbshipit-source-id: a7af162f2537ee9e4a78e6ef3f587129de410b07
Summary:
Helps make master green by removing this hefty memory allocating from CPU test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53561
Reviewed By: malfet, albanD
Differential Revision: D26897941
Pulled By: janeyx99
fbshipit-source-id: 9f6c2d55f4eea1ab48665f7819fc113f21991036
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).
New submodule commit: da1e687ee3
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53509
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: jspark1105
Differential Revision: D26885426
fbshipit-source-id: 80a3d0680fa584744380bb993ee3a2dc13991847
Summary:
This way, we can get S3 test time stats for windows tests as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53387
Reviewed By: samestep
Differential Revision: D26893613
Pulled By: janeyx99
fbshipit-source-id: ac59e4406e472c9004eea0aae8a87a23242e3b34
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53429
Call the testing ops through the dispatcher instead of calling them through `at::native`. Some metal ops can't be called through the dispatcher yet. For example, `at::t` will call `at::as_strided`, which hasn't been implemented on metal yet. For those ops, we'll skip the dispatcher and call `mpscnn::` directly. We'll convert those ops once we have implemented the missing ops.
ghstack-source-id: 123038068
Test Plan:
- Sandcastle CI
- Circle CI
- AIBench/Mobilelab
Reviewed By: SS-JIA, AshkanAliabadi
Differential Revision: D26683366
fbshipit-source-id: bf130b191046f5d9ac9b544d512bc6cb94f08c09
Summary: We implement a hierarchical fine-grained binning structure, with the top level corresponding to different feature segments and the bottom level corresponding to different ranges of ECTR. The model is designed to be general enough to perform segmented calibration on any useful feature.
Test Plan:
buck test dper3/dper3/modules/calibration/tests:calibration_test -- test_histogram_binning_calibration_by_feature
buck test dper3/dper3_models/ads_ranking/model_impl/mtml/tests:mtml_lib_test -- test_multi_label_dependent_task_with_histogram_binning_calibration_by_feature
e2e test:
buck test dper3/dper3_models/ads_ranking/tests:model_paradigm_e2e_tests -- test_sparse_nn_histogram_binning_calibration_by_feature
buck test dper3/dper3_models/ads_ranking/tests:model_paradigm_e2e_tests -- test_mtml_with_dependent_task_histogram_binning_calibration_by_feature
All tests passed
Canary packages:
Backend -> aml.dper2.canary:e0cd05ac9b9e4797a94e930426d76d18
Frontend -> ads_dper3.canary:55819413dd0f4aa1a47362e7869f6b1f
Test FBL jobs:
**SparseNN**
ctr mbl feed
f255676727
inline cvr
f255677216
**MTML regular task**
offsite cvr
f255676719
**MTML dependent task**
mobile cvr
f255677551
**DSNN for AI models**
ai oc
f255730905
**MIMO for both AI DSNN part and AF SNN part**
mimo ig
f255683062
Reviewed By: zhongyx12
Differential Revision: D25043060
fbshipit-source-id: 8237cad41db66a09412beb301bc45231e1444d6b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53389
Resize was written to take arguments by value, which was
totally fine if they were ArrayRef or a series of integers, but not so
fine if they're std::vector.
ghstack-source-id: 123212128
Test Plan:
Existing CI should make sure it builds
Inspected assembly for ios_caffe.cc and saw no more vector copy before
calling Resize
Reviewed By: smessmer
Differential Revision: D26852105
fbshipit-source-id: 9c3b9549d50d32923b532bbc60d0246e2c2b5fc7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53388
Most of this method did not depend on the template parameter. No need to include it in the .h file or duplicate it in the generated code.
ghstack-source-id: 123211590
Test Plan: Existing CI should cover this
Reviewed By: smessmer
Differential Revision: D26851985
fbshipit-source-id: 115e00fa3fde547c4c0009f2679d4b1e9bdda5df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53153
This diff is a fix for quantization_test in operator_benchmark, which is broken because of removing the py_module for learnable fake_quantization.
ghstack-source-id: 123103477
Test Plan: `buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:quantization_test`
Reviewed By: z-a-f
Differential Revision: D26764881
fbshipit-source-id: 8d40c6eb5e7090ca65f48982c837f7dc87d14378
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53489
It appears that D26675801 (1fe6a6507e) broke Glow builds (and probably other installs) with the inclusion of the python_arg_parser include. That dep lives in a directory of its own and was not included in setup.py.
Test Plan: OSS tests should catch this.
Reviewed By: ngimel
Differential Revision: D26878180
fbshipit-source-id: 70981340226a9681bb9d5420db56abba75e7f0a5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53412
Docker builds for scheduled workflows still need to happen within the
regular build workflow since new docker image builds are actually only
done within the `build` workflow
A follow up to #52693
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: janeyx99
Differential Revision: D26890300
Pulled By: seemethere
fbshipit-source-id: d649bfca5186a89bb5213865f1f5738b809d4d38
Summary:
See https://github.com/pytorch/pytorch/issues/53526. We're disabling the test temporarily until we can figure out what's going on (since it's unclear what needs to be reverted).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53527
Reviewed By: zhangguanheng66
Differential Revision: D26888037
Pulled By: samestep
fbshipit-source-id: f21a2d665c13181ed3c8815e352770b2f26cdb84
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).
New submodule commit: 46949a8ca3
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53504
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: lw
Differential Revision: D26883701
fbshipit-source-id: 9e132a1389ac9cee9507c5600668af1afbb26efd
Summary:
Currently it says it does a deepcopy by default, but that's not true.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53457
Reviewed By: navahgar
Differential Revision: D26876781
Pulled By: Chillee
fbshipit-source-id: 26bcf76a0c7052d3577f217e79545480c9118a4e
Summary:
This is a more fundamental example, as we may support some amount of shape specialization in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53250
Reviewed By: navahgar
Differential Revision: D26841272
Pulled By: Chillee
fbshipit-source-id: 027c719afafc03828a657e40859cbfbf135e05c9
Summary:
This PR adds an implementation for `aten::cat` in NNC without any conditionals. This version is not enabled by default.
Here is the performance of some micro benchmarks with and without conditionals. There is up to 50% improvement in performance without conditionals for some of the shapes.
aten::cat implementation in NNC **with** conditionals
```
$ python -m benchmarks.tensorexpr --device cpu --mode fwd --jit_mode trace --cpu_fusion concat
pt: concat2d2input_fwd_cpu_1_160_1_14_1: 5.44 us, SOL 0.26 GB/s, algorithmic 0.51 GB/s
pt: concat2d2input_fwd_cpu_1_580_1_174_1: 5.75 us, SOL 1.05 GB/s, algorithmic 2.10 GB/s
pt: concat2d2input_fwd_cpu_20_160_20_14_1: 6.87 us, SOL 4.05 GB/s, algorithmic 8.11 GB/s
pt: concat2d2input_fwd_cpu_20_580_20_174_1: 14.52 us, SOL 8.31 GB/s, algorithmic 16.62 GB/s
pt: concat2d2input_fwd_cpu_8_512_8_512_1: 9.58 us, SOL 6.84 GB/s, algorithmic 13.68 GB/s
```
aten::cat implementation in NNC **without** conditionals
```
$ python -m benchmarks.tensorexpr --device cpu --mode fwd --jit_mode trace --cpu_fusion --cat_wo_conditionals concat
pt: concat2d2input_fwd_cpu_1_160_1_14_1: 4.67 us, SOL 0.30 GB/s, algorithmic 0.60 GB/s
pt: concat2d2input_fwd_cpu_1_580_1_174_1: 5.65 us, SOL 1.07 GB/s, algorithmic 2.14 GB/s
pt: concat2d2input_fwd_cpu_20_160_20_14_1: 6.10 us, SOL 4.56 GB/s, algorithmic 9.12 GB/s
pt: concat2d2input_fwd_cpu_20_580_20_174_1: 7.44 us, SOL 16.22 GB/s, algorithmic 32.44 GB/s
pt: concat2d2input_fwd_cpu_8_512_8_512_1: 6.46 us, SOL 10.14 GB/s, algorithmic 20.29 GB/s
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53128
Reviewed By: bertmaher
Differential Revision: D26758613
Pulled By: navahgar
fbshipit-source-id: 00f56b7da630b42bc6e7ddd4444bae0cf3a5780a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52814
Currently, there is no way to load a model on a devvm (CPU) if that model has operators that the runtime doesn't support. This ends up happening (currently) for Metal GPU models, and potentially in the future for other backends that have backend-specific operators that don't have a registered implementation (even a dummy one) on CPU.
There are at least a couple reasons for why this is needed:
1. We want to extract the operator list directly from the bytecode (instead of looking it up from `mobile_info.json`).
2. We want to be able to trace the quantized operators that are invoked when loading the compressed weights for a model that has prepacked weights. xta0 root-caused this after husthyc discovered that there are untraced operators showing up when loading a Metal GPU model.
If we want to scale out to support different types of models, we absolutely need the ability to load a model on a devvm irrespective of what backend (device/etc...) it is targeted at.
ghstack-source-id: 123284366
Test Plan: The next diff in this stack is using the newly introduced methods.
Reviewed By: iseeyuan
Differential Revision: D26656266
fbshipit-source-id: eed9af2f7b55979e9c18b986b8c3b9a767153297
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53271
- [x] Add `set_determinism` context manager
- [x] Add `non_deterministic` decorator for `DataPipe`
- Raise error at the construction time for non-deterministic DataPipe when `determinism` is set to `True`
- [ ] Support `non_deterministic` with option
- When `GreedyJoin` only contains one datapipe, it should still be deterministic.
Note: Test is in the [PR](https://github.com/facebookexternal/torchdata/pull/15). As the main repo doesn't have non-deterministic DataPipe yet.
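A rough sketch of the two pieces (invented names; the real torchdata API may differ):

```python
import contextlib

_determinism = False

@contextlib.contextmanager
def set_determinism(mode):
    # Context manager toggling a global determinism requirement.
    global _determinism
    prev, _determinism = _determinism, mode
    try:
        yield
    finally:
        _determinism = prev

def non_deterministic(cls):
    # Class decorator: constructing the DataPipe raises at construction
    # time while determinism is required.
    orig_init = cls.__init__
    def guarded_init(self, *args, **kwargs):
        if _determinism:
            raise RuntimeError(cls.__name__ + " is non-deterministic")
        orig_init(self, *args, **kwargs)
    cls.__init__ = guarded_init
    return cls
```

Raising at construction time (rather than at iteration time) surfaces the violation as early as possible.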
Test Plan: Imported from OSS
Reviewed By: anjali411
Differential Revision: D26823023
Pulled By: ejguan
fbshipit-source-id: 51bb92fc3d18d1fc9536c1229363c536ad120876
Summary: Using `cudnnBatchNormalizationForwardTrainingEx` and `cudnnBatchNormalizationBackwardEx` if cuDNN version is greater than 8.0.0.
Reviewed By: xw285cornell
Differential Revision: D26794173
fbshipit-source-id: dc4994375350f303a3fa0aee03255e8f8be1c605
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53276
- One of the tests had a syntax error (but the test
wasn't fine grained enough to catch this; any error
was a pass)
- Doesn't work on ROCm
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D26820048
Test Plan: Imported from OSS
Reviewed By: mruberry
Pulled By: ezyang
fbshipit-source-id: b02c4252d10191c3b1b78f141d008084dc860c45
Summary:
Context: https://github.com/pytorch/pytorch/pull/53299#discussion_r587882857
These are the only hand-written parts of this diff:
- the addition to `.github/workflows/lint.yml`
- the file endings changed in these four files (to appease FB-internal land-blocking lints):
- `GLOSSARY.md`
- `aten/src/ATen/core/op_registration/README.md`
- `scripts/README.md`
- `torch/csrc/jit/codegen/fuser/README.md`
The rest was generated by running this command (on macOS):
```
git grep -I -l ' $' -- . ':(exclude)**/contrib/**' ':(exclude)third_party' | xargs gsed -i 's/ *$//'
```
I looked over the auto-generated changes and didn't see anything that looked problematic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53406
Test Plan:
This run (after adding the lint but before removing existing trailing spaces) failed:
- https://github.com/pytorch/pytorch/runs/2043032377
This run (on the tip of this PR) succeeded:
- https://github.com/pytorch/pytorch/runs/2043296348
Reviewed By: walterddr, seemethere
Differential Revision: D26856620
Pulled By: samestep
fbshipit-source-id: 3f0de7f7c2e4b0f1c089eac9b5085a58dd7e0d97
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53207
Simplifying some of the async execution logic in request_callback_impl
as part of https://github.com/pytorch/pytorch/issues/39351.
ghstack-source-id: 123004020
Test Plan: waitforbuildbot
Reviewed By: mrshenli
Differential Revision: D26791325
fbshipit-source-id: 790ad413dad410dbcd07787583674cb5af1d1c92
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53288
Modify assert order to correct the error message when nan appears in multinomial on cuda
Test Plan: unittest
Reviewed By: ngimel
Differential Revision: D26824353
fbshipit-source-id: af6195e7c36fd51b3fc90df558ad6fac41288142
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53298
This is a re-land of D26641600 (3969391c07), but with the `SaveOpImpl` class marked as
`TORCH_API` to ensure that its symbols get exported properly in shared library
builds.
This moves the `SaveOp` code from `load_save_op.h` to `load_save_op.cc`.
Previously this implementation was all in the templatized `SaveOp` class, even
though most of the logic didn't depend on the template parameters. Having
this code be in the header file slows down the build, and forces more files to
be rebuilt than necessary when changing the SaveOp code. Having this code in
a template class can also make the generated code larger than needed, as we
don't need separate copies instantiated for each context type.
ghstack-source-id: 123146018
Test Plan:
buck test //caffe2/caffe2/python/operator_test:load_save_test
Also tested performing the CMake-based build using shared libraries with CUDA
enabled, and confirmed that the build succeeded.
Reviewed By: mraway
Differential Revision: D26802576
fbshipit-source-id: fc2dbdc1cd20680b082c887366a6305d86688138
Summary:
We no longer build binaries for CUDA 11.0, so let's ensure that we
build for CUDA 11.1 by default instead.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53299
Reviewed By: anjali411
Differential Revision: D26857194
Pulled By: seemethere
fbshipit-source-id: 6094913922c0da832b96e5e49a67369d69d0b8ad
Summary:
Currently there is only one indicator for build_ext regarding the distributed backend: `USE_DISTRIBUTED`.
However, one can build with selective backends, so this adds the 3 distributed backend options in setup.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53214
Test Plan: Set the 3 options in environment and locally ran `python setup.py build_ext`
Reviewed By: janeyx99
Differential Revision: D26818259
Pulled By: walterddr
fbshipit-source-id: 688e8f83383d10ce23ee1f019be33557ce5cce07
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53204
Async execution for script calls in request_callback_impl.cpp had two
similar if-else blocks that were hard to read. This PR simplifies some of that
logic by breaking it into reusable components.
ghstack-source-id: 122996440
Test Plan: waitforbuildbot
Reviewed By: mrshenli
Differential Revision: D26788459
fbshipit-source-id: f2818c6251a465936ed75b7bd356b616f0580094
Summary:
Uses nightly commit stats to automatically shard tests based on execution time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53269
Test Plan:
set CIRCLE_JOB to an existing job, like `pytorch_linux_bionic_py3_6_clang9_test`
Then you can run something like: `python test/run_test.py --shard 1 10`
Reviewed By: malfet
Differential Revision: D26819440
Pulled By: janeyx99
fbshipit-source-id: 6bc73d6aa3d52d9850817536be15d7b54a72780e
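The core idea of time-based sharding can be sketched as a greedy longest-processing-time assignment. This is a hypothetical helper for illustration, not the actual `run_test.py` code:

```python
import heapq

def shard_tests(test_times, num_shards):
    """Greedily assign tests (name -> seconds) to shards, longest first,
    always placing the next test on the currently lightest shard."""
    # Min-heap of (total_seconds, shard_index); heappop returns the lightest shard.
    heap = [(0.0, i) for i in range(num_shards)]
    heapq.heapify(heap)
    shards = [[] for _ in range(num_shards)]
    for name, seconds in sorted(test_times.items(), key=lambda kv: -kv[1]):
        total, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (total + seconds, idx))
    return shards

times = {"test_nn": 300, "test_ops": 250, "test_jit": 120, "test_autograd": 110}
print(shard_tests(times, 2))  # [['test_nn', 'test_autograd'], ['test_ops', 'test_jit']]
```

With nightly timing stats as input, each shard ends up with a roughly equal total runtime.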
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53332
This is to make sure we don't get `BATCH` dim type for the output.
Reviewed By: ChunliF
Differential Revision: D26836902
fbshipit-source-id: bedbd12330c608406e3466b240015235a28d2c4a
Summary:
To display the basic information about the GPUs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53334
Reviewed By: anjali411
Differential Revision: D26849826
Pulled By: ngimel
fbshipit-source-id: 14f0d9dfe41a35fa45fdf6aa7bf2a41704887c0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53172
Pull Request resolved: https://github.com/pytorch/elastic/pull/141
Upstreams two modules to torch:
1. `torchelastic.rendezvous`
2. `torchelastic.utils`
These modules were chosen as `[1/n]` since they are the leaf modules in torchelastic.
==== NOTES: ====
1. I'm disabling etcd_rendezvous and etcd_server tests in CIRCLECI for the moment since I need to edit the test dockers to contain the etcd server binary (there's 4-5 test dockers - one for each platform so this is going to take some time for me to set up the environments and test) - T85992919.
2. I've fixed all lint errors in the python files, but there are remaining ones in the cpp files in ZeusRendezvous. I took a look at them, and I don't want to fix those linter errors right now for 2 major reasons:
1. Some of them are more than formatting changes (e.g. std::move vs pass by value) and I don't want to introduce bundled changes with the move
    1. The old rendezvous code (the one we forked from in caffe2/fb) has the same problems and I think it's better for us to deal with this when we deprecate caffe2/fb/rendezvous in favor of the one in torchelastic -T86012579.
Test Plan:
```
buck test mode/dev-nosan //caffe2/torch/distributed/elastic/utils/test/...
buck test mode/dev-nosan //caffe2/torch/distributed/elastic/utils/data/test/...
buck test mode/dev-nosan //caffe2/torch/distributed/elastic/rendezvous/test/...
buck test mode/dev-nosan //caffe2/torch/distributed/elastic/rendezvous/fb/...
buck test mode/dev-nosan //pytorch/elastic/torchelastic/...
```
\+ Sandcastle
Reviewed By: H-Huang
Differential Revision: D26718746
fbshipit-source-id: 67cc0350c3d847221cb3c3038f98f47915362f51
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52966
Logs the registered comm hook if there is one, else logs
"builtin_allreduce".
ghstack-source-id: 123174803
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D26709388
fbshipit-source-id: 484fdbbd6643ec261b3797bd8d9824b2b6a1a490
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52887
This diff changes the way to do model consistency check (i.e. `_verify_replicas_across_processes`) in DDP.
There were a few things that could be improved with the way we verify model across processes in DDP initialization:
1. We should do this check before syncing module states in DDP init, otherwise with Gloo backend this will throw but we would like to throw the error corresponding to different models on different ranks. To do this, we move the methods to be standalone C++ functions (not part of reducer) and move this check to before synchronizing parameters.
2. Refactor DDP init in the following ways:
- Run model consistency check before creating the reducer
- add helper functions to build params to pass into reducer
- add helper function to call `_verify_model_across_ranks`
- move `def parameters` to a helper function `_get_parameters` to be used more broadly within DDP
In follow up changes we will add the ability to detect which rank had inconsistent model (https://github.com/pytorch/pytorch/issues/52876 would be useful for this to determine which ranks(s) had errors).
ghstack-source-id: 123171877
Test Plan:
CI/unittest
buck test mode/dev-nosan //caffe2/test/distributed:c10d
BACKEND="nccl" WORLD_SIZE="2" ~/fbcode/buck-out/dev/gen/caffe2/test/distributed/distributed_nccl_fork#binary.par -r test_ddp_model_diff_across_ranks
Reviewed By: zhaojuanmao
Differential Revision: D26565290
fbshipit-source-id: f0e1709585b53730e86915e768448f5b8817a608
Summary:
Currently there is some code that intends to skip distributed tests if
the distributed module is not built. However, these checks are missing in some
test files, and in some other test files they are performed after the
distributed module is imported, which leads to failures. This is
generating a lot of headaches when testing minimal builds locally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52945
Reviewed By: anjali411
Differential Revision: D26848241
Pulled By: ezyang
fbshipit-source-id: 983a848844add40869a86f3c9413503a3659b115
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51564
Constructor logic was spread throughout InferenceModule and StaticRuntime. This diff unifies the two. After a lot of discussion on this diff D25961626 it became apparent that `clone` is uglier than a cheap StaticRuntime.
This means StaticRuntime is effectively StaticModule and the only code in the new StaticRuntime is the `run` functions.
```
graph, schema = PrepareForStaticModule(torchscript_module)
sm = StaticModule(graph, schema, options)
sm(inputs)
// or create many cheap runtimes with the module
sr = StaticRuntime(sm)
sr(inputs)
```
Changelist:
- Rename InferenceModule StaticModule
- Move all logic for construction into StaticModule
- Create a new StaticRuntime that only has a unique memory planner (everything else is in StaticModule)
- Update comments with explanation
- Propagate all changes to predictor integration
- Propagate all changes to python integration
- Change semantics to be a bit more PyTorch-standard (no "run" calls, no "get_" getters).
Test Plan:
buck test //caffe2/test:static_runtime
buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest
Reviewed By: hlu1
Differential Revision: D25592967
fbshipit-source-id: 8233bed03137ce129137af2d44bce0095033ef0f
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51930
Running the reproducer under `cuda-gdb`, I see access violations sometimes in [`zswap_kernel_batched`](4fd4634f35/magmablas/zgetf2_kernels.cu (lines-276)) (part of the LU factorization) and other times in [`zlaswp_columnserial_kernel`](4fd4634f35/magmablas/zlaswp_batched.cu (lines-335)) (part of the inverse).
The common factor between both of these is they use `ipiv` to index into the matrix. My best guess is the `ipiv` indices aren't written when the factorization fails, hence garbage data is used as matrix indices and we get an access violation. Initializing `ipiv` to a known-good value before the factorization fixes the issue.
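The defensive-initialization pattern behind the fix can be sketched in plain Python (the real fix is in the CUDA/C++ batched LU path; the names here are hypothetical):

```python
def lu_pivots_guarded(n, factorize):
    """Pre-fill the pivot array with the identity permutation so that, even
    if the factorization fails early and never writes some entries, every
    pivot is still a valid row index rather than uninitialized garbage."""
    ipiv = list(range(n))  # known-good values instead of uninitialized memory
    ok = factorize(ipiv)   # may bail out before writing all pivots
    return ipiv, ok

def failing_factorization(ipiv):
    ipiv[0] = 1  # writes only the first pivot, then hits a singular block
    return False

ipiv, ok = lu_pivots_guarded(4, failing_factorization)
assert all(0 <= p < 4 for p in ipiv)  # safe to index with, even on failure
```

Without the pre-fill, indexing a matrix with the unwritten pivots is exactly the kind of out-of-bounds access the debugger reported.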
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53064
Reviewed By: zhangguanheng66
Differential Revision: D26829053
Pulled By: heitorschueroff
fbshipit-source-id: 842854a6ee182f20b2acad0d76d32d27cb51b061
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).
New submodule commit: a4816001b8
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53353
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: lw
Differential Revision: D26844238
fbshipit-source-id: 9895773f616c53d7d3b3a5e1b95507d26bb93fee
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).
New submodule commit: 20224c5fe7
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53265
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: walterddr, lw
Differential Revision: D26816470
fbshipit-source-id: 8e381a3d6632acbc90691128ef85591b325ecf64
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53333
- Add more variants to `create_empty_from` to take more args, like dtype/layout/device.
- Clean up stray at::empty uses, mostly in the out variants.
Reviewed By: ajyu
Differential Revision: D26799900
fbshipit-source-id: 6676d8043fead63208913ef3a28cabbae76e46bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53319
Noticed these in profiles.
Also switch to `unordered_map`.
Test Plan: Unit tests.
Reviewed By: swolchok
Differential Revision: D26504408
fbshipit-source-id: 9e14d55909a4af019058b8c27c67ee2348cd02a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53283
We had `ShapeArg` and `KernelArg` classes, which were wrappers over
`BufferArg` without adding any new functionality on top of what already
existed. This PR removes them and replace their uses with `BufferArg`s
directly.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D26821993
Pulled By: ZolotukhinM
fbshipit-source-id: d1f95ea069b9f38f1d32424464551df2565b3c49
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53233
**Summary**
This commit adds a `deny` method to `PackageExporter` that allows
modules to be prohibited during the packaging process. A dependency on a
module matching the names or globs that `deny` was called with will
cause an exception to be raised.
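The matching semantics can be sketched with `fnmatch`-style globs. This is an illustrative stand-in, not the actual `PackageExporter` implementation, and the helper names are hypothetical:

```python
import fnmatch

class DeniedModuleError(Exception):
    pass

def check_dependency(module_name, deny_patterns):
    """Raise if a discovered dependency matches any denied name or glob,
    mirroring the deny semantics described above."""
    for pattern in deny_patterns:
        if fnmatch.fnmatchcase(module_name, pattern):
            raise DeniedModuleError(
                f"{module_name} matches denied pattern {pattern!r}")

check_dependency("numpy.linalg", ["scipy.*"])  # no match: packaging proceeds
try:
    check_dependency("scipy.sparse", ["scipy.*"])
except DeniedModuleError as e:
    print(e)
```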
**Test Plan**
This commit adds unit tests to `PackagingTest` for this new method:
`test_deny` and `test_deny_glob`.
**Fixes**
This commit fixes #53217.
Test Plan: Imported from OSS
Reviewed By: suo
Differential Revision: D26834010
Pulled By: SplitInfinity
fbshipit-source-id: 469b5c6741bcc6dab77e352f41db38fa1e0dae12
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53232
**Summary**
This commit adds an optional `allow_empty` argument to
`PackageExporter.mock` and `PackageExporter.extern` that allows certain
patterns for mocked modules and extern modules to be marked ones that
*must* be matched during the packaging process. If a mock or extern
module with `allow_empty=False` is not matched while packaging, an error
is thrown.
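The `allow_empty=False` bookkeeping can be sketched as follows (an illustrative stand-in with hypothetical names, not the actual `PackageExporter` code):

```python
import fnmatch

class EmptyMatchError(Exception):
    pass

class Pattern:
    """An extern/mock pattern that records whether it ever matched."""
    def __init__(self, glob, allow_empty=True):
        self.glob = glob
        self.allow_empty = allow_empty
        self.matched = False

def package(dependencies, extern_patterns):
    for dep in dependencies:
        for p in extern_patterns:
            if fnmatch.fnmatchcase(dep, p.glob):
                p.matched = True
    # When packaging finishes, every allow_empty=False pattern must have matched.
    for p in extern_patterns:
        if not p.allow_empty and not p.matched:
            raise EmptyMatchError(f"extern pattern {p.glob!r} matched no module")

package(["numpy.core"], [Pattern("numpy.*", allow_empty=False)])  # ok
```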
**Test Plan**
This commit adds two new test cases to `PackagingTest`,
`test_extern_glob_allow_empty` and `test_mock_glob_allow_empty` that
test this new flag. Existing tests already tests `allow_empty=True`.
**Fixes**
This commit fixes #53217.
Test Plan: Imported from OSS
Reviewed By: suo
Differential Revision: D26834011
Pulled By: SplitInfinity
fbshipit-source-id: 9cf4ea56079ae210d6cfa8604218849eb5cde5f4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53174
Enable Kineto also in the CPU builds (non-mobile, non-Windows(atm))
Test Plan: CI
Reviewed By: gdankel
Differential Revision: D26776112
Pulled By: ilia-cher
fbshipit-source-id: 8733f65c2993105136c853f2a7b6e497d0fa53bf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53201
This resulted in [S22350](https://www.internalfb.com/intern/sevmanager/view/s/223540), which caused trouble on Android.
1. The Python code has a call to `warnings.warn()`, which resulted in generated code emitting the `WARN` instruction on the lite-interpreter.
2. The code for handling that instruction/op-code popped off the value in a call to the `TORCH_WARN()` *macro*.
3. This macro conditionally compiled out evaluation of the arguments if `STRIP_ERROR_MESSAGES` was defined, which resulted in the stack not getting popped, and the lite-interpreter returning the last value pushed onto the stack.
I've attempted to re-produce it using this python code: {P243842428}
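The failure mode can be sketched with a toy stack machine in Python (the real bug is in the C++ `WARN` handler and the `TORCH_WARN()` macro; this is only an analogue):

```python
STRIP_ERROR_MESSAGES = True  # analogue of the build flag

def run(instructions, strip=STRIP_ERROR_MESSAGES):
    """Tiny stack machine illustrating the bug: the WARN handler popped its
    argument *inside* the message-formatting call, which the stripped build
    compiles out, so the operand is left behind and becomes the
    interpreter's (wrong) return value."""
    stack = []
    for op, arg in instructions:
        if op == "PUSH":
            stack.append(arg)
        elif op == "WARN":
            if not strip:
                stack.pop()     # the pop only happens when messages are kept
        elif op == "RET":
            return stack.pop()  # returns whatever is on top of the stack

prog = [("PUSH", 42), ("PUSH", "warning text"), ("WARN", None), ("RET", None)]
print(run(prog, strip=True))   # returns "warning text" instead of 42
print(run(prog, strip=False))  # returns 42 as intended
```

The fix is to pop unconditionally in the instruction handler, outside anything `STRIP_ERROR_MESSAGES` can compile away.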
ghstack-source-id: 122990001
(Note: this ignores all push blocking failures!)
Test Plan:
Created a new unit test to re-produce the failure in the test. Was able to do so locally using the following command:
```
buck test -c pt.strip_error_messages=1 //xplat/caffe2:test_s223540
```
However, since `pt.strip_error_messages=0` for dev and continuous builds, I have had to check in a separate contbuild config to try and trigger this failure on contbuild.
Reviewed By: iseeyuan
Differential Revision: D26765662
fbshipit-source-id: 63c3c96d84ce6a9e5471f13d80165aa3718be9a2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53323
While optimizing inline cvr local_ro, we found a pattern where gather_ranges is used redundantly. Fuse this pattern to remove the unnecessary gather_ranges.
Reviewed By: hlu1
Differential Revision: D26659824
fbshipit-source-id: 6420afa3a2c3272c57706b70c2e9834014d6c32d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53212
Ran into a strange issue with error handling in future callbacks, more
details in https://github.com/pytorch/pytorch/issues/52132, but essentially,
after a callback throws all additional processing stops, and other futures can
never be completed, resulting in a hang. Add a note to warn about this.
ghstack-source-id: 123122890
Test Plan: CI
Reviewed By: SciPioneer
Differential Revision: D26793310
fbshipit-source-id: b1ae73a81163d7b37ba07b0685e8de4228f01da6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52343
aten::to returns self when the TensorOptions match and copy is set to false. For static runtime, we always copy. There isn't a separate op for the copying variant of aten::to; it is the same function
called with different arguments.
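The aliasing distinction can be sketched in Python (a toy model of the behavior, not the actual aten implementation):

```python
import copy

def to(tensor, dtype, always_copy=False):
    """Mimic aten::to's aliasing behavior: when the target options already
    match and a copy is not forced, the input itself is returned (an alias).
    Static runtime forces the copy so outputs never alias their inputs."""
    if tensor["dtype"] == dtype and not always_copy:
        return tensor            # eager fast path: alias, no new storage
    out = copy.deepcopy(tensor)  # fresh storage
    out["dtype"] = dtype
    return out

t = {"dtype": "float32", "data": [1.0, 2.0]}
assert to(t, "float32") is t                        # alias when options match
assert to(t, "float32", always_copy=True) is not t  # static runtime: fresh copy
```

Forcing the copy lets static runtime's memory planner assume every output owns its storage.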
Test Plan:
On AdFinder local_ro:
Before:
0.896742
0.00824827 ms. 0.92773%. aten::to (5 nodes)
After:
0.88233
0.0056607 ms. 0.644675%. aten::to (5 nodes)
buck test mode/opt caffe2/benchmarks/static_runtime:static_runtime_cpptest
Reviewed By: hlu1
Differential Revision: D26477980
fbshipit-source-id: 8e8448092adff38c141af1ce27a10acd39c07dd1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53139
ghstack-source-id: 123090847
Test Plan:
Sandcastle
Also explicitly tests that this test passes after incorporating the changes from D26656767, and adding a `torch.tensor` -> `torch._tensor` mapping to the `load_module_mapping` dict: `buck test mode/dev //pandora/utils/tests:manifold_utils_tests -- --exact 'pandora/utils/tests:manifold_utils_tests - test_load_dataset_valid_dir (pandora.utils.tests.manifold_utils_tests.TestManifoldUtils)'`
With just D26656767, that test fails. With D26656767 + the changes in this diff, that test passes.
Reviewed By: ezyang
Differential Revision: D26760600
fbshipit-source-id: cb16493b858a358acf468d755740aa272ae9d363
Summary:
When the saved variable is an output, its grad_fn is not saved in SavedVariable, so it must be passed in during `unpack`.
Here, we can always pass in grad_fn (whether or not the saved variable is an output) because it is ignored when the saved variable is not an output.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53205
Reviewed By: gchanan, zhangguanheng66
Differential Revision: D26794365
Pulled By: soulitzer
fbshipit-source-id: e039baba20c364c4ab42ff99d0b242dd95c67fb3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53278
We can avoid duplicating the string data for the namespaces
by assembling qualified names ourselves as needed.
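The space saving comes from sharing one namespace string across operators and building the qualified name only when asked for it. A minimal Python sketch of the design choice (hypothetical class, not the actual C++ code):

```python
class OperatorName:
    """Store the namespace once per operator instead of embedding it in a
    pre-built qualified-name string; assemble "ns::name" on demand."""
    def __init__(self, ns, name):
        self.ns = ns      # shared namespace string, e.g. "aten"
        self.name = name  # the unqualified operator name

    def qualified(self):
        return f"{self.ns}::{self.name}"

ops = [OperatorName("aten", n) for n in ("add", "mul", "relu")]
assert ops[0].qualified() == "aten::add"
```

Duplicating "aten::" into every stored name would repeat the namespace bytes once per operator; assembling on demand pays that cost only transiently.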
ghstack-source-id: 123111718
Test Plan:
CI
buildsizebot some iOS apps
Reviewed By: dhruvbird, walterddr, ot
Differential Revision: D26820648
fbshipit-source-id: e2560874c54f46210181ddfee354967644bd41e1
Summary:
Fixes https://github.com/pytorch/pytorch/issues/12635
This change will help us speed up autograd's discovery algorithm in cases where we use `.grad` and we try to "unroll" the training loop. For example the example in the issue and also https://github.com/pytorch/pytorch/pull/52180#issuecomment-783400832 observe an unbounded multiple of speed-up.
We do this by adding a new sequence_nr-type numbering: for each node, we maintain the length of the longest path from it to any leaf node. How does this help us speed up discovery (dfs)? Previously the bottleneck was that the dfs that computes which nodes need to be executed always explored every node. With this change, before we run dfs, we first compute the minimum seq_nr among all the nodes passed as the `inputs`. If we let this be some number N, intuitively this means that dfs should stay at least N units away from any leaf node. So, if we find ourselves too close to any leaf node, we should stop our search early.
Edit:
After some discussion offline, the plan is:
- make old sequence_nr a construct of the profiler. This means we can avoid accessing thread local state in cases where the profiler is disabled. Note that we cannot replace sequence_nr as-is because profiler's use-case requires that thread-id + sequence_nr can uniquely identify a given node in order for downstream users/programs to correlate nodes from backward and forward passes. This means we must maintain two sequence_nr's and that we have an extra field in Node.
- In a future PR, we can potentially remove sequence_nr entirely from the profiler as well, but we avoid doing it now because we haven't measured, and it's a larger effort because we'd have to mess around with the dispatcher and profiler
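The pruning idea can be sketched on a toy DAG (hypothetical helper names; the real implementation is in the C++ autograd engine). If an input node is at least N steps from every leaf, then any node strictly closer than N to the leaves cannot have an input below it, so dfs can skip it:

```python
def longest_to_leaf(graph):
    """graph: node -> list of successor nodes (toward the leaves).
    Returns node -> length of the longest path from it to any leaf."""
    memo = {}
    def depth(n):
        if n not in memo:
            children = graph.get(n, [])
            memo[n] = 1 + max(depth(c) for c in children) if children else 0
        return memo[n]
    for n in graph:
        depth(n)
    return memo

def discover(graph, root, inputs):
    """Mark nodes reachable from root, pruning any branch that is already
    closer to the leaves than the shallowest requested input."""
    dist = longest_to_leaf(graph)
    cutoff = min(dist[i] for i in inputs)
    needed, stack = set(), [root]
    while stack:
        n = stack.pop()
        if n in needed or dist[n] < cutoff:
            continue  # too close to a leaf: no input can be below here
        needed.add(n)
        stack.extend(graph.get(n, ()))
    return needed

g = {"root": ["a", "b"], "a": ["leaf1"], "b": ["c"], "c": ["leaf2"]}
print(discover(g, "root", ["b"]))  # {'root', 'b'} -- a's branch is pruned
```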
Testing with this [code](https://gist.github.com/kyunghyuncho/5fb9991ce1233f909051854a84b7148e), we see that runtime no longer increases as we iterate.
Before:
```
100: Time taken: 0.47s, loss: 1.1e+06
200: Time taken: 0.064s, loss: 6.5e+05
300: Time taken: 0.088s, loss: 4.4e+05
400: Time taken: 0.1s, loss: 3.2e+05
500: Time taken: 0.12s, loss: 2.5e+05
600: Time taken: 0.15s, loss: 2e+05
700: Time taken: 0.18s, loss: 1.7e+05
800: Time taken: 0.2s, loss: 1.4e+05
900: Time taken: 0.22s, loss: 1.2e+05
1000: Time taken: 0.24s, loss: 1.1e+05
1100: Time taken: 0.27s, loss: 9.3e+04
1200: Time taken: 0.3s, loss: 8.3e+04
1300: Time taken: 0.34s, loss: 7.4e+04
1400: Time taken: 0.36s, loss: 6.7e+04
1500: Time taken: 0.38s, loss: 6.1e+04
1600: Time taken: 0.4s, loss: 5.6e+04
1700: Time taken: 0.42s, loss: 5.1e+04
1800: Time taken: 0.44s, loss: 4.7e+04
1900: Time taken: 0.47s, loss: 4.4e+04
2000: Time taken: 0.5s, loss: 4.1e+04
```
After:
```
100: Time taken: 0.49s, loss: 1.2e+06
200: Time taken: 0.031s, loss: 6.9e+05
300: Time taken: 0.031s, loss: 4.6e+05
400: Time taken: 0.031s, loss: 3.3e+05
500: Time taken: 0.031s, loss: 2.6e+05
600: Time taken: 0.031s, loss: 2.1e+05
700: Time taken: 0.031s, loss: 1.7e+05
800: Time taken: 0.031s, loss: 1.4e+05
900: Time taken: 0.031s, loss: 1.2e+05
1000: Time taken: 0.031s, loss: 1.1e+05
1100: Time taken: 0.031s, loss: 9.6e+04
1200: Time taken: 0.031s, loss: 8.6e+04
1300: Time taken: 0.031s, loss: 7.7e+04
1400: Time taken: 0.031s, loss: 7e+04
1500: Time taken: 0.031s, loss: 6.3e+04
1600: Time taken: 0.031s, loss: 5.8e+04
1700: Time taken: 0.031s, loss: 5.3e+04
1800: Time taken: 0.031s, loss: 4.9e+04
1900: Time taken: 0.031s, loss: 4.5e+04
2000: Time taken: 0.032s, loss: 4.2e+04
```
Testing w/ small graph to check for regression:
```
import torch
from torch.utils.benchmark import Timer
setup="""
a = torch.rand((2, 2), requires_grad=True)
b = torch.rand((2, 2), requires_grad=True)
gradient = torch.ones(2, 2)
"""
stmt="""
torch.autograd.grad(a*b, [a, b], gradient)
"""
timer = Timer(stmt, setup)
print(timer.timeit(10000))
print(timer.collect_callgrind(100))
```
Result: there doesn't seem to be any significant regression
```
Time before: 12.74 us
Time after: 13.12 us
Instruction count before:
All Noisy symbols removed
Instructions: 8078960 8000882
Baseline: 4226 3838
Instruction count after:
All Noisy symbols removed
Instructions: 8091846 8017940
Baseline: 4336 3838
100 runs per measurement, 1 thread
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52180
Reviewed By: gchanan, zhangguanheng66
Differential Revision: D26794387
Pulled By: soulitzer
fbshipit-source-id: c00d387a29f151109c33dc6f1b56a8f275cdec58
Summary:
I edited the documentation for `nn.SiLU` and `F.silu` to:
- Explain that SiLU is also known as swish and that it stands for "Sigmoid Linear Unit."
- Ensure that "SiLU" is correctly capitalized.
I believe these changes will help users find the function they're looking for by adding relevant keywords to the docs.
Fixes: N/A
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53239
Reviewed By: jbschlosser
Differential Revision: D26816998
Pulled By: albanD
fbshipit-source-id: b4e9976e6b7e88686e3fa7061c0e9b693bd6d198
Summary:
Provides the implementation for feature request issue https://github.com/pytorch/pytorch/issues/28937.
Adds the `Parametrization` functionality and implements `Pruning` on top of it.
It adds the `auto` mode, on which the parametrization is just computed once per forwards pass. The previous implementation computed the pruning on every forward, which is not optimal when pruning RNNs for example.
It implements a caching mechanism for parameters. This is implemented through the mechanism proposed at the end of the discussion https://github.com/pytorch/pytorch/issues/7313. In particular, it assumes that the user will not manually change the updated parameters between the call to `backwards()` and the `optimizer.step()`. If they do so, they would need to manually call the `.invalidate()` function provided in the implementation. This could be made into a function that gets a model and invalidates all the parameters in it. It might be the case that this function has to be called in the `.cuda()` and `.to` and related functions.
As described in https://github.com/pytorch/pytorch/issues/7313, this could be used, to implement in a cleaner way the `weight_norm` and `spectral_norm` functions. It also allows, as described in https://github.com/pytorch/pytorch/issues/28937, for the implementation of constrained optimization on manifolds (i.e. orthogonal constraints, positive definite matrices, invertible matrices, weights on the sphere or the hyperbolic space...)
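The compute-once-per-forward caching with an explicit `invalidate()` can be sketched minimally (an illustration of the mechanism, not the proposed `nn` API):

```python
class CachedParametrization:
    """Minimal sketch of the caching described above: the parametrization
    (here, pruning by a fixed mask) is recomputed only after invalidate(),
    so repeated forward passes reuse the cached result."""
    def __init__(self, raw, mask):
        self.raw, self.mask = raw, mask
        self._cache = None
        self.computations = 0  # instrumentation for the example

    @property
    def weight(self):
        if self._cache is None:
            self.computations += 1
            self._cache = [w * m for w, m in zip(self.raw, self.mask)]
        return self._cache

    def invalidate(self):
        # Call after optimizer.step() has changed the raw parameter.
        self._cache = None

p = CachedParametrization([1.0, 2.0, 3.0], [1, 0, 1])
_ = p.weight; _ = p.weight   # two forwards, one computation
assert p.computations == 1
p.invalidate(); _ = p.weight  # recomputed after the parameter update
assert p.computations == 2
```

This is why the assumption that parameters are not mutated between `backward()` and `optimizer.step()` matters: the cache is only refreshed at the explicit invalidation points.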
TODO (when implementation is validated):
- More thorough test
- Documentation
Resolves https://github.com/pytorch/pytorch/issues/28937
albanD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33344
Reviewed By: zhangguanheng66
Differential Revision: D26816708
Pulled By: albanD
fbshipit-source-id: 07c8f0da661f74e919767eae31335a9c60d9e8fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53197
This probably causes a code size blowup and we care more about the size savings than the incremental perf on mobile.
ghstack-source-id: 122977713
Test Plan: buildsizebot some mobile apps
Reviewed By: dhruvbird
Differential Revision: D26731181
fbshipit-source-id: 78a926278a85028af09bfa0731d4d59a55ee3746
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53273
This prevents a mypy bug. Fixes #53272
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D26819428
Pulled By: ezyang
fbshipit-source-id: e71575ed13321665a976cc5ef8b2993c00626b7d
Summary:
1. Enabled `amax` & `amin` for `float16` & `bfloat16` dtypes for both CPU & CUDA.
2. Added `OpInfo`s for `amax` & `amin`.
3. Enabled `test_min_with_inf` & `test_max_with_inf` for both `float16` & `bfloat16`, as they also use `torch.amin` & `torch.amax` respectively.
4. Enabled `test_amax` & `test_amin` for `float16` but not for `bfloat16`, as comparison is done with `numpy`, which doesn't support `bfloat16`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52579
Reviewed By: pbelevich
Differential Revision: D26784194
Pulled By: heitorschueroff
fbshipit-source-id: 1050de3e155b83f282fb30b0db6658eead89936c
Summary:
Enable test in test_linalg.py, test_optim.py, and test_vmap.py for ROCm because they are passing.
Signed-off-by: Kyle Chen <kylechen@amd.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52818
Reviewed By: H-Huang
Differential Revision: D26694091
Pulled By: mruberry
fbshipit-source-id: 285d17aa7f271f4d94b5fa9d9f6620de8a70847b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53206
Copying the List in ListConstruct is 1 extra refcount bump. Copying the vector in TupleConstruct is 1 extra bump per tuple element.
ghstack-source-id: 123001815
Test Plan: Don't have a precise measurement but it's very roughly 0.5% off total time for AdIndexer inline_cvr based on wall time, and more like 1.2% based on change in perf profile.
Reviewed By: hlu1
Differential Revision: D26790670
fbshipit-source-id: 697ef82fe72a85719bf8ce28f2bb87fe56bbd8ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53216
- at::native::empty_cpu calls at::detail::empty_cpu without any changes to the arguments, so we can call at::detail::empty_cpu directly.
- There is no need to create a TensorOptions object first, since we can get all the relevant information from the tensor directly.
Reviewed By: bertmaher, swolchok
Differential Revision: D26792255
fbshipit-source-id: 7a4e368a19cea79e136e34dab854cb1d37dbeb58
Summary:
Also removes unneeded filename field in S3.
Tested locally:
I locally installed
```
conda install -c anaconda boto3
conda install -c conda-forge unittest-xml-reporting
```
I ran `python test/test_type_hints.py --save-xml=/tmp/reports/test_type_hints` twice to generate two reports of the same test cases.
Then, I edited the print_test_stats.py file to print the report instead of upload to S3, and then ran `CIRCLE_SHA1="$(git rev-parse HEAD)" CIRCLE_JOB=foo python torch/testing/_internal/print_test_stats.py --upload-to-s3 /tmp/reports/test_type_hints`. I verified the report object looked correct:
```
{
'build_pr': '',
'build_tag': '',
'build_sha1': '67cecd7f6cf2956bda1178ae2369cd74ba946f78',
'build_branch': '',
'build_job': 'foo',
'build_workflow_id': '',
'total_seconds': 67.316,
'format_version': 2,
'files': {
'test/test_type_hints': {
'total_seconds': 67.316,
'suites': {
'TestTypeHints': {
'total_seconds': 67.316,
'cases': {
'test_doc_examples': {
'seconds': 8.821,
'status': None
},
'test_run_mypy': {
'seconds': 58.495,
'status': None
}
}
}
}
}
}
}
```
It did take the longer duration of the two runs for each test case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53154
Reviewed By: samestep
Differential Revision: D26793522
Pulled By: janeyx99
fbshipit-source-id: 5644c1bd38acb8bca0d69851cf1d549a03334b7a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52979
Compression rate = uncompressed size / compressed size, so the compression rate is usually greater than 1.
Previously the compression rate was perceived as compressed size / uncompressed size, which can be very confusing.
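With the corrected convention, the computation is simply:

```python
def compression_rate(uncompressed_size, compressed_size):
    """Compression rate as defined above: how many times smaller the
    compressed representation is. Values > 1 mean actual savings."""
    return uncompressed_size / compressed_size

assert compression_rate(1000, 250) == 4.0   # 4x reduction
assert compression_rate(1000, 1000) == 1.0  # no compression
```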
ghstack-source-id: 122996272
Test Plan: unit tests
Reviewed By: zhaojuanmao
Differential Revision: D26713349
fbshipit-source-id: 83b7f8908c101954cf01f56a22161047fbfeaa53
Summary:
per title
This PR did
- Migrate `apex.parallel.SyncBatchNorm` channels_last to pytorch `torch.nn.SyncBatchNorm`
- Fix a TODO here by fusing `sum`, `div` kernels into backward elementwise kernel
b167402e2e/torch/nn/modules/_functions.py (L76-L95)
Todo
- [x] Discuss a regression introduced in https://github.com/pytorch/pytorch/pull/37133#discussion_r512530389, which is the synchronized copy here
b167402e2e/torch/nn/modules/_functions.py (L32-L34)
**Comment**: This PR uses apex version for the size check. Test passed and I haven't seen anything wrong so far.
- [x] The restriction to use channels_last kernel will be like this
```
inline bool batch_norm_use_channels_last_kernels(const at::Tensor& self) {
return self.is_contiguous(at::MemoryFormat::ChannelsLast) || self.ndimension() == 2;
}
```
I think we can relax that for channels_last_3d as well?
**Comment**: we don't have benchmark for this now, will check this and add functionality later when needed.
- [x] Add test
- [x] Add benchmark
Detailed benchmark is at https://github.com/xwang233/code-snippet/tree/master/syncbn-channels-last
Close https://github.com/pytorch/pytorch/issues/50781
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46906
Reviewed By: albanD
Differential Revision: D26771437
Pulled By: malfet
fbshipit-source-id: d00387044e9d43ac7e6c0e32a2db22c63d1504de
Summary: Names such as `row_block_size` and `col_block_size` might be ambiguous, especially if different engines use different tensor layouts (e.g., rows = output features). Having names such as `out_features_block_size` and `in_features_block_size` makes more sense.
Test Plan:
`buck test mode/opt //caffe2/torch/fb/model_optimization:sparsity_test`
```
Building with Remote Execution [RE]. Used 36:09 minutes of total time.
[RE] Waiting on 0 remote actions. Completed 264 actions remotely.
Building: finished in 02:34.4 min (100%) 18884/18884 jobs, 420 updated
Total time: 02:34.8 min
More details at https://www.internalfb.com/intern/buck/build/b34b5c52-eba6-4e17-92f9-1f5ce620f8f0
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: 8fe8fa95-c1f8-4b4f-9cbf-88b3b1b28eaf
Trace available for this run at /tmp/tpx-20210302-000019.503678/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/4785074650825194
✓ ListingSuccess: caffe2/torch/fb/model_optimization:sparsity_test - main (4.094)
✓ Pass: caffe2/torch/fb/model_optimization:sparsity_test - test_sparse_qlinear (caffe2.torch.fb.model_optimization.test.sparsity.quantized_test.TestQuantizedSparseKernels) (1.896)
✓ Pass: caffe2/torch/fb/model_optimization:sparsity_test - test_sparse_qlinear (caffe2.torch.fb.model_optimization.test.sparsity.quantized_test.TestQuantizedSparseLayers) (1.907)
✓ Pass: caffe2/torch/fb/model_optimization:sparsity_test - test_sparse_qlinear_serdes (caffe2.torch.fb.model_optimization.test.sparsity.quantized_test.TestQuantizedSparseLayers) (2.035)
Summary
Pass: 3
ListingSuccess: 1
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/4785074650825194
```
Reviewed By: dskhudia
Differential Revision: D26747065
fbshipit-source-id: 685fe864062ed532de284b22db757a921806d4ab
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53031
During the module conversion, the weight was assigned directly to the linear layer inside the quantizable MHA. Instead, the weight must be assigned to `layer.weight`.
Test Plan:
`buck test mode/opt //caffe2/test:quantization -- test_custom_module_multi_head_attention`
```
Building: finished in 6.9 sec (100%) 7316/7316 jobs, 3 updated
Total time: 7.4 sec
More details at https://www.internalfb.com/intern/buck/build/914cb095-806e-4891-8822-e2644283f05c
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: fcccbd0b-a887-4874-8455-d1cf8411be1d
Trace available for this run at /tmp/tpx-20210301-004359.492205/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/1688849910412609
✓ ListingSuccess: caffe2/test:quantization - main (2.440)
✓ Pass: caffe2/test:quantization - test_custom_module_multi_head_attention (quantization.test_quantized_op.TestQuantizedOps) (5.672)
Summary
Pass: 1
ListingSuccess: 1
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/1688849910412609
```
Reviewed By: raghuramank100
Differential Revision: D26720500
fbshipit-source-id: 3ba5d5df1c23cc5150c4a293d3c93c44dc702e50
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50670
This PR adds property support to Torchbind. There are two cases in which it needs to work:
**Torchscript**
Inside Torchscript, we don't go through pybind so there is no issue with accessing properties through ClassType.
**Eager Mode**
In Eager Mode, Torchbind creates a ScriptObject, to which we cannot dynamically add (or access) properties after initialization (https://stackoverflow.com/questions/1325673/how-to-add-property-to-a-class-dynamically). Therefore we created a Python wrapper (ScriptObjectWrapper) around ScriptObject where we can use the property method to set properties. By doing so, we can look up the wrapped object's properties through the `__getattr__` method of ScriptObjectWrapper. This logic is inspired by https://github.com/pytorch/pytorch/pull/44324
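The wrapper pattern described above can be sketched in plain Python (class and method names here are illustrative stand-ins, not the actual torch internals):

```python
class ScriptObjectLike:
    """Stands in for a C++-backed ScriptObject that rejects new attributes."""
    def __init__(self):
        self._fields = {"x": 42}

    def get_field(self, name):
        return self._fields[name]


def make_wrapper(property_names):
    # Build a wrapper class whose properties forward to the wrapped object.
    namespace = {}
    for name in property_names:
        namespace[name] = property(lambda self, n=name: self._obj.get_field(n))

    def __init__(self, obj):
        self._obj = obj

    def __getattr__(self, name):
        # Fall back to the wrapped object for anything not defined here.
        return getattr(self._obj, name)

    namespace["__init__"] = __init__
    namespace["__getattr__"] = __getattr__
    return type("ScriptObjectWrapper", (), namespace)


Wrapper = make_wrapper(["x"])
w = Wrapper(ScriptObjectLike())
print(w.x)  # 42 -- property lookup forwarded to the wrapped object
```

Properties are created on the wrapper class (not the instance), which is what makes this work despite the wrapped object refusing dynamic attributes.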
Test Plan:
test cases in test_torchbind.py
Imported from OSS
Reviewed By: pbelevich
Differential Revision: D26632781
fbshipit-source-id: dd690887cfda0c48ff0d104aa240ce0ab09055bc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53084
Adding RemoteModule to master RPC docs since it is a prototype
feature.
ghstack-source-id: 122816689
Test Plan: waitforbuildbot
Reviewed By: rohan-varma
Differential Revision: D26743372
fbshipit-source-id: 00ce9526291dfb68494e07be3e67d7d9c2686f1b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53166
Context: For fx modules that contain scriptmodules, calling
delattr(module, 'qconfig') throws an attribute error. Will follow up
with a separate issue/repro to fix this problem.
This PR adds a temporary flag to the convert_fx API to preserve the qconfig attributes on the converted model.
We will remove this flag once we reach a conclusion on calling delattr on scriptmodules.
Test Plan:
python test/test_quantization.py test_preserve_qconfig
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D26771518
fbshipit-source-id: 9fd72816576856ffb4aa11f8fde08303d1df10a2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53098
Remove some low-level methods that are no longer needed since `get_per_parameter_tensors` method is added to `GradBucket` class.
Avoid unnecessary exposure to the internals before publishing GradBucket APIs.
ghstack-source-id: 122979064
Test Plan: buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl
Reviewed By: osalpekar
Differential Revision: D26784249
fbshipit-source-id: d1b27bb026989c25a5b65be4767cb752afd6f19b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52944
This fixes a bug introduced while refactoring the optimizers in https://github.com/pytorch/pytorch/pull/50411. When all parameters have no grads, we should still allow `beta`-like hyperparameters to be defined.
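The pattern the fix restores can be sketched in plain Python (this is an illustrative stand-in, not the actual torch.optim code): hyperparameters are read unconditionally per parameter group, and only the per-parameter state update is skipped for grad-less params.

```python
def adam_like_step(param_groups):
    """Sketch of an optimizer step: hypers are materialized unconditionally,
    while per-parameter work is skipped for parameters without grads."""
    seen_betas = []
    for group in param_groups:
        # Reading hyperparameters must not depend on grads being present.
        beta1, beta2 = group["betas"]
        seen_betas.append((beta1, beta2))
        for p in group["params"]:
            if p.get("grad") is None:
                continue  # skip the state update, but betas above stay defined
            p["value"] -= 0.1 * p["grad"]
    return seen_betas


groups = [{"params": [{"value": 1.0, "grad": None}], "betas": (0.9, 0.999)}]
print(adam_like_step(groups))  # [(0.9, 0.999)] even though no param has a grad
```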
Reviewed By: ngimel
Differential Revision: D26699827
fbshipit-source-id: 8a7074127704c7a4a1fbc17d48a81e23a649f280
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53143
Meta is now an honest-to-goodness device type, like cpu, so you can use
device='meta' to trigger allocation of meta tensors. This is way better
than empty_meta since we now have a working API for most factory functions
(they don't necessarily work yet, though, because we still need to register
Meta versions of those functions).
Some subtleties:
- I decided to drop the concept of CPU versus CUDA meta tensors; meta
tensors are device agnostic. It's hard to say exactly what the
correct level of abstraction here is, but in this particular case
implementation considerations trump semantic considerations: it
is way easier to have just a meta device, than to have a meta device
AND a cpu device AND a cuda device. This may limit the applicability
of meta tensors for tracing models that do explicit cpu()/cuda()
conversions (unless, perhaps, we make those operations no-ops on meta
tensors).
- I noticed that the DeviceType uppercase strings are kind of weird.
Are they really supposed to be all caps? That's weird.
- I moved the Meta dispatch key to live with the rest of the "device"
dispatch keys.
- I intentionally did NOT add a Backend for Meta. For now, I'm going to
hope meta tensors never exercise any of the Backend conversion code;
even if it does, better to fix the code to just stop converting to and
from Backend.
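The device-agnostic, metadata-only behavior described above can be sketched with a plain-Python stand-in (names here are illustrative; in real PyTorch the entry point is e.g. `torch.empty(2, 3, device='meta')`):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetaTensor:
    """Carries only metadata (shape, dtype): no storage, no concrete device."""
    shape: tuple
    dtype: str = "float32"


def meta_add(a: MetaTensor, b: MetaTensor) -> MetaTensor:
    # A "meta kernel" only propagates metadata; no data is ever touched,
    # so the same kernel serves what would be cpu and cuda inputs alike.
    if a.shape != b.shape:
        raise ValueError("shape mismatch")
    return MetaTensor(a.shape, a.dtype)


x = MetaTensor((2, 3))
y = MetaTensor((2, 3))
print(meta_add(x, y).shape)  # (2, 3)
```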
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: samestep
Differential Revision: D26763552
Pulled By: ezyang
fbshipit-source-id: 14633b6ca738e60b921db66a763155d01795480d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53142
It turns out to make Meta a device I need to substantively reuse
the CPUGuardImpl implementation. It's pretty parametrizable so
just move this over to DeviceGuardImplInterface templated over
the DeviceType.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: anjali411, samestep
Differential Revision: D26763553
Pulled By: ezyang
fbshipit-source-id: 464fb3e3a72ba7c55a12adffe01c18171ce3e857
Summary:
Currently, `torch.nn.parallel.DistributedDataParallel(model...)` doesn't deduplicate params shared across `model`'s child Modules before calling Reducer with the param list. This can cause Reducer to register more than one hook on the shared param(s), at which point who knows what happens.
We ran into this in mlperf BERT, which has at least one param shared across submodules (an embedding weight iirc, not 100% sure). Running with `gradient_as_bucket_view = False` produced different numerics from running with `gradient_as_bucket_view = True` (which i guess is one potential consequence of multiple DDP hooks on a given param, not sure why, i'd have to dig further).
This PR changes DDP to deduplicate shared params (a small diff), and adds some tests (right now just `test_ddp_weight_sharing`, but I'll add more). `test_ddp_weight_sharing` fails with bad numerics on current master (proving the shared param issue is real) and passes with the deduplication diff.
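The deduplication itself is a small diff; the core pattern can be sketched as follows (illustrative, not DDP's actual code): keep only the first occurrence of each parameter by identity, so a shared parameter gets exactly one gradient hook.

```python
def dedup_params(params):
    """Deduplicate parameters by identity, preserving first-seen order."""
    seen = set()
    unique = []
    for p in params:
        if id(p) not in seen:
            seen.add(id(p))
            unique.append(p)
    return unique


shared = object()            # e.g. an embedding weight tied across submodules
params = [shared, object(), shared]
print(len(dedup_params(params)))  # 2 -- the shared param appears once
```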
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51929
Reviewed By: zou3519
Differential Revision: D26625807
Pulled By: zhaojuanmao
fbshipit-source-id: f5f5959fef90dfe2c55812d79fa88b877f22ecc3
Summary:
I noticed https://github.com/pytorch/pytorch/issues/53126 stored everything in the test folder as an artifact, which isn't exactly what we want. Here, I try to store just the relevant info: the coverage files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53150
Reviewed By: albanD
Differential Revision: D26767185
Pulled By: janeyx99
fbshipit-source-id: 286d341ccdfa97d138a2048bb4ee01c7ae2579a1
Summary: Just copy the corresponding input shape info. Otherwise we will miss the shape info of the output of SparseLengthsSumSparseLookup, which will be inferred as the input of the downstream SparseLengthsSum op, whose int64/int32 mode is undetermined.
Test Plan:
```
buck test caffe2/caffe2/opt:bound_shape_inference_test
```
Reviewed By: khabinov, ChunliF
Differential Revision: D26769226
fbshipit-source-id: 4032bc4643a125095a48fa8c23ca4ebcf26dc29c
Summary:
Description:
- Added more modes: bicubic and nearest to interpolation tests
- Added a test case for downsampling a small image
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53186
Reviewed By: albanD
Differential Revision: D26780116
Pulled By: fmassa
fbshipit-source-id: f4f498e6e1da1ec131e6d9d9f42dc482135ae9e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53156
Will SSH into windows machine to validate that these tests are skipped.
Test Plan: Imported from OSS
Reviewed By: osalpekar
Differential Revision: D26769791
Pulled By: H-Huang
fbshipit-source-id: e4427ba2d6cfe5a1de26e335cd27c1e8875174d3
Summary:
Fix accidental regression introduced by https://github.com/pytorch/pytorch/issues/47940
`FIND_PACKAGE(OpenBLAS)` does not validate that the discovered library can actually be used, while `check_fortran_libraries` does.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53168
Test Plan: Build PyTorch with static OpenBLAS and check that `torch.svd(torch.ones(3, 3)).S` does not raise an exception
Reviewed By: walterddr
Differential Revision: D26772345
Pulled By: malfet
fbshipit-source-id: 3e4675c176b30dfe4f0490d7d3dfe4f9a4037134
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53169
As title, `aten/src/ATen/native/RNN.cpp` appears twice in `aten_native_source_list`
ghstack-source-id: 122936706
Test Plan: CI
Reviewed By: dhruvbird, iseeyuan
Differential Revision: D26715640
fbshipit-source-id: 54717ded9b293e022a47ab7891dfd04afae48ce5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51350
`None` being a valid `Dimname` is awkward for optional `dim` arguments, as found
on NumPy's reduction functions like `std` and `var`. In these cases `dim=None`
should mean an all-reduction, but instead you get an error
"Please look up dimensions by name".
I've also had to fix `FunctionParameter::check` to actually check the first
element of `INT_LIST` arguments and reject non-int types. Otherwise, the dim
names end up calling the `int[]` overload and fail.
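The overload check described above can be sketched as follows (illustrative Python, not the actual C++ `FunctionParameter::check`): an `int[]` parameter should accept a sequence only if its first element really is an int, so a tuple of dim *names* falls through to the name-based overload.

```python
def matches_int_list(arg):
    """Accept for the int[] overload only if arg is an int or a sequence
    whose first element is an int (checking just the first, as the
    argument parser does for speed)."""
    if isinstance(arg, int):
        return True  # a bare int can be wrapped into a one-element list
    if isinstance(arg, (tuple, list)):
        return len(arg) == 0 or isinstance(arg[0], int)
    return False


print(matches_int_list((0, 1)))      # True  -> int[] overload
print(matches_int_list(("N", "C")))  # False -> fall through to Dimname overload
```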
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D26756208
Pulled By: mruberry
fbshipit-source-id: 44221ca0f4822ec2c1f62b092466fd4f779eb45a
Summary:
Reference: https://github.com/pytorch/pytorch/issues/42515
This PR also enables the OpInfo tests on ROCm to check the same dtypes as CUDA.
A few tests have to be skipped (due to failures).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51944
Reviewed By: H-Huang
Differential Revision: D26727660
Pulled By: mruberry
fbshipit-source-id: 3aea236cf0002f46c2737afbda2ed3efccfe14f5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50484
I currently see the compilation warning:
```
Jan 13 16:46:21 [3644/5223] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/core/ivalue.cpp.o
Jan 13 16:46:21 ../aten/src/ATen/core/ivalue.cpp:855:22: warning: comparison of integers of different signs: 'int' and 'std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >::size_type' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:21 for (auto i = 0; i < slots_.size(); ++i) {
```
This diff fixes that
Test Plan: Sandcastle tests
Reviewed By: ngimel
Differential Revision: D25901674
fbshipit-source-id: 0a09570866f23b5878bf06f46f918d71a733974f
Summary: Use the dim type of the first input for output.
Test Plan:
unit test
flow test: f254777437
https://fburl.com/n933wc3a
shapes {
shape {
dims: 19102004
dims: 68
data_type: UINT8
name: "sparse_nn_2/sparse_arch_2/grouped_embedding_10/grouped_generic_embedding_10/GSF_IDLIST_IG_BUSINESS_AUTHOR_PPR_ORGANIC_ENGAGEMENT_UNIFORM_RIDS/w_EmbeddingFusedUint4Quantization"
}
dim_type: CONSTANT
dim_type: CONSTANT
name: "sparse_nn_2/sparse_arch_2/grouped_embedding_10/grouped_generic_embedding_10/GSF_IDLIST_IG_BUSINESS_AUTHOR_PPR_ORGANIC_ENGAGEMENT_UNIFORM_RIDS/w_EmbeddingFusedUint4Quantization"
shape_is_final: true
}
Reviewed By: yinghai, khabinov
Differential Revision: D26763978
fbshipit-source-id: b9c0d6ca4a2b0e4d50d34e08f724e99ad705196b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53091
Split with tail followed by reorder causes a segfault in NNC.
Split with mask followed by reorder generates invalid code that writes out of
bounds.
ghstack-source-id: 122870733
Test Plan: LoopNest.ColReduceSplit*
Reviewed By: navahgar
Differential Revision: D26746254
fbshipit-source-id: f8a0de18531b34d2bf06ccaa35d9c98b81b5c600
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53137
Also, add casting to Int for Load and Store indices.
Fixes #52773.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D26760256
Pulled By: ZolotukhinM
fbshipit-source-id: a2d3141b17584724a5feabcabec25d0577b83a30
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52702
Fixes:
```
stderr: caffe2/c10/util/MathConstants.h(22): warning: calling a constexpr __host__ function("from_bits") from a __host__ __device__ function("pi") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
```
Test Plan: Sandcastle
Reviewed By: ngimel
Differential Revision: D26589533
fbshipit-source-id: 42c4b36b0ba1e08cbdc9a122fedf35610483c764
Summary:
Fixes https://github.com/pytorch/pytorch/issues/44378 by providing a wider range of drivers similar to what SciPy is doing.
The supported CPU drivers are `gels, gelsy, gelsd, gelss`.
The CUDA interface has only `gels` implemented, and only for overdetermined systems.
The current state of this PR:
- [x] CPU interface
- [x] CUDA interface
- [x] CPU tests
- [x] CUDA tests
- [x] Memory-efficient batch-wise iteration with broadcasting which fixes https://github.com/pytorch/pytorch/issues/49252
- [x] docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49093
Reviewed By: H-Huang
Differential Revision: D26723384
Pulled By: mruberry
fbshipit-source-id: c9866a95f14091955cf42de22f4ac9e2da009713
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53098
Remove some low-level methods that are no longer needed since `get_per_parameter_tensors` method is added to `GradBucket` class.
Avoid unnecessary exposure to the internals before publishing GradBucket APIs.
ghstack-source-id: 122723683
Test Plan: buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl
Reviewed By: rohan-varma
Differential Revision: D26720919
fbshipit-source-id: 46fb6423008792e72d7a1dd68930a31e0724c92c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53010
To determine the boundary between different iterations in a DDP communication hook, the user code currently needs to check `bucket.get_index() == 0`, which exposes internal bucketization implementation details and undermines the usability of the DDP communication hook.
Create an API to hide the details and improve the usability before publishing GradBucket APIs.
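A plain-Python sketch of the intended improvement (class and method names here are illustrative; the exact name and boundary condition in the real GradBucket API may differ):

```python
class Bucket:
    """Toy stand-in for GradBucket: knows its position within an iteration."""
    def __init__(self, index, num_buckets):
        self._index = index
        self._num_buckets = num_buckets

    def get_index(self):
        return self._index

    def is_last(self):
        # Encapsulates the bucketization detail; hook authors no longer
        # need to know which raw index marks the iteration boundary.
        return self._index == self._num_buckets - 1


buckets = [Bucket(i, 3) for i in range(3)]
print([b.is_last() for b in buckets])  # [False, False, True]
```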
ghstack-source-id: 122723081
Test Plan: buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl
Reviewed By: rohan-varma
Differential Revision: D26720813
fbshipit-source-id: f4a3147382c1f970534d7f0dee0cd599156c8b8c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53102
In the `GradBucket` constructor, `offsets`, `lengths`, and `sizes_vec` are optional arguments that could possibly be empty, so it is safer to remove the default values.
ghstack-source-id: 122833603
Test Plan: waitforbuildbot
Reviewed By: rohan-varma
Differential Revision: D26748199
fbshipit-source-id: 2e3bcd1b732851919a64bbbd20fe85e77a616fe3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53009
It can be a common operation to apply layer-wise operations over per-parameter tensors in a DDP communication hook.
Create a util method in GradBucket class before publishing GradBucket APIs.
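The per-parameter view operation can be sketched in plain Python (the real method operates on the bucket's flat tensor; the name and signature here are illustrative):

```python
def per_parameter_views(flat, offsets, lengths):
    """Slice a flat gradient buffer into one view per parameter,
    in the spirit of a get_per_parameter_tensors-style helper."""
    return [flat[o:o + l] for o, l in zip(offsets, lengths)]


flat = list(range(10))                 # stands in for the bucket's flat tensor
views = per_parameter_views(flat, offsets=[0, 4], lengths=[4, 6])
print(views)  # [[0, 1, 2, 3], [4, 5, 6, 7, 8, 9]]
```

A hook can then apply a layer-wise operation to each view without recomputing offsets itself.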
ghstack-source-id: 122833594
Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl
f254364097
Reviewed By: rohan-varma
Differential Revision: D26717893
fbshipit-source-id: 916db319de8b85dd22bc4e35db5671bf4e34740f
Summary:
I'm trying to make jitted RNG graph-safe in csarofeen's nvfuser branch. Doing so requires diffs in files outside torch/csrc/jit, and we'd like these to go upstream through this simple separate PR (instead of needing to be reviewed as part of Christian's branch's eventual merge, which will be massive).
From the perspective of eager mode consumers, diffs here are purely cosmetic. I moved raw definitions of `PhiloxCudaState` and `at::cuda::philox::unpack` to standalone headers the codegen can easily copy from.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51580
Reviewed By: malfet
Differential Revision: D26626972
Pulled By: ngimel
fbshipit-source-id: 7f04d6c5ffe0af7a8a66d3ae6ed36191d12f7d67
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53037
As remarked in #52277 it is easy to give an (inefficient, due to extra
redispatches) DefaultBackend implementation of foo and foo_ in terms of
foo_out. This patch enables code generation for DefaultBackend in these
cases by default for all structured kernels. You can see the payoff
in MSNPU extension: it only has to register a kernel for add.out, and it
gets add and add_ kernels automatically.
The actual code changes are very modest:
- When DefaultBackend, call the dispatched (not direct native::)
functions to allocate tensors, change device guard, etc
- Don't call impl() for DefaultBackend (as it doesn't exist); instead,
directly generate a call to at::foo_out to do the actual work.
- Do NOT generate DefaultBackend implementation for foo_out. Actually,
there is a case to be made for this being a good idea with more infra;
see comments inside.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D26731225
Pulled By: ezyang
fbshipit-source-id: 939da7cb69f694722ec293e5e42e74a755dd0985
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53032
Previously, you could get this error message:
```
Failed to synthesize the expression "Tensor & out".
When I failed, the following bindings were available in the context:
const Tensor & self;
const Tensor & other;
Scalar alpha;
const Tensor & op.outputs_[0];
```
There's a problem with this error message: it doesn't seem like there
is any 'out' argument available, but actually there is: the last
binding in the context is it. We printed the *expression*, not
the *ctype name*.
After this patch, the context now prints as:
```
const Tensor & self; // self
const Tensor & other; // other
Scalar alpha; // alpha
const Tensor & out; // op.outputs_[0]
```
Now it becomes clear that it's a const mismatch. Maybe we could also
beef up the error message so it points out near misses, but I'll leave
that to future work.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: ljk53
Differential Revision: D26729768
Pulled By: ezyang
fbshipit-source-id: adb363551a7145eac788943c20969c86b1f8a81b
Summary:
Description:
- Added channels last 3d option to interpolate test
- Split the non-4d config into two: 3d and 5d
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53117
Reviewed By: NicolasHug
Differential Revision: D26754243
Pulled By: fmassa
fbshipit-source-id: 49bbab3bb47de27790e39537d0fbeca0f01782c4
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).
New submodule commit: f73bcd9dfa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53012
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: lw
Differential Revision: D26722108
fbshipit-source-id: ea6fa719c8fb666818a0e91da8d4f2edcc88fc49
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52213
Nans were previously propagated inconsistently because std::min always returns its first argument if one of the args is nan.
Also, when the reduction functor was called on two `-inf` arguments, `std::min(x,y) - std::max(x,y)` resulted in `-inf - (-inf)` = nan, even though logcumsumexp is well defined for a `(-inf, -inf)` pair.
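The `(-inf, -inf)` corner case can be illustrated with a scalar logaddexp sketch in plain Python (the actual fix lives in the CUDA reduction functor):

```python
import math


def log_add_exp(x, y):
    """logaddexp that propagates nan but treats (-inf, -inf) as -inf."""
    if math.isnan(x) or math.isnan(y):
        return math.nan
    lo, hi = min(x, y), max(x, y)
    if hi == -math.inf:
        # Both inputs are -inf: log(0 + 0) = -inf, not nan.
        return -math.inf
    # exp(lo - hi) <= 1, so this never hits inf - inf for infinite inputs.
    return hi + math.log1p(math.exp(lo - hi))


print(log_add_exp(-math.inf, -math.inf))  # -inf
print(round(log_add_exp(0.0, 0.0), 6))    # 0.693147, i.e. log(2)
```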
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52947
Reviewed By: H-Huang
Differential Revision: D26718456
Pulled By: ngimel
fbshipit-source-id: a44433889da352cc959786dd15b6361a68fcfed7
Summary:
These are no longer useful. Let's wait for a few days before merging this, just in case somebody finds failures in them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52890
Reviewed By: H-Huang
Differential Revision: D26725500
Pulled By: mruberry
fbshipit-source-id: 3ebc18ee11ebef34451e60861414521730742288
Summary:
Fixes https://github.com/pytorch/pytorch/issues/38137
As mentioned in the issue, this is a workaround for [python issue 43367](https://bugs.python.org/issue43367). There are a number of other places where `sys.modules` is modified, if something changes in python perhaps those should be reviewed as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53107
Reviewed By: zou3519
Differential Revision: D26753571
Pulled By: ezyang
fbshipit-source-id: 2bda03bab39ff9ca58ce4bc13befe021da91b9c4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52684
With alias analysis we get much more powerful registration and we can start removing "native" and fallback interpreted implementations. `inputsOutOfPlace` is an artifact of the hardcoded "native" and lax fallback implementations. Ideally every node will run out of place every time. Afaik, there's never a reason to disable it and we may want to remove that functionality.
This diff does introduce a "leak" in the memory management - containers are not cleaned up. This only happens when out variants are enabled.
Test Plan: buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest -- --run-disabled
Reviewed By: maratsubkhankulov, hlu1
Differential Revision: D26515801
fbshipit-source-id: 7391d66b9d36e15fc2955a5c34a04d027d18fe78
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50060
Aliasing is currently mishandled in SR.
This diff fixes that issue entirely and allows us to avoid hard coded "view" registration. I'll remove the macro in a follow up diff.
However, this diff introduces a subtle assumption when memory optimization is turned on: operators cannot "sometimes alias." Some care will need to be taken to actually make sure this is enforced going forward.
This diff
```
$ batch=20 ./run.sh --pt_optimize_memory=false |& grep "finished"
C2 run finished. Milliseconds per iter: 0.512114. Iters per second: 1952.69
PyTorch run finished. Milliseconds per iter: 0.51176. Iters per second: 1954.04
$ batch=20 ./run.sh --pt_optimize_memory=true |& grep "finished"
C2 run finished. Milliseconds per iter: 0.511402. Iters per second: 1955.41
PyTorch run finished. Milliseconds per iter: 0.506493. Iters per second: 1974.36
$ batch=1 iters=100000 ./run.sh --pt_optimize_memory=false |& grep "finished"
C2 run finished. Milliseconds per iter: 0.0562877. Iters per second: 17765.9
PyTorch run finished. Milliseconds per iter: 0.0667712. Iters per second: 14976.5
$ batch=1 iters=100000 ./run.sh --pt_optimize_memory=true |& grep "finished"
C2 run finished. Milliseconds per iter: 0.0561829. Iters per second: 17799
PyTorch run finished. Milliseconds per iter: 0.0665069. Iters per second: 15036
```
Test Plan:
buck test //caffe2/test:static_runtime
buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest
Reviewed By: eellison
Differential Revision: D25581156
fbshipit-source-id: 41e68119d53e687a9c32d966ed420b270aea4b5b
Summary:
This should trigger the 11.2 and 9.2 tests on ci-all and release branch pushes so that debugging can happen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53109
Reviewed By: yns88
Differential Revision: D26752151
Pulled By: janeyx99
fbshipit-source-id: 3272038cc97560896ee3e9f5bc461212806c71e2
Summary:
Currently, the same C++ tests are run in CI twice in the onnx_ort_test1 job as well as the onnx_ort_test2 job. This PR runs it once on our test1 job only.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53067
Reviewed By: walterddr
Differential Revision: D26739857
Pulled By: janeyx99
fbshipit-source-id: 8960ad5c70181b8154a230914167286f1d9b64f6
Summary:
We want to store the file names that triggers each test suite so that we can use this data for categorizing those test files.
~~After considering several solutions, this one is the most backwards compatible, and the current test cases in test_testing.py for print test stats don't break.~~
The previous plan did not work, as there are multiple Python test jobs that spawn the same suites. Instead, the new S3 format will store test files (e.g., `test_nn` and `distributed/test_distributed_fork`) which will contain the suites they spawn, which will contain the test cases run within the suite. (Currently, there is no top layer of test files.)
Because of this major structural change, a lot of changes have now been made (thank you samestep!) to test_history.py and print_test_stats.py to make this new format backwards compatible.
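The layered structure described above might look roughly like the following (purely illustrative; file, suite, and case names are made up, and the real S3 schema may differ):

```python
# files -> suites -> cases, with per-case stats at the leaves.
report = {
    "files": {
        "test_nn": {
            "suites": {
                "TestNN": {
                    "cases": {
                        "test_linear": {"seconds": 1.2, "status": "passed"},
                    },
                },
            },
        },
    },
}

print(list(report["files"]["test_nn"]["suites"]))  # ['TestNN']
```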
Old test plan:
Make sure that the data is as expected in S3 after https://github.com/pytorch/pytorch/pull/52873 finishes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52869
Test Plan: Added tests to test_testing.py which pass, and CI.
Reviewed By: samestep
Differential Revision: D26672561
Pulled By: janeyx99
fbshipit-source-id: f46b91e16c1d9de5e0cb9bfa648b6448d979257e
Summary:
This PR builds an aggregate stmt for all the tensors in the kernel before constructing LoopNest. This migrates to using the LoopNest constructor that takes in a stmt and output buffers. This is one more step closer to eliminating the dependency of LoopNest on Tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53024
Reviewed By: H-Huang
Differential Revision: D26729221
Pulled By: navahgar
fbshipit-source-id: 43e972585351f6902c14b383b137aaaee3aaa3e1
Summary:
`jit.trace` recursively gathers all named attributes in the module at the
beginning of tracing. This is fine in a pure-tracing environment, but breaks
when a scripted module that contains an InterfaceType'd submodule is involved.
Because an InterfaceType, by design, is not allowed to have any attributes,
some of the gathered attributes turn into fatal errors in subsequent graph
rewrite passes.
This PR fixes this bug by distinguishing InterfaceType'd submodules from
normal ClassType'd submodules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53052
Reviewed By: wanchaol
Differential Revision: D26735566
Pulled By: gmagogsfm
fbshipit-source-id: a14aee6f1fe8000f80c2dc60bdf19acee6225090
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53060
As title. We would like to use alternative pickler/unpickler
implementations, to make it possible to send objects over the wire that
are coming from a torch.package
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D26737317
Pulled By: suo
fbshipit-source-id: 6bdef9824e48ef657dcad72cc5a9114e6612ea4a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53061
We only care about evaluating the string return version. If `reduce()`
throws an error, we should just continue on with pickling.
Test Plan: Imported from OSS
Reviewed By: Lilyjjo
Differential Revision: D26737652
Pulled By: suo
fbshipit-source-id: 0b6fbbe345ad0b6a33330b2efa39d7bab703193d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52613
Including MaxPool as part of the MKLDNN fusion group sped up resnet18 by ~20%, and was a win on other models I tested as well. I will post more complete benchmarks.
As mentioned in the diff, in some cases MaxPool can be slower than aten - ideally we'd only include maxpool if it decreased the number of layout transformations that occur. That hasn't actually mattered for the torchvision models, so I don't think it's necessary for this PR.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D26696704
Pulled By: eellison
fbshipit-source-id: 61a025dbf5e7591c0a0f75def3beb439a138a21e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52513
Subgraph Utils previously only worked with merging a node into a subgraph if the node was before the subgraph; extend the logic for the case where the subgraph is first.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D26696697
Pulled By: eellison
fbshipit-source-id: b0595b7d400161b0972321c55718b67103c7bbcd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52512
This API is not used at all, and is tricky to maintain. When we last used it, we ran into lifetime issues when using `Value *` as the key. In hindsight, we should have been using `value->unique()`, but regardless, it is not being used and should be removed.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D26696695
Pulled By: eellison
fbshipit-source-id: 97ed92e88ecab0085fabbac46573611666bf2420
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51600
Looking for notes on implementation first, will post more notes on benchmarks and overall thoughts/implementation and solicit more input soon.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D26696702
Pulled By: eellison
fbshipit-source-id: cd612f093fe3859e42fb0b77560ebd1b44fccff7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52786
Previously, NNC did not sanitize input names. I ran into this in the next PR when making subgraph creation preserve debug names caused a number of NNC cuda failures. I also previously ran into this with some masked_fill failures internally, which led me to disable the operator.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D26696699
Pulled By: eellison
fbshipit-source-id: 7c3af4d559d58762fb8332666784a4d5cd6a4167
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51484
This PR moves the linear weights of a frozen model to MKLDNN. When the weights are already in MKLDNN, just computing a single linear by converting the input and output from/to mkldnn provides large speedups. I benchmarked the results of the top 200 shapes in predictor [here](https://www.internalfb.com/phabricator/paste/view/P171537854) (taken from aten::matmul), as well as verified that it sped up popular models.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D26696698
Pulled By: eellison
fbshipit-source-id: 53d03b9e6956e11b700ee58214e2266e2aa4106a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51483
This PR moves the conv weights of a frozen model to MKLDNN, and AOT reorders the weights. When the weights are already in MKLDNN, just computing a single conv by converting the input and output from/to mkldnn provides large speedups. I benchmarked the results of the top 200 shapes in predictor [here](https://www.internalfb.com/phabricator/paste/view/P171537938), as well as verified that it sped up popular models in torchvision.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D26696703
Pulled By: eellison
fbshipit-source-id: 0b4441bee4f6e0890a4540fbca3bb5e58b8c5adf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53063
The problem was that a derived class was marked with "py::nodelete",
while the base class wasn't. Now they both are marked correctly.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D26737877
Pulled By: ZolotukhinM
fbshipit-source-id: 17d9d430651c8f695fc7b6bf6784e7719e20a4d2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52901
This PR implements IR Verifier and adds a call to it in `LoopNest`
constructors. Checks that were in expr/stmt constructors before are now
moved to the corresponding `::make` functions or to the verifier. They
didn't really help in the constructors anyway, since an exception
thrown from there led to a segfault due to the way our memory
management works (the object was not fully created but was registered in
the kernel arena for destruction anyway).
Fixes #52778.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D26682928
Pulled By: ZolotukhinM
fbshipit-source-id: c56524015cdffb1ed8bce4394509961a4071dcfa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53050
As title. We would like to use alternative pickler/unpickler
implementations without changing the entire RPCPickler, to make it
possible to send objects over the wire that are coming from a
torch.package
Test Plan: Imported from OSS
Reviewed By: Lilyjjo
Differential Revision: D26734592
Pulled By: suo
fbshipit-source-id: d9d9fa62ee15bfcb00e09192030541b61df8c682
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53049
This makes our API symmetric--now we have an `Importer` aware Pickler
and Unpickler implementation that have similar interfaces.
Test Plan: Imported from OSS
Reviewed By: Lilyjjo
Differential Revision: D26734593
Pulled By: suo
fbshipit-source-id: 3479437cf6b98e0d6a8aa4907c75f0c61d5495d4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53048
I am planning to expose the custom pickler and unpickler that we use as
semi-public interfaces for `torch.rpc` to consume. Some prefatory
movements here.
Test Plan: Imported from OSS
Reviewed By: Lilyjjo
Differential Revision: D26734594
Pulled By: suo
fbshipit-source-id: 105ae1161d90f24efc7070a8d80c6ac3d2111bea
Summary:
Do not build PyTorch if `setup.py` is called with the 'sdist' option.
Regenerate the bundled license while the sdist package is being built.
Refactor `check_submodules` out of `build_deps` and check that submodule projects are present during the source package build stage.
Test that the sdist package is configurable during the `asan-build` step.
Fixes https://github.com/pytorch/pytorch/issues/52843
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52908
Reviewed By: walterddr
Differential Revision: D26685176
Pulled By: malfet
fbshipit-source-id: 972a40ae36e194c0b4e0fc31c5e1af1e7a815185
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52870
Add the missing parts to support to_backend modules in the lite interpreter.
1. Add ISINSTANCE instruction support, which is used in to_backend for output type check.
2. Bypass lite interpreter's type parser by checking the qualified name. If it starts with "torch.jit", use the same type resolver as nn module (starting with "__torch__").
Tests
Mobile module is serialized and loaded in ```BackendTest.TestCompiler```. The results are compared to those from original torchscript module.
Test Plan: Imported from OSS
Reviewed By: raziel
Differential Revision: D26715351
Pulled By: iseeyuan
fbshipit-source-id: ad9d74ee81c6aa692ab9e5dd7a9003bae5d4f01f
Summary:
The previous code allowed these tests to run every four hours on certain ci-all branches...which is really bad and resource intensive. This code removes that, but then disallows the 11.2 and 9.2 tests to be run on ci-all branches.
To debug CUDA 11.2 or 9.2 tests, one must now manually change the config to allow for them. (Look at https://github.com/pytorch/pytorch/issues/51888 and https://github.com/pytorch/pytorch/issues/51598 for examples of how to do that.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53069
Reviewed By: H-Huang
Differential Revision: D26739738
Pulled By: janeyx99
fbshipit-source-id: 7577b9b2e876bac0e4e868ce2a1f3ffdb6aca597
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53028
TORCH_CHECK (and variants) wrap the condition in C10_UNLIKELY, so this code is both prettier and better.
ghstack-source-id: 122755165
Test Plan: CI
Reviewed By: malfet
Differential Revision: D26522821
fbshipit-source-id: 70aa11f1859f979657a1f376f7039b5015c69321
Summary:
Adds a script so that we can take wheels directly from
download.pytorch.org and publish them to pypi
This is currently mainly used to prep windows binaries for publication to PyPI
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53056
Reviewed By: H-Huang
Differential Revision: D26738642
Pulled By: seemethere
fbshipit-source-id: 96777ed6c3f3454bddb4bc13121f727074312816
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53016
We just checked in the generated files directly.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D26724876
Pulled By: ezyang
fbshipit-source-id: 887d781cac47b7cf16ba2cd6079c63b8f186fe44
Summary:
Updated version following https://github.com/pytorch/pytorch/issues/52764 (including comments from Shen), but this one I expect to be able to land.
ZeroRedundancyOptimizer:
- bucket as tensor views, optional
- make a lot of attributes private
- minor unit test refactor
- adding coverage in the unit test for with and without bucket views
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52987
Reviewed By: mrshenli
Differential Revision: D26728851
Pulled By: blefaudeux
fbshipit-source-id: f8c745966719c9076c20a554ef56198fb838856c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52141
Remove BufferShuffleDataSet, as it's not being used anywhere within PyTorch (no usage on Github based on a search) and it's not included in the release of PyTorch 1.7.1.
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D26710940
Pulled By: ejguan
fbshipit-source-id: 90023b4bfb105d6aa392753082100f9181ecebd0
Summary:
Enabling four test cases in test_cuda.py for ROCm because they are passing.
Signed-off-by: Kyle Chen <kylechen@amd.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52739
Reviewed By: H-Huang
Differential Revision: D26706321
Pulled By: ngimel
fbshipit-source-id: 6907c548c4ac4e387f0eb7c646e8a01f0d036c8a
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).
New submodule commit: a431ee37cb
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52992
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: dskhudia
Differential Revision: D26718007
fbshipit-source-id: 7b35ab2012b8b6300a6e78c8425f9e08864a9f68
Summary:
This tests a simple failure mode for a TypeCheck when a shape changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52933
Reviewed By: H-Huang
Differential Revision: D26727583
Pulled By: Krovatkin
fbshipit-source-id: b277218af9572cd6f89f2ece044f7d84d4c10283
Summary:
In `__iter__` of the `RandomSampler`, when `self.replacement` is `False` in the original code, `self.generator` is always used in the `torch.randperm` instead of the generator we set.
Fixes https://github.com/pytorch/pytorch/issues/52568
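The fix's intent can be sketched as follows (a hedged standalone sketch, not the actual `RandomSampler` code; `sample_indices` is a hypothetical helper):

```python
import torch

# Hypothetical standalone sketch of the corrected behavior: when
# replacement is False, the user-supplied generator must be forwarded
# to torch.randperm instead of being silently ignored.
def sample_indices(n, replacement=False, generator=None):
    if replacement:
        return torch.randint(high=n, size=(n,), generator=generator).tolist()
    return torch.randperm(n, generator=generator).tolist()

g = torch.Generator()
g.manual_seed(0)
first = sample_indices(5, generator=g)
g.manual_seed(0)
second = sample_indices(5, generator=g)
assert first == second  # reproducible once the generator is respected
```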
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52956
Reviewed By: mruberry
Differential Revision: D26724303
Pulled By: H-Huang
fbshipit-source-id: 86f2795c76f3548e31181fb077af046078a173cb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52928
Changes the user facing API of `prepare_single_model_output` to
require a list of nodes instead of a list of subgraphs. This ensures
that how we define a subgraph is an implementation detail and is
not exposed to the user, keeping the eng cost of updating this
implementation later low.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D26693471
fbshipit-source-id: 67c2feb844556225e36f8d6d4023246939bcb445
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52927
Refactor to use an existing util instead of duplicating code, no logic
change.
Test Plan:
CI
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D26693474
fbshipit-source-id: 06b7047eb9a762557b7f679347e424c0dd009aad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52926
Model name is already stored in the Loggers in the prepare call.
Removing the need to specify it again in the extract activations
functions, to simplify things.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D26693473
fbshipit-source-id: 52511cacc16f79fa09c78ccde78e7f439f4b315c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52925
Cleans up some incorrect comments and docblocks in
`numeric_suite_core_apis.py`.
Test Plan:
CI
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D26693472
fbshipit-source-id: 17f3ff464c6ea01374bcc6ac5899da7034627152
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/tensorpipe](https://github.com/pytorch/tensorpipe).
New submodule commit: 4b9f7f8abe
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52930
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: lw
Differential Revision: D26694739
fbshipit-source-id: d8c835f6e74fec6e2c9a3a6e6713926ccf7dcedd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52909
PR #46675 introduced heuristics to use thnn_conv2d for 1x1
convolutions, since mkldnn had a bug that was slowing those cases
down. Unfortunately, the test plan for that PR only tested single-threaded
convolutions; mkldnn is considerably faster on multithreaded convolutions.
An example from yolov3, on 24 cores of a Xeon Platinum 8175M CPU @ 2.50GHz
```
input:{1, 64, 192, 256}, weight:{32, 64, 1, 1}
thnn_conv2d: GFLOPS/s=104.574G/s
mkldnn_convolution: GFLOPS/s=467.357G/s
```
ghstack-source-id: 122627564
Test Plan: Multithreaded 1x1 convolutions
Reviewed By: wconstab, xuzhao9
Differential Revision: D26685272
fbshipit-source-id: e8e05db89e43856969e26570a170c13b3e73ac74
Summary:
This is a second attempt to use graph executor to run forward on a gradient. This allows a secondary chance to profile intermediate tensor introduced by autodiff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52136
Reviewed By: pbelevich
Differential Revision: D26693978
Pulled By: Krovatkin
fbshipit-source-id: 91dde8009a210950af8e5173668ada241e16dd52
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52651
Merging them for easier extensions to fp16 and more binary ops
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D26600118
fbshipit-source-id: a1816e593cf3065afe87d2e6e44cdace13bf6aeb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52903
Implement BlackBoxPredictor::BenchmarkIndividualOps so that we can clean up the output tensors properly after each iteration and get more accurate per operator timing.
Add four more metrics to track setup_time, memory_alloc_time, memory_dealloc_time, and output_dealloc_time.
Reviewed By: ajyu
Differential Revision: D26657473
fbshipit-source-id: 1cf282192b531513b9ee40b37252087818412f81
Summary:
Same as https://github.com/pytorch/pytorch/issues/52760, which I could not get to land. I just could not live with ghstack/ghimport/randomly broken things (I break enough of them myself), so this is a fresh copy without ghstack shenanigans. I'm hopeful that this can land relatively bug free, and am sorry for the duplication.
What this does:
- call the common_utils test runner instead of unittest, because it seems that it's how it should be done
- change the returned state from ZeroRedundancyOptimizer to be PyTorch compliant, which has the added benefit of being elastic (world size independent)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52960
Reviewed By: mrshenli
Differential Revision: D26710932
Pulled By: blefaudeux
fbshipit-source-id: 1d914bc9221442ba1bb2b48f5df10c313e674ece
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52534
Currently linear_dynamic_fp16 has a signature that's tied to fbgemm/qnnpack.
We'll need to produce a pattern equivalent to linear_dynamic_fp16 to support extensions
to other backends.
Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_linear_dynamic_fp16
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D26557726
fbshipit-source-id: 270c9f781f73c79416a092b7831294cabca84b0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52892
When an EnforceFinite check fails this logs all of the tensors in the workspace and whether they are finite or not.
This is a little bit hacky since it uses the aten APIs. I've `ifdef`ed the implementation so it should compile fine on xplat and mobile. It's also accessing the workspace directly but since this is a logging op it seems fine to bend the rules.
Test Plan:
$ buck test //caffe2/caffe2/python/operator_test:enforce_finite_op_test
$ buck-out/gen/caffe2/caffe2/python/operator_test/enforce_finite_op_test#binary.par
I0225 16:29:46.166507 311548 enforce_finite_op.h:62] blob X isfinite=false
Reviewed By: dzhulgakov
Differential Revision: D26626336
fbshipit-source-id: f68e219b910a7242f2e72bb4d734c3e84f46eec5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52910
**Summary**
PR #52158 tried to move all JIT bindings from `torch._C` to a new
submodule `torch._C._jit`, but that...did not go well. This pull request
adds the new `torch._C._jit` submodule, but does not migrate the
existing bindings. Instead, it adds a unit test that fails if any new
bindings are added to `torch._C`. A comment in the test instructs
developers to add their new binding to the allowlist if it really should
be in `torch._C`, or to add it to the appropriate submodule (e.g.
`torch._C._jit`). The idea is to prevent the issue
described in #51691 from getting *worse* if it cannot be fixed.
**Test Plan**
Continuous integration.
**Fixes**
This commit fixes #51691.
Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D26698373
Pulled By: SplitInfinity
fbshipit-source-id: ec9f5426051227a513d4fd09512b624420e0100b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52619
Runs this test suite with nccl_async_error_handling enabled. It is the
default to run many distributed training jobs, and can also help catch
errors/hangs in tests more easily. We don't expect any changes in the actual
existing tests since they shouldn't have any hangs.
Also removes a commented out line
ghstack-source-id: 122595646
Test Plan: CI
Reviewed By: pritamdamania87
Differential Revision: D26588108
fbshipit-source-id: a57bbe2ae5a0c86731d77be45756b17151618eb6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52603
This PR introduces a backend with minimum compilation capability to the to_<backend> flow. The targets are:
- Demonstrate the end-to-end flow of adding a backend -> compilation -> runtime
- Demonstrate how backend compilation errors are surfaced to the user, with the original model's source code information. (C++ only in this PR; Python APIs will be demonstrated in a following PR.)
Changes:
- Compilation
1. A backend with minimum compilation features, "backend_with_compiler_demo" is added.
2. The compilation happens AOT in the ```pre_process``` function registered to this backend.
3. Compiled results are stored in a string blob for each method. They are serialized to the lowered module with ```__get_state__``` function.
4. Error message with model source code is thrown, for features not handled by the backend compiler.
- Runtime
1. The compiled blob is loaded in ```__set_state__``` method.
2. The ```compile``` function of the backend passes the AOT-compiled blob through. (TODO: parsing the blob to the format that the backend can understand can happen here.)
3. The ```execute``` function of the backend executes the specified method (handle).
Test Plan:
- ```BackendTest.TestCompiler```: the C++ end-to-end demonstration on a supported model. After compilation and running, the lowered model produces the same result as the original torchscript model.
- ```BackendTest.TestCompilerNotSupport```: Demonstrate the error message from the AOT compilation for a feature not supported from the input module. The error message looks like:
```
"The node of aten::mul is not supported in this compiler. Source code: File "<string>", line 3
def forward(self, x, h):
return x * h
~~~~~ <--- HERE
```
Reviewed By: raziel
Differential Revision: D26593968
Pulled By: iseeyuan
fbshipit-source-id: 8f264f60a0470e9f07e36fdeccbf17da6c1d7cd7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52792
Move the aten level source code list from `pt_template_srcs.bzl` to `build_variables.bzl`, such that this source list can be shared by both OSS and internal.
ghstack-source-id: 122458909
Test Plan: CI
Reviewed By: dhruvbird, iseeyuan
Differential Revision: D26647695
fbshipit-source-id: 88469c934d4a73c261418c0c584e46104295a0c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52824
How was this not breaking? _bundled_inputs_deflated doesn't exist
ghstack-source-id: 122491970
Test Plan: unit tests
Reviewed By: iseeyuan
Differential Revision: D26658098
fbshipit-source-id: 9ebf961b8764ba8779052c520dd46a8724be042a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51494
The overall `pytorch/conda-cuda` image was getting to a ridiculous size
of 36GB so this splits up that image into cuda specific ones to try and
reduce the amount of things we have to download.
coincides with: https://github.com/pytorch/builder/pull/634
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: janeyx99
Differential Revision: D26281958
Pulled By: seemethere
fbshipit-source-id: 83b498532a6f04801952438537b564f998b62d94
Summary:
This is less surprising than the current default, `--delta=12`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52877
Test Plan: Run the example commands from `tools/test_history --help` and check that their output matches that shown.
Reviewed By: pritamdamania87
Differential Revision: D26674258
Pulled By: samestep
fbshipit-source-id: 1413e11519854b0a47e14af2f1d20c57f145dacd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52481
Adds an API `get_debug_mode` that can be used by distributed package and users to retrieve debug mode. Currently no functionality changes, but wanted to get the bare bones function out and add relevant debug mode logging in follow up diffs.
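As a rough illustration only (the concrete API surface is not specified here, and the environment variable name below is an assumption, not the real one), such a getter could look like:

```python
import os

# Hypothetical sketch of a debug-mode getter backed by an environment
# variable; the actual torch.distributed implementation may differ.
def get_debug_mode() -> bool:
    return os.environ.get("TORCH_DISTRIBUTED_DEBUG", "OFF") != "OFF"

os.environ["TORCH_DISTRIBUTED_DEBUG"] = "DETAIL"
assert get_debug_mode() is True
```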
ghstack-source-id: 122471216
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D26508972
fbshipit-source-id: d1153774f8697bc925a05db177d71c0566d25344
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52799
We agreed that it's better to not add this, removing.
We can make Eager mode NS match this in a future PR.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D26652638
fbshipit-source-id: 5baa51a6bf6de5632946417fe9fd3d0f3e78f7fa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52798
Adds the node name and node target type to Numerical Suite outputs.
This is useful to debug which node got matched to which node,
and what is the type of the operation.
```
// before
{
layer_name: {
model_name: {
'type': 'weight',
'values': [...],
},
},
}
// after
{
layer_name: {
model_name: {
'type': 'weight',
'values': [...],
'node_name': '0',
'node_target_type': "<class 'torch.nn.modules.conv.Conv2d'>",
},
},
}
```
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D26652637
fbshipit-source-id: ba75b110cb91234f17a926ccbc5d0ccee2c3faeb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52779
1. makes the return type of the weight comparison APIs match the return
type of the activation comparison APIs:
```
# before
{layer_name: {model_name: weight_tensor}}
{layer_name: {model_name: [activation_tensor]}}
# after
{layer_name: {model_name: [weight_tensor]}}
{layer_name: {model_name: [activation_tensor]}}
```
2. makes a type alias for the type, so future changes are easier
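The unified shape can be captured with a type alias like the following (the names here are illustrative, not the exact upstream identifiers):

```python
from typing import Any, Dict, List

# Illustrative alias for the shared return shape described above:
# {layer_name: {model_name: [tensor, ...]}}
NSResultsType = Dict[str, Dict[str, List[Any]]]

# Weights and activations now use the same structure:
weights: NSResultsType = {"fc1": {"model_a": ["weight_tensor"]}}
activations: NSResultsType = {"fc1": {"model_a": ["act_0", "act_1"]}}
assert isinstance(weights["fc1"]["model_a"], list)
```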
Test Plan:
```
mypy torch/quantization
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D26652639
fbshipit-source-id: eb1f04d6913cedf88d628f362468875ae9ced928
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52771
Before this PR, subgraph names were derived from node names
in model B. For example, if we had
```
A: linear0 -> relu0 -> ...
B: linear_relu0 -> ...
```
Then the subgraph name would be `linear_relu0`, and the outputs before this
PR would look like
```
{
'linear_relu0': {
'model_a': ...,
'model_b': ...,
},
}
```
This PR decouples subgraph naming from node names.
The outputs after this PR look like:
```
{
# guaranteed to match the right subgraphs across different models
# without needing more than one model during the prepare passes
'base_op_torch.nn.functional.linear_0': {
'model_a': ...,
'model_b': ...,
},
}
```
There are future requirements for which using node_name as subgraph name does not work well:
a. the need to support N models, without having all of them in memory at the same time
b. the need to support fusions and match subgraphs with related but non-equal types
This PR changes the naming of subgraphs to be based on two things:
1. the name of the underlying set of related ops (i.e. `torch.nn.functional.linear`)
2. the order in which this subgraph was named (i.e. `foo_0`, `foo_1`, ...)
Basically, we can't use a node name because of (a), since there must be
a reference model which node name other models must use, but that
reference model is not guaranteed to be available. Note: we could add
some state and require the reference model to go through the APIs first,
saving the reference node names, but I'm deliberately not doing that
to minimize the state used throughout.
To support (b), we need a way to determine a name of a subgraph which is
the same for all related subgraphs (i.e. linear-relu vs quantized_linear
vs quantized_linear_relu). In this PR, this is done by using the base
aten op's name. We use a string name so it looks nice in the output
(I tried `str(underlying_type)`, and it is not easy for humans to read).
Note: after this PR, it's hard to parse the results to see which layer
is related to which node in the graph. This will be fixed in a future PR
where we will store the node name on the logger, and expose it in the
output.
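The counter-based naming described above can be sketched as follows (a hypothetical illustration, not the upstream implementation):

```python
import itertools
from collections import defaultdict

# Hypothetical sketch: each base op gets its own counter, so subgraph
# names are stable across models without referencing any node names.
counters = defaultdict(itertools.count)

def subgraph_name(base_op: str) -> str:
    return f"base_op_{base_op}_{next(counters[base_op])}"

assert subgraph_name("torch.nn.functional.linear") == "base_op_torch.nn.functional.linear_0"
assert subgraph_name("torch.nn.functional.linear") == "base_op_torch.nn.functional.linear_1"
```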
Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher
python test/test_quantization.py TestFXGraphMatcherModels
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D26652641
fbshipit-source-id: ee8dacc2d6e875357c1574cbf426923f9466ea10
Summary:
Add some small utility functions to read the blob names back from the minidb
file so that we can verify how many chunks were written for each blob.
Test Plan: buck test caffe2/caffe2/python/operator_test:load_save_test
Reviewed By: mraway
Differential Revision: D26641599
fbshipit-source-id: bccb0af157d85e585e95bc7be61c4584fba3cb04
Summary:
Optimize the blob serialization code by using `AddNAlreadyReserved()` when
serializing tensor data, rather than making N separate `Add()` calls.
`AddNAlreadyReserved()` is a simple addition operation, while each `Add()`
call checks to see if it needs to reserve new space, and then updates the
element data, which is unnecessary in this case.
Test Plan:
This appears to improve raw serialization performance by 30 to 35% for float,
double, and int64_t types which use this function. This improvement appears
relatively consistent across large and small tensor sizes.
Differential Revision: D26617038
fbshipit-source-id: 97dedbae889d35463628f3016ac56986e685289e
Summary:
Move the `SaveOp` code from `load_save_op.h` to `load_save_op.cc`.
Previously this implementation was all in the templatized `SaveOp` class, even
though most of the logic didn't depend on the template parameters. Having
this code be in the header file slows down the build, and forces more files to
be rebuilt than necessary when changing the SaveOp code. Having this code be
in a template class can also make the generated code larger than
needed, as we don't need separate copies instantiated for each context type.
Test Plan: buck test //caffe2/caffe2/python/operator_test:load_save_test
Reviewed By: mraway
Differential Revision: D26641600
fbshipit-source-id: 84ebe8164ffac1e4a691be41147f0c5d8e890e09
Summary:
Move NumPy initialization from `initModule()` to singleton inside
`torch::utils::is_numpy_available()` function.
This singleton will print a warning that NumPy integration is not
available, rather than failing to import torch altogether.
The warning will be printed only once, and will look something like the
following:
```
UserWarning: Failed to initialize NumPy: No module named 'numpy.core' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:66.)
```
This is helpful if PyTorch was compiled with the wrong NumPy version, or if
NumPy is not commonly available on the platform (which is often the case
on AARCH64 or Apple M1)
Test that PyTorch is usable after numpy is uninstalled at the end of
`_test1` CI config.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52794
Reviewed By: seemethere
Differential Revision: D26650509
Pulled By: malfet
fbshipit-source-id: a2d98769ef873862c3704be4afda075d76d3ad06
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52787
Currently, the conv packed param can be serialized/deserialized with `torch.jit.save`/`torch.jit.load`, but
can't be saved/loaded with `torch.save(m.state_dict())`/`torch.load(...)`
reason is (from James):
```
I think the issue probably has to do with the normal pickle deserialization not detecting List[Optional[Tensor]] if it doesn't witness a None in the list. IIRC this is implemented on the TorchScript side through this type tag mechanism: https://github.com/.../jit/serialization/unpickler.cpp...
```
This PR is a hack but acceptable to JIT team until a proper solution is proposed.
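The pickling gap can be illustrated in isolation (a hedged sketch: it only shows that a List[Optional[Tensor]] containing no None is indistinguishable from a List[Tensor] to a pickler that infers types from witnessed values):

```python
import io
import torch

# A List[Optional[Tensor]] that happens to contain no None: nothing in
# the serialized values distinguishes it from List[Tensor], which is
# the type-tag gap described above. (Illustrative sketch only.)
vals = [torch.zeros(2), torch.ones(2)]
buf = io.BytesIO()
torch.save(vals, buf)
buf.seek(0)
loaded = torch.load(buf)
assert all(torch.equal(a, b) for a, b in zip(vals, loaded))
```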
Test Plan:
Will be tested in next PR
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D26649272
fbshipit-source-id: 4fc47a4c63e4cd1fabb404de5f0b95e127a9fca0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52861
Currently, scale and zp in these layers are not buffers, which
means they do not get saved to the state dict. Moving them
into buffers to allow people to properly use state_dict.
Note: this is a redo of https://github.com/pytorch/pytorch/pull/45313,
with BN taken out. Not doing this for BN because it has dependencies on existing
behavior. We should clean it up eventually.
Note: not handling BC because it's 100% broken now, so there is
no practical value in handling BC.
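The mechanism this relies on can be sketched in isolation (an illustrative module, not the actual normalization layers):

```python
import torch
import torch.nn as nn

# Attributes registered via register_buffer appear in state_dict;
# plain tensor attributes do not.
class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.scale_attr = torch.tensor(0.5)                   # plain attribute
        self.register_buffer("scale_buf", torch.tensor(0.5))  # buffer

m = M()
sd = m.state_dict()
assert "scale_buf" in sd
assert "scale_attr" not in sd
```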
Test Plan:
```
python test/test_quantization.py TestPostTrainingStatic.test_normalization
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D26671761
fbshipit-source-id: 7615b1dd0d1ae88eeff8b1d150f3846815dc2bc9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52672
This allows correct handling of very large tensor allocations
Also, replace AT_ERROR with TORCH_CHECK(false)
Test Plan: Imported from OSS
Reviewed By: walterddr
Differential Revision: D26607547
Pulled By: malfet
fbshipit-source-id: 247f7e8c59f76af3b95799afc9bc4ab4cc228739
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52724.
This fixes the following for the LKJCholesky distribution in master:
- `log_prob` does sample validation when `validate_args=True`.
- exposes documentation for the LKJCholesky distribution.
cc. fehiepsi, fritzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52763
Reviewed By: anjali411
Differential Revision: D26657216
Pulled By: neerajprad
fbshipit-source-id: 12e8f8384cf0c3df8a29564c1e1718d2d6a5833f
Summary:
This PR fixes a resource leakage bug in the constructor of `TCPStore` where an exception thrown in `TCPStoreDaemon` or `tcputil::connect()` can leave the server socket dangling. The ideal long-term solution would be to have a RAII wrapper for TCP sockets returned by `tcputil`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52860
Reviewed By: osalpekar
Differential Revision: D26671775
Pulled By: cbalioglu
fbshipit-source-id: ccebbd7533ac601a4b80e6e759f2fb4fe01c70fa
Summary:
I was attempting to experiment with "manual" vectorization, and boy
was it hard. I finally came up with this, which I want to write down as a test
case. Eventually the APIs should make this easier...
Test Plan: buck test
Reviewed By: navahgar
Differential Revision: D26631189
fbshipit-source-id: c28794b25d7852890ea843fdbcaf8751648258c0
Summary:
https://github.com/pytorch/pytorch/issues/52477 introduced the usage of `touch`, which is not available on plain Windows environment, unless you made all the things come with Git Bash available. This PR fixes the build break on those systems by using the `touch` provided by Python pathlib.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52729
Reviewed By: anjali411
Differential Revision: D26666724
Pulled By: walterddr
fbshipit-source-id: aae357eb55c6787631eadf22bee7901ad3c2604e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52844
Fixes a crash in qconfig checking which happened if a model had conv transpose
with qconfig set to None.
Test Plan:
```
python test/test_quantization.py TestPostTrainingStatic.test_convtranspose_per_channel_qconfig_none
```
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D26666043
fbshipit-source-id: e1b62840b4e3c67acbb4dbdcd32514b374efce1e
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52055
This fixes the **out of memory error** while using update_bn in **SWA**, by not allocating memory for backpropagation.
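The core of the fix is the standard no_grad pattern (a minimal sketch of the mechanism, not the actual update_bn code):

```python
import torch

# Forward passes wrapped in no_grad record no autograd graph, so the
# intermediate buffers needed for backpropagation are never allocated.
x = torch.randn(4, 3, requires_grad=True)
with torch.no_grad():
    y = x * 2  # no grad_fn recorded, no extra memory retained
assert y.requires_grad is False
assert y.grad_fn is None
```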
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52654
Reviewed By: malfet
Differential Revision: D26620077
Pulled By: albanD
fbshipit-source-id: 890b5a78ba9c1a148f3ab7c63472a73d8f6412a4
Summary:
This PR introduces the `timeout` accessor to `Store` and `host`, `port` accessors to `TCPStore` to help testing and troubleshooting higher level APIs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52784
Reviewed By: anjali411
Differential Revision: D26648202
Pulled By: cbalioglu
fbshipit-source-id: 9cf23bf998ed330d648dfec2a93e1bbb50817292
Summary:
Addresses one item in https://github.com/pytorch/pytorch/issues/46321
## Background
This is a test version of the RL RPC example defined [here](https://github.com/pytorch/examples/blob/master/distributed/rpc/rl/main.py) and [here](https://pytorch.org/tutorials/intermediate/rpc_tutorial.html), with the following differences:
* It defines and uses a `DummyEnv` to avoid a dependency on `gym`. The `DummyEnv` simply returns random states & rewards for a small number of iterations.
* It removes the `ArgumentParser` and utilizes `RpcAgentTestFixture` + hard-coded constants for configuration and launching.
* It changes the worker names to match what the internal Thrift RPC tests expect.
The code is purposefully kept very similar to the original example code outside of these differences.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52393
Test Plan:
```
pytest test/distributed/rpc/test_tensorpipe_agent.py -k test_rl_rpc -vs
pytest test/distributed/rpc/test_process_group_agent.py -k test_rl_rpc -vs
```
Reviewed By: glaringlee
Differential Revision: D26515435
Pulled By: jbschlosser
fbshipit-source-id: 548548c4671fe353d83c04108580d807108ca76e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52837
After https://github.com/pytorch/pytorch/pull/52749 we started seeing an increased flakiness of the TensorPipeDistAutogradTestWithSpawn.test_backward_node_failure_python_udf test, with failures like this one:
https://app.circleci.com/pipelines/github/pytorch/pytorch/277824/workflows/cfcbef5a-544e-43bd-b3b0-ebc7b95134fe/jobs/11145394
https://gist.github.com/lw/a0b48900673b5ae0f5d03aca1e72ffff
The logs are very clear and point to the changes in the error handling code upon a write error. Namely, the bug is triggered when an incoming read fails while there is an outgoing write, in which case the read callback (invoked first) flushes all pending futures, which then causes the write callback (invoked after) to not find the future it's looking for.
In a sense this bug wasn't introduced by https://github.com/pytorch/pytorch/pull/52749, however that PR introduced a check for whether the outgoing message was found, whereas before we would silence such a condition.
A fix for this could be to just resume silencing the error. However, I'm trying to go a bit further: when an outgoing write fails, we know that all subsequent callbacks will fail too, and thus all pending operations should be flushed. Hence we can do so, instead of just trying to flush a single given operation. This allows us to merge the error-handling code of both the read and write paths.
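The flush-all-on-error strategy can be sketched with a toy pipe that tracks pending futures (names are illustrative, not the actual agent API):

```python
from concurrent.futures import Future

class ClientPipe:
    """Tracks pending RPC futures; on any I/O error, flush them all."""

    def __init__(self):
        self.pending = {}  # message id -> Future

    def add_pending(self, msg_id):
        fut = Future()
        self.pending[msg_id] = fut
        return fut

    def handle_error(self, error):
        # A failed read or write means every subsequent callback on
        # this pipe will fail too, so flush *all* pending futures at
        # once. A later write callback then finds nothing left to
        # fail, which is expected rather than an error condition.
        for fut in self.pending.values():
            fut.set_exception(error)
        self.pending.clear()

pipe = ClientPipe()
f1, f2 = pipe.add_pending(1), pipe.add_pending(2)
pipe.handle_error(ConnectionError("read failed"))
assert isinstance(f1.exception(), ConnectionError)
assert isinstance(f2.exception(), ConnectionError)
assert not pipe.pending  # a later write error has nothing to flush
```

Because both the read and write paths call the same `handle_error`, their error handling merges naturally, as described above.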
ghstack-source-id: 122509550
Test Plan: Will export to GitHub, run on CircleCI, and manually SSH into a machine and stress-run that test that was flaky.
Reviewed By: mrshenli
Differential Revision: D26663448
fbshipit-source-id: fbff0f6aff0d98994c08018a27c47c97149b920c
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51730
I've added the `scatter_add` and `scatter_add.dimname` to the promote list as well as test cases for the former op.
However, it seems that `scatter_add` [doesn't support named tensors yet](8b0cb5ede3/aten/src/ATen/native/NamedTensor.cpp (L356-L358)) (thanks t-vi for the pointer):
```python
dev = 'cuda'
torch.scatter_add(torch.zeros(2, 2, 2, dtype=torch.float16, device=dev, names=('N', 'C', 'L')),
                  'C',
                  torch.randint(0, 2, (2, 2, 2), device=dev),
                  torch.randn((2, 2, 2), dtype=torch.float32, device=dev))
> RuntimeError: scatter_add: You passed a dimname (string) to this op in place of a dimension index but it does not yet support this behavior. Please pass a dimension index to work around this.
```
which raised this error after adding this test case.
I'm thus unsure if I should also remove `scatter_add.dimname` from the promote list or not.
In any case, once named tensors are supported a potential test could be added as:
```python
("scatter_add", (torch.zeros(2, 2, 2, dtype=torch.float16, device=dev, names=('N', 'C', 'L')),
                 'C',
                 torch.randint(0, 2, (2, 2, 2), device=dev),
                 torch.randn((2, 2, 2), dtype=torch.float32, device=dev))),
```
CC mcarilli ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52133
Reviewed By: ejguan
Differential Revision: D26440392
Pulled By: ngimel
fbshipit-source-id: f4ee2d0b9e1f81afb6f94261c497cf2bf79ec115
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52681
Updates the NS graph matching to properly traverse through args of nodes
if args are lists or tuples. As a side benefit, refactors the code to
make future similar improvements easier.
Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher
```
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D26611221
fbshipit-source-id: 4ddd9b26338a5a2763b2883967e100f73e207538
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52402
Before this PR, any pair of subgraphs with base nodes of equal
types matched.
While sometimes this is useful, this should be off by default to
properly handle user defined modules and functions, for which we do not
know how to extract weights or cast to the right input type.
In a future PR, we can add hooks to turn on matching for nodes
of equal types, for the situations where it makes sense.
Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher.test_nodes_with_equal_types_do_not_get_matched
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D26499848
fbshipit-source-id: 5818b88eb7fd8ed36390f60aa1a18228bb50507e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52395
Simple change to add logic to get the weight of a quantized
`linear_relu` node.
More flavors of conv and linear will be added in future PRs.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_compare_weights_fun
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D26497992
fbshipit-source-id: e6d88e92eedd6cdbf9116cbcfc8f6164f8499246
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52368
Before this PR, the graph matching logic only handles node arguments of
type Node. This PR extends it to allow to handle node arguments of type
Tuple, so that the matcher can properly navigate through the arguments
of `cat`.
Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher.test_nodes_before_cat
```
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D26490101
fbshipit-source-id: 2de8d6acc30f237e22bfc3cfa89728b37411aab6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52797
Also sneaking in a change to check for realloc failure for the packed activation buffer
FB:
In dynamic quantization, the input's quantization scale and zero point can be
different on every iteration. Thus the requantization scale needs to be
recomputed.
An earlier bug that calculated those only at op creation time resulted in wrong
results on subsequent runs.
This diff fixes that.
Test Plan:
FB:
buck test caffe2/torch/fb/model_optimization:sparsity_test
Reviewed By: z-a-f, jiatongzhou
Differential Revision: D26651968
fbshipit-source-id: e5b9acef03fc45f31c43d88a175f3a64f7dbf4bd
Summary:
- Lower Relu6 to ATen
- Change Python and C++ to reflect the change
- Adds an entry in native_functions.yaml for that new function
- This is needed as we would like to intercept ReLU6 at a higher level with an XLA-approach codegen.
- Functional C++ tests should pass, but please let me know if more tests are required.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52723
Reviewed By: ailzhang
Differential Revision: D26641414
Pulled By: albanD
fbshipit-source-id: dacfc70a236c4313f95901524f5f021503f6a60f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52804
`rpc.get_worker_info` used to only take a string in v1.6. We recently
allowed it to accept `int` and `WorkerInfo` as well, but the previous check
on `worker_name` is no longer correct. This commit adds an explicit
`not None` check.
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D26655089
Pulled By: mrshenli
fbshipit-source-id: fa1545bd6dd2b33bc1e919de46b94e799ab9719c
Summary:
This way, we can have a mapping from the test files we directly execute (the tests [here](https://github.com/pytorch/pytorch/blob/master/test/run_test.py#L20)) to the test suites that we store data for in XML reports.
This will come in use later for categorizing the tests we run in CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52791
Reviewed By: samestep
Differential Revision: D26655086
Pulled By: janeyx99
fbshipit-source-id: 94be32f80d7bc0ea1a7a11d4c4b1d3d8e774c5ea
Summary:
Apple recently announced ML Compute, a new framework available in macOS Big Sur, which enables users to accelerate the training of neural networks on Mac hardware. This PR is the first on a series of PRs that will enable the integration with ML Compute. Most of the integration code will live on a separate subrepo named `mlc`.
The integration with `mlc` (ML Compute) will be very similar to that of xla. We rely on registering our ops through:
```cpp
TORCH_LIBRARY_IMPL(aten, PrivateUse1, m) {
  m.impl_UNBOXED(<op_schema_name>, &customized_op_kernel)
  ...
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50634
Reviewed By: malfet
Differential Revision: D26614213
Pulled By: smessmer
fbshipit-source-id: 3b492b346c61cc3950ac880ac01a82fbdddbc07b
Summary:
See the discussion here: https://github.com/pytorch/pytorch/pull/50431
~~Not completely done yet - need to figure out the backwards compatibility stuff as well as `RemovableHandle`.~~
~~Also, this concretely breaks Torchscript (which tries to script the properties), and more generally, probably requires modifying Torchscript hook support: https://github.com/pytorch/pytorch/issues/34329~~
Just kidding, I think all problems are solved :)
Another thing I could do in this PR is to simply replace all the `len(x) > 0` checks with the faster checks. That's about 1.5-2k more Python instructions and .4 - .5 microseconds slower.
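For reference, the faster check in question is plain container truthiness; a small sketch of the difference (the hook-storage name here is made up):

```python
hooks = {}  # stands in for a module's hook dict

def check_len():
    # Calls len() and compares: extra bytecode on a hot path
    # like Module.__call__ that runs on every forward pass.
    return len(hooks) > 0

def check_truthy():
    # Empty containers are falsy, so this is equivalent but cheaper.
    return bool(hooks)

assert check_len() is False and check_truthy() is False
hooks["h"] = object()
assert check_len() is True and check_truthy() is True
```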
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52576
Reviewed By: ailzhang
Differential Revision: D26650352
Pulled By: Chillee
fbshipit-source-id: 0fd73e916354b9e306701a8a396c5dc051e69f0d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52749
TensorPipe has recently changed some implementation details in how it schedules callbacks and this has exposed an issue in the RPC agent. Previously the callbacks of each pipe were executed independently and possibly simultaneously. For safety reasons (especially during shutdown) TensorPipe now synchronizes the pipes and thus invokes one callback at a time. Another characteristic of TensorPipe is that it "hijacks" some user threads to run some callbacks inline (e.g., if a low-level event loop completes an operation while a pipe is already busy, this completion is queued up and the user callback could be invoked later by a different thread, including the user's own thread).
These two effects combined caused a "reentrancy" phenomenon, where calling `context->connect` (formerly on line 850) to create a new client-side pipe could trigger a read callback on another pipe. Since we were holding `mutex_` when calling `context->connect`, and we were trying to re-acquire `mutex_` inside the read callback, this led to a deadlock.
One solution to this problem is using finer-grained mutexes. In particular, introduce a mutex for each outgoing pipe (rather than a global one), which thus becomes the only one we need to acquire inside callbacks. At this point, the old `mutex_` is only guarding the vector of ClientPipes, thus we can rename it and release it earlier.
I also fixed the agent not acquiring any mutex when it set a message to error after a failed write (and also not removing the message from the timeout map).
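The finer-grained locking described above can be sketched like this (simplified, hypothetical names):

```python
import threading

class Agent:
    """Sketch: one lock per client pipe instead of a single global lock.

    Callbacks for pipe A can then run while the agent holds pipe B's
    lock, avoiding the reentrancy deadlock described above.
    """

    def __init__(self):
        self.pipes_lock = threading.Lock()  # guards only the dict below
        self.client_pipes = {}              # name -> (lock, state)

    def get_pipe(self, name):
        with self.pipes_lock:               # held briefly, released early
            if name not in self.client_pipes:
                self.client_pipes[name] = (threading.Lock(), {"pending": []})
            return self.client_pipes[name]

    def send(self, name, msg):
        lock, state = self.get_pipe(name)
        with lock:                          # per-pipe lock only
            state["pending"].append(msg)

agent = Agent()
agent.send("worker1", "m1")
agent.send("worker2", "m2")
assert len(agent.client_pipes) == 2
```

The key design point is that `pipes_lock` is never held while running pipe callbacks, only while touching the container of pipes.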
ghstack-source-id: 122410367
Test Plan: Ran CI in #52677 together with the TensorPipe submodule update.
Reviewed By: mrshenli
Differential Revision: D26636345
fbshipit-source-id: d36da989f2aab51f4acb92d2e81bb15b76088df1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51807
Implemented torch.linalg.multi_dot similar to [numpy.linalg.multi_dot](https://numpy.org/doc/stable/reference/generated/numpy.linalg.multi_dot.html).
This function does not support broadcasting or batched inputs at the moment.
**NOTE**
numpy.linalg.multi_dot allows the first and last tensors to have more than 2 dimensions despite their docs stating these must be either 1D or 2D. This PR diverges from NumPy in that it enforces this restriction.
**TODO**
- [ ] Benchmark against NumPy
- [x] Add OpInfo testing
- [x] Remove unnecessary copy for out= argument
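A quick illustration of why `multi_dot` is useful: the multiplication order can change the scalar-multiply count by an order of magnitude. This is pure flop-count arithmetic, not the actual implementation:

```python
def matmul_cost(a, b):
    # Multiplying an (m, k) matrix by a (k, n) matrix costs m*k*n
    # scalar multiplications and yields an (m, n) result.
    m, k = a
    k2, n = b
    assert k == k2, "inner dimensions must agree"
    return m * k * n, (m, n)

A, B, C = (10, 100), (100, 5), (5, 50)

# (A @ B) @ C
c1, ab = matmul_cost(A, B)
c2, _ = matmul_cost(ab, C)
left_first = c1 + c2             # 5000 + 2500 = 7500

# A @ (B @ C)
c3, bc = matmul_cost(B, C)
c4, _ = matmul_cost(A, bc)
right_first = c3 + c4            # 25000 + 50000 = 75000

assert left_first < right_first  # multi_dot picks the cheap ordering
```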
Test Plan: Imported from OSS
Reviewed By: nikithamalgifb
Differential Revision: D26375734
Pulled By: heitorschueroff
fbshipit-source-id: 839642692424c4b1783606c76dd5b34455368f0b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52386
Remove the stale aliasing-inputs warning; error-check that inputs is not null and has at least one entry, and that the list of inputs is a list of tuples. Missing checks can cause subtle bugs: if the user passes in a list of tensors (the most common mistake), the first dimension of each tensor is dropped. This can go unnoticed because it's often the batch dimension, which PyTorch occasionally silently re-adds if it's missing.
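A sketch of the kind of validation being added (function and message names are hypothetical):

```python
def validate_bundled_inputs(inputs):
    # Hypothetical sketch of the new error checks.
    if inputs is None:
        raise TypeError("inputs must not be None")
    if len(inputs) == 0:
        raise ValueError("inputs must have at least one entry")
    for i, entry in enumerate(inputs):
        if not isinstance(entry, tuple):
            # A bare tensor here would have its first (often batch)
            # dimension silently unpacked away as an argument tuple.
            raise TypeError(f"inputs[{i}] must be a tuple of arguments")

validate_bundled_inputs([(1, 2), (3,)])   # ok: a list of argument tuples

errors = []
for bad in (None, [], ["tensor-like"]):
    try:
        validate_bundled_inputs(bad)
    except (TypeError, ValueError) as e:
        errors.append(type(e).__name__)
assert errors == ["TypeError", "ValueError", "TypeError"]
```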
ghstack-source-id: 122363487
Test Plan:
Bundle something with an input, bundle something with {} for inputs
For typing check below paste
{P199554712}
Reviewed By: dhruvbird
Differential Revision: D26374867
fbshipit-source-id: cd176f34bad7a4da850b165827f8b2448cd9200d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52726
This change removes `input_bufs_` and `intermediate_bufs_` from
`LoopNest` class as they can be deduced from the root stmt and the list
of output bufs. As a result, the constructor of the LoopNest also becomes
simpler as we now need to pass just one list of bufs.
Note: we might consider passing list of input bufs for verification
purposes (only inputs buffers are allowed to not have a definition), but
since we don't really have an IR verifier yet, there is no need for it
now. Once we add IR verifier, we could reconsider it.
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D26629596
Pulled By: ZolotukhinM
fbshipit-source-id: 81f544e9602b6855b7968d540b9ae06bd7c7e6d8
Summary:
The Flake8 job has been passing on `master` despite giving warnings for [over a month](https://github.com/pytorch/pytorch/runs/1716124347). This is because it has been using a regex that doesn't recognize error codes starting with multiple letters, such as those used by [flake8-executable](https://pypi.org/project/flake8-executable/). This PR corrects the regex, and also adds another step at the end of the job which asserts that Flake8 actually gave no error output, in case similar regex issues appear in the future.
Tagging the following people to ask what to do to fix these `EXE002` warnings:
- https://github.com/pytorch/pytorch/issues/50629 authored by jaglinux, approved by rohan-varma
- `test/distributed/test_c10d.py`
- https://github.com/pytorch/pytorch/issues/51262 authored by glaringlee, approved by ejguan
- `torch/utils/data/datapipes/__init__.py`
- `torch/utils/data/datapipes/iter/loadfilesfromdisk.py`
- `torch/utils/data/datapipes/iter/listdirfiles.py`
- `torch/utils/data/datapipes/iter/__init__.py`
- `torch/utils/data/datapipes/utils/__init__.py`
- `torch/utils/data/datapipes/utils/common.py`
- https://github.com/pytorch/pytorch/issues/51398 authored by glaringlee, approved by ejguan
- `torch/utils/data/datapipes/iter/readfilesfromtar.py`
- https://github.com/pytorch/pytorch/issues/51599 authored by glaringlee, approved by ejguan
- `torch/utils/data/datapipes/iter/readfilesfromzip.py`
- https://github.com/pytorch/pytorch/issues/51704 authored by glaringlee, approved by ejguan
- `torch/utils/data/datapipes/iter/routeddecoder.py`
- `torch/utils/data/datapipes/utils/decoder.py`
- https://github.com/pytorch/pytorch/issues/51709 authored by glaringlee, approved by ejguan
- `torch/utils/data/datapipes/iter/groupbykey.py`
Specifically, the question is: for each of those files, should we remove the execute permissions, or should we add a shebang? And if the latter, which shebang?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52750
Test Plan:
The **Lint / flake8-py3** job in GitHub Actions:
- [this run](https://github.com/pytorch/pytorch/runs/1972039886) failed, showing that the new regex catches these warnings properly
- [this run](https://github.com/pytorch/pytorch/runs/1972393293) succeeded and gave no output in the "Run flake8" step, showing that this PR fixed all Flake8 warnings
- [this run](https://github.com/pytorch/pytorch/pull/52755/checks?check_run_id=1972414849) (in https://github.com/pytorch/pytorch/issues/52755) failed, showing that the new last step of the job successfully catches Flake8 warnings even without the regex fix
Reviewed By: walterddr, janeyx99
Differential Revision: D26637307
Pulled By: samestep
fbshipit-source-id: 572af6a3bbe57f5e9bd47f19f37c39db90f7b804
Summary:
This was mostly needed for ShardedDDP, not used here, dead code removal
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52698
Reviewed By: mrshenli
Differential Revision: D26617893
Pulled By: blefaudeux
fbshipit-source-id: 9bcfca5135bf332ebc1240300978c138d2041146
Summary:
This PR makes several UX improvements to `tools/test_history.py`:
- warn if `--all` is unset and no jobs are passed
- print output even in `multiline` mode if no reports are found for a commit
- this makes it easier to tell whether the script is just hanging
- if there are multiple reports for a commit/job pair, say so
- distinguish between not finding any reports and just not finding the desired test in the reports found
- don't require the suite name as a CLI arg, just use the test name
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52640
Test Plan: Example shell session: https://pastebin.com/SSemHqP8
Reviewed By: walterddr
Differential Revision: D26594350
Pulled By: samestep
fbshipit-source-id: 9ce2245f91eef289817aafe955a4343d4a068eda
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52671
Code is written with the assumption that new_size is an unsigned value,
and when the function is called with a negative value it silently returns a nullptr rather than raising an exception.
Fix the above-mentioned logic by converting new_size to an unsigned type and letting cpu_allocator raise an exception on a negative alloc.
Unroll nested if blocks by returning early if new_size is 0
Add TestNN.test_adaptive_pooling_size_overflow to indirectly validate the fix.
Fixes https://github.com/pytorch/pytorch/issues/50960
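The shape of the fix can be sketched in a few lines (a hypothetical stand-in for the C++ storage-resize logic):

```python
def checked_resize(new_size):
    # Reject negative sizes up front instead of silently returning
    # a null buffer after an unsigned wrap-around.
    if new_size < 0:
        raise RuntimeError(f"cannot resize storage to {new_size} bytes")
    if new_size == 0:
        return b""      # early return instead of nested if blocks
    return bytearray(new_size)

assert checked_resize(0) == b""
assert len(checked_resize(8)) == 8
raised = False
try:
    checked_resize(-1)
except RuntimeError:
    raised = True
assert raised
```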
Test Plan: Imported from OSS
Reviewed By: walterddr
Differential Revision: D26607549
Pulled By: malfet
fbshipit-source-id: e3d4f7548b098f24fa5aba42d8f4e9288ece1e2e
Summary:
- Fixes the ordering of the value parameters of TCPStore's `compare_set()` in the pybind11 interop layer. The C++ API expects (old, new) while we are passing (new, old) in Python.
- Fixes the implementation of TCPStore's `compareSetHandler()` for cases where the key already exists in the store.
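A toy model of the intended argument order, where the old (expected) value precedes the new (desired) one as in the C++ API (semantics deliberately simplified):

```python
def compare_set(store, key, expected, desired):
    # (key, old, new): swap in `desired` only when the current value
    # matches `expected`; return the value now stored.
    if store.get(key) == expected:
        store[key] = desired
    return store[key]

store = {"k": "old"}
assert compare_set(store, "k", "old", "new") == "new"
assert compare_set(store, "k", "old", "other") == "new"  # stale expected: no-op
```

The Python-side bug was simply passing these two value arguments in the opposite order.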
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52696
Test Plan: `python test/distributed/test_c10d.py`
Reviewed By: malfet, H-Huang
Differential Revision: D26616976
Pulled By: cbalioglu
fbshipit-source-id: e6a70542e837be04697b5850947924edd896dbf6
Summary:
This update contains the fix to XNNPACK by kimishpatel
Add unit test that exposed the problem
Updated torchvision checkout to 0.9.0rc1 hash
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52691
Reviewed By: walterddr
Differential Revision: D26614595
Pulled By: malfet
fbshipit-source-id: d0fe155a084690a3459a9358dac8488292e734fb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51976
FX serializes things by serializing Python code as a string and exec'ing
it on load. This accomplishes one goal (we don't have to pickle the
graph object directly) but breaks the pickle abstraction in ways that
are not composable with `torch.package`.
In particular:
1. `forward` is serialized by saving Python code. On load, it's installed
by `exec`ing that code. This `exec` call needs to have the right
importer installed, otherwise it will not import modules from the
`torch.package` but instead import from the Python environment.
2. Any types/functions used are emitted as `import` statements in the
generated Python code. These are effectively dynamic dependencies of the
`GraphModule` being saved, and need to be registered as such so that the
`PackageImporter` will package them.
To address these, this PR introduces a new protocol for the
importer/exporter: `__reduce_package__`.
A class can implement `__reduce_package__` to customize how it is placed
in the importer/exporter. It functions very similarly to `__reduce__`,
except:
- `__reduce_package__` takes one argument, which is the `PackageExporter`
instance. Users can use this instance to save stuff to the package to
implement their serialization. `__reduce__` takes no args.
- Only the 2-element tuple version of the return value for `__reduce__`
is supported (this could be extended if necessary).
- When the reduction function is called on load, an additional argument
is added to the beginning of the args tuple. This is the `PackageImporter`
instance doing the loading.
The `__reduce_package__` protocol is defined using `persistent_id` and
`persistent_load`, which ensures that we can still use the C (cPickle)
implementation of the pickler by default.
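A minimal sketch of this persistent-ID mechanism, using a plain dict as a toy package (names and the inlined reconstruction step are simplifications, not the real `torch.package` API):

```python
import io
import pickle

class PackageExporter(pickle.Pickler):
    def __init__(self, file, package):
        super().__init__(file)
        self.package = package          # stands in for the zip archive

    def persistent_id(self, obj):
        if hasattr(obj, "__reduce_package__"):
            # The object saves itself into the package and returns a
            # small persistent ID that gets pickled in its place.
            return obj.__reduce_package__(self)
        return None                     # everything else pickles normally

class PackageImporter(pickle.Unpickler):
    def __init__(self, file, package):
        super().__init__(file)
        self.package = package

    def persistent_load(self, pid):
        tag, key = pid
        assert tag == "graph_module"
        # The importer instance drives reconstruction, so the correct
        # environment (e.g. the right module importer) is in scope here.
        return GraphModuleStub(self.package[key])

class GraphModuleStub:
    def __init__(self, code):
        self.code = code

    def __reduce_package__(self, exporter):
        key = "code/0.py"
        exporter.package[key] = self.code   # save into the package
        return ("graph_module", key)

package = {}
buf = io.BytesIO()
PackageExporter(buf, package).dump(GraphModuleStub("def forward(x): return x"))
restored = PackageImporter(io.BytesIO(buf.getvalue()), package).load()
assert restored.code == "def forward(x): return x"
```

The real protocol prepends the importer to the reconstructor's args tuple, as described above; this sketch inlines that step inside `persistent_load`.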
Pull Request resolved: #51971
Test Plan: Imported from OSS
Reviewed By: zdevito
Differential Revision: D26340591
Pulled By: suo
fbshipit-source-id: 5872a7d22e832056399a7372bae8a57807717882
Summary:
Fixes #52034
- Add a minimum compression rate threshold to `PowerSGDState`
- Use the threshold to determine whether to compress high-rank tensors or not
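The threshold logic amounts to comparing payload sizes before and after low-rank factorization; a hedged arithmetic sketch (parameter names approximate):

```python
def should_compress(rows, cols, rank, min_compression_rate=2.0):
    # Rank-r PowerSGD replaces a (rows x cols) gradient with two
    # factors P (rows x rank) and Q (rank x cols). Compress only
    # when the payload shrinks by at least the threshold factor.
    uncompressed = rows * cols
    compressed = (rows + cols) * rank
    return uncompressed >= min_compression_rate * compressed

assert should_compress(1024, 1024, 8)   # big square matrix: compress
assert not should_compress(16, 16, 8)   # tiny matrix: not worth it
```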
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52541
Test Plan:
No performance regression using rank-8 compression:
baseline: f253000411
updated one: f253010955
Reviewed By: rohan-varma
Differential Revision: D26594862
Pulled By: SciPioneer
fbshipit-source-id: 2859a91b4ca6bd1862bf6cd6441dc2a89badb2d5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52635
Currently, the method `_load_for_mobile()` accepts an extra files map named `extra_files` which serves as an in-out parameter, i.e. the caller fills in the keys of this map with the names of the files under the `extra/` folder that they wish to extract, and the method fills in the corresponding values with the contents of those files.
In a specific case we have encountered, it is desirable to extract all the extra files so that they can be forwarded in an opaque manner into a `save_for_mobile()` call with the same set of extra files as during load.
This change adds a method `_get_all_archive_file_names()` which returns the names of all files in the `.ptl` archive. The caller can then extract the ones within the `extra/` directory and pass them in to the `extra_files` map argument.
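What such an API enables can be sketched with a plain zip archive standing in for the `.ptl` file (paths here are made up):

```python
import io
import zipfile

# Build a tiny in-memory archive shaped like a .ptl file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("model/data.pkl", b"...")
    zf.writestr("model/extra/config.json", b"{}")
    zf.writestr("model/extra/metadata.txt", b"v1")

# Sketch of what a _get_all_archive_file_names()-style call enables:
# list every record, then select the extra/ ones to forward to save.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    all_names = zf.namelist()
    extra = [n for n in all_names if "/extra/" in n]

assert extra == ["model/extra/config.json", "model/extra/metadata.txt"]
```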
ghstack-source-id: 122356928
Test Plan: Added additional test + `buck test //xplat/caffe2:test_lite_interpreter`
Reviewed By: iseeyuan
Differential Revision: D26590027
fbshipit-source-id: 4dc30997929e132f319c32cb9435d8a40fe0db5e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52473
Use `map_aggregate` to create the output for the new graph so that it won't raise an error when we have outputs that are not `Proxy`.
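A simplified version of what `map_aggregate`-style traversal does, showing how non-`Proxy` leaves pass through untouched:

```python
def map_aggregate(a, fn):
    # Recursively apply fn to the leaves of nested tuples/lists/dicts,
    # mirroring how fx lets outputs mix transformed and plain values.
    if isinstance(a, tuple):
        return tuple(map_aggregate(e, fn) for e in a)
    if isinstance(a, list):
        return [map_aggregate(e, fn) for e in a]
    if isinstance(a, dict):
        return {k: map_aggregate(v, fn) for k, v in a.items()}
    return fn(a)

# A mixed output: ints stand in for proxied values, the string for a
# constant that should pass through unchanged.
out = map_aggregate((1, ["x", 2]), lambda v: v * 2 if isinstance(v, int) else v)
assert out == (2, ["x", 4])
```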
Test Plan: `test_transformer_multi_outputs` in `test_fx.py`
Reviewed By: jamesr66a
Differential Revision: D26502277
fbshipit-source-id: 404d9030a9b84db3f66f8505887a75717a28ad30
Summary:
Two changes:
- Print a warning rather than fail if creating hipified file fails with permission denied error
- Do not attempt to create /usr/include/libpng/png_hip.h in the first place
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52709
Reviewed By: walterddr
Differential Revision: D26625033
Pulled By: malfet
fbshipit-source-id: ff82dc24aee12eac2daaa6e5bc938811b49ebbc6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52670
TORCH_CHECK followed by a string literal is a no-op, and from the text of the messages it's clear that the authors intended those instances to be `TORCH_CHECK(false, "msg")`.
Discovered while trying to figure out whether tensor_offset can be negative in Resize.h
s/TORCH_CHECK\("/TORCH_CHECK(false, "/
Test Plan: Imported from OSS
Reviewed By: walterddr, janeyx99, mruberry
Differential Revision: D26607546
Pulled By: malfet
fbshipit-source-id: 661812da84adb1d1af0284da60c93ec4bf5ef08e
Summary:
Moving master only resource-interactive CI jobs to a less regular basis.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52693
Reviewed By: malfet, seemethere
Differential Revision: D26615060
Pulled By: janeyx99
fbshipit-source-id: def46a7890ea46c655ef2ee0f7c548171464cb48
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51974
Right now, when an FX `Graph` references an external object, we will emit
code like:
```python
import foo

def forward(input: foo.bar.baz):
    ...
```
This is problematic in a world with `torch.package`, since then name
`foo.bar.baz` may reference a name from any number of packages.
This PR lays the groundwork for FX-package integration by separating the
resolution of external references from the generation of the function
code.
When generating a Graph's Python source, we keep track of all external
references and assign them unique names. At the end, we have a
dictionary mapping names -> actual objects. This becomes the `globals`
namespace we pass to `exec` when installing the forward function in a
`GraphModule`. This is nice because we can always be sure that `exec` is
seeing the same objects that were referenced from the `Graph`, no import
statements needed.
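The mechanism can be sketched with plain `exec` (names here are illustrative):

```python
import math

# Generated code references external objects by placeholder names;
# the actual objects are bound through the globals dict passed to
# exec, so no import statements are needed at definition time.
external_refs = {"math_sqrt": math.sqrt}   # unique name -> actual object

src = """
def forward(x):
    return math_sqrt(x) + 1
"""

namespace = dict(external_refs)
exec(src, namespace)
forward = namespace["forward"]
assert forward(9.0) == 4.0
```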
At serialization time, we use a `ModuleEnv` to resolve the globals dict
to a set of import statements that can be run to reproduce the `global`
namespace. This is only used on serialization/deserialization, and those
functions are expected to check that the import statements are producing
the correct results.
Concretely, the code above will now look like:
```python
from foo.bar import baz as foo_bar_baz

def forward(input: foo_bar_baz):
    ...
```
Test Plan: Imported from OSS
Reviewed By: jamesr66a
Differential Revision: D26340593
Pulled By: suo
fbshipit-source-id: fe247f75205d0a03fd067bdd0f95491e8edf1436
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52462
This is step one for supporting multiple outputs in the fx nnpi path.
During serialization, we store the shape and dtype in the output args, so that the importer doesn't need to go back and find the nodes.
The output nodes will look like
```
{
"target": "output",
"op_code": "output",
"name": "output",
"args": [
{
"is_node": true,
"name": "add_1",
"shape": "[1, 1]",
"dtype": "torch.float32"
}
],
"kwargs": {}
}
```
Test Plan: Doesn't break existing tests and will test on step two.
Reviewed By: jfix71
Differential Revision: D26500742
fbshipit-source-id: 755d2dec704d9da579af40e754b556d6c01aa796
Summary:
Add a test in `load_save_test.py` that passes in a chunk_size parameter,
to ensure that we exercise the logic that passes the chunk size to the C++
serialization code.
Test Plan:
Ran the tests with the vlog level set to 3 and manually verified the log
messages showed that we were serializing in the expected chunks.
There are existing C++ tests that confirm chunking behavior works as expected
in the pure C++ code.
Reviewed By: mraway
Differential Revision: D26502578
fbshipit-source-id: cd0074f2358da81c68b0fed2c2a94818d83a957d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52632
Distributed tests run in a multiprocessing environment, where a parent
process drives the tests through several child processes. As a result, when a
child process fails the parent only prints the following:
```
Process 0 exited with error code 10
```
The child process also logs its own exception, but it is cumbersome to go
through the logs and track this down.
To alleviate this, I've added a bunch of pipes for each child process so that
the child process writes the error to the pipe before exiting and the parent
process can read the appropriate error from the pipe and display it.
The new output printed by the parent is as follows:
```
> RuntimeError: Process 0 exited with error code 10 and exception:
Traceback (most recent call last):
File "torch/testing/_internal/common_distributed.py", line 361, in _run
getattr(self, test_name)()
File "torch/testing/_internal/common_distributed.py", line 288, in wrapper
fn()
File "test_c10d.py", line 789, in test_broadcast_checks
pg.broadcast([t1], opts)
ValueError: ProcessGroupGloo::broadcast: invalid root rank: -1
Process 1 exited with error code 10 and exception:
Traceback (most recent call last):
File "torch/testing/_internal/common_distributed.py", line 361, in _run
getattr(self, test_name)()
File "torch/testing/_internal/common_distributed.py", line 288, in wrapper
fn()
File "test_c10d.py", line 789, in test_broadcast_checks
pg.broadcast([t1], opts)
ValueError: ProcessGroupGloo::broadcast: invalid root rank: -1
Process 2 exited with error code 10 and exception:
Traceback (most recent call last):
File "torch/testing/_internal/common_distributed.py", line 361, in _run
getattr(self, test_name)()
File "torch/testing/_internal/common_distributed.py", line 288, in wrapper
fn()
File "test_c10d.py", line 789, in test_broadcast_checks
pg.broadcast([t1], opts)
ValueError: ProcessGroupGloo::broadcast: invalid root rank: -1
Process 3 exited with error code 10 and exception:
Traceback (most recent call last):
File "torch/testing/_internal/common_distributed.py", line 361, in _run
getattr(self, test_name)()
File "torch/testing/_internal/common_distributed.py", line 288, in wrapper
fn()
File "test_c10d.py", line 789, in test_broadcast_checks
pg.broadcast([t1], opts)
ValueError: ProcessGroupGloo::broadcast: invalid root rank: -1
```
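The capture-and-forward idea can be sketched with threads standing in for child processes (the real change uses multiprocessing pipes; this is only an analogous sketch):

```python
import threading
import traceback

def run_worker(test_fn, errors, rank):
    # The worker catches its own exception, formats the traceback,
    # and ships it back so the parent can include it in the error
    # it raises, instead of just reporting an exit code.
    try:
        test_fn()
    except Exception:
        errors[rank] = traceback.format_exc()

def failing_test():
    raise ValueError("ProcessGroupGloo::broadcast: invalid root rank: -1")

errors = {}
t = threading.Thread(target=run_worker, args=(failing_test, errors, 0))
t.start()
t.join()

msg = "Process 0 exited with error code 10 and exception:\n" + errors[0]
assert "invalid root rank" in msg
```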
ghstack-source-id: 122273793
Test Plan: waitforbuildbot
Reviewed By: rohan-varma
Differential Revision: D26589274
fbshipit-source-id: 7b7a71ec790b216a89db7c157377f426531349a5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52612
Used the float type macro to generalize the per-tensor fake_quantization functions to f16 and f64.
Test Plan:
Added a test to show it works in AMP and extended the forward and backward tests below to cover float16 and float64 operations. Note: the reference function doesn't work with these types, so I had to convert into and back out of them to compare.
```
python test/test_quantization.py TestFakeQuantize.test_forward_backward_per_tensor_with_amp
python test/test_quantization.py TestFakeQuantize.test_forward_per_tensor_cachemask_cpu
python test/test_quantization.py TestFakeQuantize.test_backwards_per_tensor_cachemask_cpu
python test/test_quantization.py TestFakeQuantize.test_forward_per_tensor_cachemask_cuda
python test/test_quantization.py TestFakeQuantize.test_backwards_per_tensor_cachemask_cuda
python test/test_quantization.py
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D26586416
fbshipit-source-id: 55fe83c5e47f45cd1de8ddd69bd4a5653ab6dc12
Summary:
This update contains the fix to XNNPACK by kimishpatel
Add unit test that exposed the problem
Updated torchvision checkout to 0.9.0rc1 hash
Fixes https://github.com/pytorch/pytorch/issues/52463
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52645
Reviewed By: kimishpatel, seemethere
Differential Revision: D26598115
Pulled By: malfet
fbshipit-source-id: d652bacdee10bb975fc445ab227de37022b8ef51
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52628
Prior to this change ExternalCalls were not considered as Loads or
Stores to/from its buffers, which led to incorrect behavior in inlining.
This PR fixes it.
Differential Revision: D26589378
Test Plan: Imported from OSS
Reviewed By: navahgar
Pulled By: ZolotukhinM
fbshipit-source-id: cd69d5f7075f6dc756aabcf676842b9a250334d6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52627
Currently the inliner only inlines into Calls; this PR extends it to
cover Loads too. Eventually we will remove Calls altogether and use
Loads everywhere, this is one step in that direction.
Differential Revision: D26589377
Test Plan: Imported from OSS
Reviewed By: asuhan
Pulled By: ZolotukhinM
fbshipit-source-id: ca28f0df2273eb214f203467c6ba3d8f02a8a3b6
Summary:
Since `char` is not guaranteed to be signed on all platforms (it is unsigned on ARM)
Fixes https://github.com/pytorch/pytorch/issues/52146
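The ambiguity is easy to demonstrate with `struct`, where signed and unsigned byte interpretations of the same bit pattern differ:

```python
import struct

raw = b"\xff"  # the byte pattern of int8 -1

# 'b' = signed char, 'B' = unsigned char. On platforms where plain
# C `char` is unsigned (e.g. ARM Linux), code assuming signedness
# reads -1 as 255 -- the same ambiguity this fix removes by using
# an explicitly signed type.
assert struct.unpack("b", raw)[0] == -1
assert struct.unpack("B", raw)[0] == 255
```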
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52616
Test Plan: Run ` python3 -c "import torch;a=torch.tensor([-1], dtype=torch.int8);print(a.tolist())"` on arm-linux system
Reviewed By: walterddr
Differential Revision: D26586678
Pulled By: malfet
fbshipit-source-id: 91972189b54f86add516ffb96d579acb0bc13311
Summary:
They've been changed from class to struct in the tensorpipe repo, but have
not been updated in the header, which triggers a compiler warning if clang
is used and would have triggered a linker error if the same code were
compiled with MSVC
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52600
Reviewed By: lw
Differential Revision: D26579754
Pulled By: malfet
fbshipit-source-id: 800c02e7ba839bac01adf216de2d8547b7e9128b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52521
The `storage_type` and `external_data` fields were added a few years ago in
D10246743 (30aaa07594) but don't appear to have been used anywhere. Let's remove them to
help simplify the `TensorProto` message definition.
ghstack-source-id: 122110201
Test Plan: Confirmed the code still builds.
Reviewed By: dzhulgakov
Differential Revision: D26500028
fbshipit-source-id: 1e188f98f59e0b8673ea342ad9aaf7e5ba9b5fac
Summary:
psutil is used in many test scripts under test/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52285
Reviewed By: jbschlosser
Differential Revision: D26516673
Pulled By: malfet
fbshipit-source-id: 09a81d5dba3bf5189e3e5575c2095eb069b93ade
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52333
The current documentation of `export_opnames` is a bit misleading. Change it to better clarify what it does.
ghstack-source-id: 121810264
Test Plan: n/a
Reviewed By: iseeyuan
Differential Revision: D26471803
fbshipit-source-id: 496d10b161c9a4076c4e12db8a0affafc4e1e359
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52511
Re-enable a test that was previously fixed but was never re-enabled.
Test Plan: Imported from OSS
Reviewed By: bdhirsh
Differential Revision: D26586980
Pulled By: H-Huang
fbshipit-source-id: 3cfe21de09036d2b87273680dae351e20125e815
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52176
Added tooling to print out zipfile structure for PackageExporter and PackageImporter.
API looks like:
```
exporter.print_file_structure("sss" /*only include files with this in the path*/)
importer3.print_file_structure(False /*don't print storage*/, "sss" /*only include files with this in the path*/)
```
The output looks like this with the storage hidden by default:
```
─── resnet.zip
├── .data
│ ├── extern_modules
│ └── version
├── models
│ └── models1.pkl
└── torchvision
└── models
├── resnet.py
└── utils.py
```
The output looks like this with the storage being printed out:
```
─── resnet_added_attr_test.zip
├── .data
│ ├── 94574437434544.storage
│ ├── 94574468343696.storage
│ ├── 94574470147744.storage
│ ├── 94574470198784.storage
│ ├── 94574470267968.storage
│ ├── 94574474917984.storage
│ ├── extern_modules
│ └── version
├── models
│ └── models1.pkl
└── torchvision
└── models
├── resnet.py
└── utils.py
```
If the output is filtered with the string 'utils', it'd look like this:
```
─── resnet_added_attr_test.zip
└── torchvision
└── models
└── utils.py
```
Test Plan: Imported from OSS
Reviewed By: suo
Differential Revision: D26429795
Pulled By: Lilyjjo
fbshipit-source-id: 4fa25b0426912f939c7b52cedd6e217672891f21
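A tree rendering like the ones above can be sketched with just the standard library. `render_tree` and its `filter_str` parameter are hypothetical names for illustration, not the actual `torch.package` API:

```python
def render_tree(paths, filter_str=""):
    """Render a list of archive paths as an ASCII tree.

    Hypothetical sketch of the kind of output shown above; the real
    torch.package implementation differs.
    """
    # Build a nested dict: each path component becomes a key.
    root = {}
    for p in paths:
        if filter_str and filter_str not in p:
            continue  # mimic the string filter shown in the summary
        node = root
        for part in p.split("/"):
            node = node.setdefault(part, {})

    lines = []

    def walk(node, prefix):
        entries = sorted(node)
        for i, name in enumerate(entries):
            last = i == len(entries) - 1
            lines.append(prefix + ("└── " if last else "├── ") + name)
            walk(node[name], prefix + ("    " if last else "│   "))

    walk(root, "")
    return "\n".join(lines)

tree = render_tree(
    ["models/models1.pkl", "torchvision/models/resnet.py",
     "torchvision/models/utils.py"],
    filter_str="utils",
)
print(tree)
```

With the `"utils"` filter, only the `torchvision/models/utils.py` branch survives, matching the filtered example above.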
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52257
## Background
Reverts MHA behavior for `bias` flag to that of v1.5: flag enables or disables both in and out projection biases.
Updates type annotations for both in and out projections biases from `Tensor` to `Optional[Tensor]` for `torch.jit.script` usage.
Note: With this change, `_LinearWithBias` defined in `torch/nn/modules/linear.py` is no longer utilized. Completely removing it would require updates to quantization logic in the following files:
```
test/quantization/test_quantized_module.py
torch/nn/quantizable/modules/activation.py
torch/nn/quantized/dynamic/modules/linear.py
torch/nn/quantized/modules/linear.py
torch/quantization/quantization_mappings.py
```
This PR takes a conservative initial approach and leaves these files unchanged.
**Is it safe to fully remove `_LinearWithBias`?**
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52537
Test Plan:
```
python test/test_nn.py TestNN.test_multihead_attn_no_bias
```
## BC-Breaking Note
In v1.6, the behavior of `MultiheadAttention`'s `bias` flag was incorrectly changed to affect only the in projection layer. That is, setting `bias=False` would fail to disable the bias for the out projection layer. This regression has been fixed, and the `bias` flag now correctly applies to both the in and out projection layers.
Reviewed By: bdhirsh
Differential Revision: D26583639
Pulled By: jbschlosser
fbshipit-source-id: b805f3a052628efb28b89377a41e06f71747ac5b
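A toy stand-in (not the real `torch.nn.MultiheadAttention`) illustrates the restored contract: a single `bias` flag gates both the in-projection and out-projection biases:

```python
class MHAStub:
    """Toy stand-in for MultiheadAttention's bias handling (v1.5 semantics).

    The attribute names mirror the real module, but this is only a sketch
    of the flag's semantics, not the actual implementation.
    """

    def __init__(self, embed_dim, bias=True):
        # One flag controls BOTH projection biases; bias=False disables both.
        self.in_proj_bias = [0.0] * (3 * embed_dim) if bias else None
        self.out_proj_bias = [0.0] * embed_dim if bias else None

with_bias = MHAStub(4, bias=True)
no_bias = MHAStub(4, bias=False)
```

Under the regression described above, `no_bias.out_proj_bias` would still have been populated; after the fix both are disabled together.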
Summary:
Fix the issue that `add_custom_command(OUTPUT ...)` will only be called when target output is missing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52477
Reviewed By: malfet
Differential Revision: D26538718
Pulled By: walterddr
fbshipit-source-id: 0fef40585a0f888dcbe268deb2e7a7a8d81e6aa1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52009
Taking advantage of the new `redispatch` API to clean up the codegen'd tracing kernels. Instead of directly interacting with the Dispatcher, the tracing kernels now just call the `redispatch` API directly.
One small benefit to this: hopefully the compiler is more likely to inline `Dispatcher::redispatch()`, since it's now used in fewer call-sites. After this change, the only places it's used are:
- the `redispatch` API (`RedispatchFunctions.cpp`)
- BackendSelect kernels.
One small complication: the redispatch API doesn't interact too well with `manual_cpp_binding` ops currently. I put a note with some thoughts in the comments.
Example tracing kernel before:
```
Tensor add_Tensor(c10::DispatchKeySet ks, const Tensor & self, const
torch::jit::Node* node = nullptr;
std::shared_ptr<jit::tracer::TracingState> tracer_state;
if (jit::tracer::isTracing()) {
tracer_state = jit::tracer::getTracingState();
at::Symbol op_name;
op_name = jit::Symbol::fromQualString("aten::add");
node = tracer_state->graph->create(op_name, /*num_outputs=*/0);
jit::tracer::recordSourceLocation(node);
jit::tracer::addInputs(node, "self", self);
jit::tracer::addInputs(node, "other", other);
jit::tracer::addInputs(node, "alpha", alpha);
tracer_state->graph->insertNode(node);
jit::tracer::setTracingState(nullptr);
}
static auto op = c10::Dispatcher::singleton()
.findSchemaOrThrow("aten::add", "Tensor")
.typed<Tensor (const Tensor &, const Tensor &, Scalar)>();
auto result =c10::Dispatcher::singleton()
.redispatch<Tensor, const Tensor &, const Tensor &, Scalar>(op,
if (tracer_state) {
jit::tracer::setTracingState(std::move(tracer_state));
jit::tracer::addOutput(node, result);
}
return result;
}
```
after: (note the lack of `Dispatcher::` calls)
```
Tensor add_Tensor(c10::DispatchKeySet ks, const Tensor & self, const Tensor & other, Scalar alpha)
torch::jit::Node* node = nullptr;
std::shared_ptr<jit::tracer::TracingState> tracer_state;
if (jit::tracer::isTracing()) {
tracer_state = jit::tracer::getTracingState();
at::Symbol op_name;
op_name = jit::Symbol::fromQualString("aten::add");
node = tracer_state->graph->create(op_name, /*num_outputs=*/0);
jit::tracer::recordSourceLocation(node);
jit::tracer::addInputs(node, "self", self);
jit::tracer::addInputs(node, "other", other);
jit::tracer::addInputs(node, "alpha", alpha);
tracer_state->graph->insertNode(node);
jit::tracer::setTracingState(nullptr);
}
auto result =at::redispatch::add(ks & c10::DispatchKeySet(c10::DispatchKeySet::FULL_AFTER, c10::D
if (tracer_state) {
jit::tracer::setTracingState(std::move(tracer_state));
jit::tracer::addOutput(node, result);
}
return result;
}
```
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D26356078
Pulled By: bdhirsh
fbshipit-source-id: bc96ca4c6d90903f1e265859160d4b13a8cc7310
Summary:
Remove the dependency tracker that works on Tensors, DepTracker, from LoopNest. This is essential to the goal of removing Tensors from LoopNest.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52405
Reviewed By: heitorschueroff
Differential Revision: D26548621
Pulled By: navahgar
fbshipit-source-id: b20f23d608c19ac71aebd31c14777d653eead36c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52150
Renames "whitelist" to "allowlist" to conform to company use standards, prevent critical errors raised by linters which detect the old usage, and to move toward more self-descriptive terminology.
Test Plan: Sandcastle
Reviewed By: suo
Differential Revision: D26405520
fbshipit-source-id: 9c3a41591d4e29c0197de9a8f5858c9c29271e26
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52593
This hook is not used at all, and it probably can only be used for demonstrating that allgather is slower than allreduce, so it should never be used in practice.
However, this hook and its helper function stay with the communication hook public APIs in the same file. It will be better to make the public API file as concise as possible.
Since I don't think we will use this hook in the future, prefer deleting it to moving it to a separate file.
ghstack-source-id: 122180969
Test Plan: waitforbuildbot
Reviewed By: rohan-varma
Differential Revision: D26575318
fbshipit-source-id: b258154a7c92e33236c34104bd79bc244ecdb158
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51916
After getting ported to structured kernels, the vector overloads of `upsample_nearest1d` are DefaultBackend kernels, meaning they are backend agnostic. We can kill their CUDA-specific implementations.
I also removed a few redundant checks in the cuda kernels that are now performed by the meta shape-checking function.
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D26327749
Pulled By: bdhirsh
fbshipit-source-id: b5a17e14237fb36236d4079433f99c71cd3beef3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52420
Inspired by D26154964 (6e1a5b1196), I'm basically going to just blindly copy the change that swolchok has made since it promises to reduce compile time, and who doesn't want faster compiles! I haven't actually checked if it has any impact on build time, but I have come to trust what swolchok does.
In addition, swolchok observed a size reduction with the change, which I assume happens when the `constexpr` is true since the lambda is invoked and possibly needs to be compiled in. When tracing based selective build is enabled, many many many of these will be enabled, and this will use valuable size. This change is required to get the maximum bang for our buck. In addition, I'll look into making the lambda not capture all arguments by ref via the ref-capture `[&]` directive.
I can probably have an entire half's worth of impact by copying Scott's changes and mirroring them in other parts of the PyTorch codebase lol.
#accep2ship
ghstack-source-id: 122178246
Test Plan: Build
Reviewed By: iseeyuan
Differential Revision: D26506634
fbshipit-source-id: b91d5e4773ade292fddce8dddd7e5ba1e5afeb29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52466
`MaxTypeIndex` controls the size of the array
```
detail::TypeMetaData* TypeMeta::typeMetaDatas() {
static detail::TypeMetaData instances[MaxTypeIndex + 1]
```
in `typeid.cpp`.
In practice, I have seen that this array doesn't hold more than 18 elements once the PyTorch library has been initialized (in mobile unit tests). I couldn't find situations where elements may be added to this array post library initialization.
There is a runtime check to prevent array overflow, so reducing the size of the storage shouldn't come at any additional risk from the perspective of loss in visibility of errors.
The fact that this array is statically allocated ends up using a bunch of space in the binary (potentially to initialize the trailing elements?). I'm somewhat surprised by this. However, this change registered a 15KiB size win on both fbios and igios.
Found this when I was looking at a bloaty run that I shared with smessmer on friday: https://www.internalfb.com/intern/everpaste/?handle=GLXImQisHOfT74EBAKw47V3ktuAzbsIXAAAB
I initially thought that the methods being passed in to the constructor of `detail::TypeMetaData` were causing the size increase, but only later realized the issue after reading the following helpful comment:
```
// The remainder of the array is padded with TypeMetaData blanks.
// The first of these is the entry for ScalarType::Undefined.
// The rest are consumed by CAFFE_KNOWN_TYPE entries.
```
This change was originally reverted at https://www.internalfb.com/diff/D26525208 due to an ONNX test failure. Re-trying the change gated under `C10_MOBILE`.
ghstack-source-id: 122178181
Test Plan:
Sandcastle runs + the following BSB runs.
### igios
```
D26299594 (9e54532947)-V1 (https://www.internalfb.com/intern/diff/D26299594 (9e54532947)/?dest_number=121221891)
igios: Succeeded
Change in Download Size for arm64 + 3x assets variation: +596 B
Change in Uncompressed Size for arm64 + 3x assets variation: -15.8 KiB
Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:443632243487886@base/bsb:443632243487886@diff/
```
### fbios
```
D26299594 (9e54532947)-V1 (https://www.internalfb.com/intern/diff/D26299594 (9e54532947)/?dest_number=121221891)
fbios: Succeeded
Change in Download Size for arm64 + 3x assets variation: +104 B
Change in Uncompressed Size for arm64 + 3x assets variation: -15.7 KiB
Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:169233698063125@base/bsb:169233698063125@diff/
```
Reviewed By: iseeyuan
Differential Revision: D26527921
fbshipit-source-id: f019e5fd37e6caf24c58c6f144bedcda942d7164
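The size/safety tradeoff above boils down to a statically sized table guarded by a runtime bounds check. A rough Python analogue (names are illustrative; the real table lives in `typeid.cpp`):

```python
class TypeMetaTable:
    """Fixed-capacity type registry with a runtime overflow check,
    mirroring the bounds check that guards MaxTypeIndex in typeid.cpp.
    This is a sketch of the mechanism, not the C10 implementation."""

    def __init__(self, max_type_index):
        # Mirrors `static detail::TypeMetaData instances[MaxTypeIndex + 1]`.
        self.capacity = max_type_index + 1
        self.entries = []

    def register(self, name):
        if len(self.entries) >= self.capacity:
            # The runtime check that makes shrinking the table safe:
            # overflow fails loudly instead of corrupting memory.
            raise RuntimeError("Too many types registered; bump MaxTypeIndex")
        self.entries.append(name)
        return len(self.entries) - 1  # the type's index

# Per the summary, ~18 entries were observed in mobile unit tests, so a
# much smaller capacity suffices under C10_MOBILE.
table = TypeMetaTable(max_type_index=17)
idx = table.register("float")
```

Shrinking `capacity` only trims unused, zero-initialized trailing slots; any genuine overflow still surfaces as an error.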
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52573
## Summary
Address comments in https://github.com/pytorch/pytorch/pull/52540
1. Add a comment to indicate that the macros `BUILD_LITE_INTERPRETER` and `C10_MOBILE` will be unified.
2. Rename the macro `DBUILD_LITE_INTERPRETER` to `BUILD_LITE_INTERPRETER`
## Test plan
1. `MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ USE_CUDA=0 DEBUG=1 MAX_JOBS=16 BUILD_LITE_INTERPRETER=1 python setup.py develop`
2. `/Users/chenlai/pytorch/cmake-build-debug/bin/test_lite_interpreter_runtime --gtest_filter=* --gtest_color=no`
Test Plan: Imported from OSS
Reviewed By: iseeyuan
Differential Revision: D26572742
Pulled By: cccclai
fbshipit-source-id: c8895fcfe8dd893f8157913f110e2ba025fc3955
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51419
## Summary
1. Add an option `BUILD_LITE_INTERPRETER` in `caffe2/CMakeLists.txt` and set `OFF` as default.
2. Update `build_android.sh` with an argument to switch `BUILD_LITE_INTERPRETER`, `OFF` as default.
3. Add a mini demo app `lite_interpreter_demo` linked with the `libtorch` library, which can be used for quick tests.
## Test Plan
Build the lite interpreter version of libtorch and test it with the Image Segmentation demo app ([android version](https://github.com/pytorch/android-demo-app/tree/master/ImageSegmentation)/[ios version](https://github.com/pytorch/ios-demo-app/tree/master/ImageSegmentation))
### Android
1. **Prepare model**: Prepare the lite interpreter version of the model by running the script below to generate the scripted models `deeplabv3_scripted.pt` and `deeplabv3_scripted.ptl`
```
import torch
model = torch.hub.load('pytorch/vision:v0.7.0', 'deeplabv3_resnet50', pretrained=True)
model.eval()
scripted_module = torch.jit.script(model)
# Export full jit version model (not compatible with the lite interpreter), leave it here for comparison
scripted_module.save("deeplabv3_scripted.pt")
# Export lite interpreter version model (compatible with lite interpreter)
scripted_module._save_for_lite_interpreter("deeplabv3_scripted.ptl")
```
2. **Build libtorch lite for android**: Build libtorch for android for all 4 android abis (armeabi-v7a, arm64-v8a, x86, x86_64) with `BUILD_LITE_INTERPRETER=1 ./scripts/build_pytorch_android.sh`. This PR is tested on a Pixel 4 emulator with x86, so use the command `BUILD_LITE_INTERPRETER=1 ./scripts/build_pytorch_android.sh x86` to specify the abi and save build time. After the build finishes, it will show the library path:
```
...
BUILD SUCCESSFUL in 55s
134 actionable tasks: 22 executed, 112 up-to-date
+ find /Users/chenlai/pytorch/android -type f -name '*aar'
+ xargs ls -lah
-rw-r--r-- 1 chenlai staff 13M Feb 11 11:48 /Users/chenlai/pytorch/android/pytorch_android/build/outputs/aar/pytorch_android-release.aar
-rw-r--r-- 1 chenlai staff 36K Feb 9 16:45 /Users/chenlai/pytorch/android/pytorch_android_torchvision/build/outputs/aar/pytorch_android_torchvision-release.aar
```
3. **Use the PyTorch Android libraries built from source in the ImageSegmentation app**: Create a folder `libs` at the path `ImageSegmentation/app/libs` (relative to the repository root). Copy `pytorch_android-release` to `ImageSegmentation/app/libs/pytorch_android-release.aar`. Copy `pytorch_android_torchvision` (downloaded from [here](https://oss.sonatype.org/#nexus-search;quick~torchvision_android)) to `ImageSegmentation/app/libs/pytorch_android_torchvision.aar`. Update the `dependencies` part of `ImageSegmentation/app/build.gradle` to
```
dependencies {
implementation 'androidx.appcompat:appcompat:1.2.0'
implementation 'androidx.constraintlayout:constraintlayout:2.0.2'
testImplementation 'junit:junit:4.12'
androidTestImplementation 'androidx.test.ext:junit:1.1.2'
androidTestImplementation 'androidx.test.espresso:espresso-core:3.3.0'
implementation(name:'pytorch_android-release', ext:'aar')
implementation(name:'pytorch_android_torchvision', ext:'aar')
implementation 'com.android.support:appcompat-v7:28.0.0'
implementation 'com.facebook.fbjni:fbjni-java-only:0.0.3'
}
```
Update `allprojects` part in `ImageSegmentation/build.gradle` to
```
allprojects {
repositories {
google()
jcenter()
flatDir {
dirs 'libs'
}
}
}
```
4. **Update model loader api**: Update `ImageSegmentation/app/src/main/java/org/pytorch/imagesegmentation/MainActivity.java` by
4.1 Add new import: `import org.pytorch.LiteModuleLoader;`
4.2 Replace the way to load pytorch lite model
```
// mModule = Module.load(MainActivity.assetFilePath(getApplicationContext(), "deeplabv3_scripted.pt"));
mModule = LiteModuleLoader.load(MainActivity.assetFilePath(getApplicationContext(), "deeplabv3_scripted.ptl"));
```
5. **Test app**: Build and run the ImageSegmentation app in Android Studio.
### iOS
1. **Prepare model**: Same as Android.
2. **Build libtorch lite for ios** `BUILD_PYTORCH_MOBILE=1 IOS_PLATFORM=SIMULATOR BUILD_LITE_INTERPRETER=1 ./scripts/build_ios.sh`
3. **Remove Cocoapods from the project**: run `pod deintegrate`
4. **Link ImageSegmentation demo app with the custom built library**:
Open your project in XCode, go to your project Target’s **Build Phases - Link Binaries With Libraries**, click the **+** sign and add all the library files located in `build_ios/install/lib`. Navigate to the project **Build Settings**, set the value **Header Search Paths** to `build_ios/install/include` and **Library Search Paths** to `build_ios/install/lib`.
In the build settings, search for **other linker flags**. Add a custom linker flag below
```
-all_load
```
Finally, disable bitcode for your target by selecting the Build Settings, searching for Enable Bitcode, and setting the value to No.
5. **Update library and api**
5.1 Update `TorchModule.mm`
To use the custom built libraries in the project, replace `#import <LibTorch/LibTorch.h>` (in `TorchModule.mm`), which is needed when using LibTorch via Cocoapods, with the code below:
```
//#import <LibTorch/LibTorch.h>
#include "ATen/ATen.h"
#include "caffe2/core/timer.h"
#include "caffe2/utils/string_utils.h"
#include "torch/csrc/autograd/grad_mode.h"
#include "torch/script.h"
#include <torch/csrc/jit/mobile/function.h>
#include <torch/csrc/jit/mobile/import.h>
#include <torch/csrc/jit/mobile/interpreter.h>
#include <torch/csrc/jit/mobile/module.h>
#include <torch/csrc/jit/mobile/observer.h>
```
5.2 Update `ViewController.swift`
```
// if let filePath = Bundle.main.path(forResource:
// "deeplabv3_scripted", ofType: "pt"),
// let module = TorchModule(fileAtPath: filePath) {
// return module
// } else {
// fatalError("Can't find the model file!")
// }
if let filePath = Bundle.main.path(forResource:
"deeplabv3_scripted", ofType: "ptl"),
let module = TorchModule(fileAtPath: filePath) {
return module
} else {
fatalError("Can't find the model file!")
}
```
### Unit test
Add `test/cpp/lite_interpreter`, with one unit test `test_cores.cpp` and a light model `sequence.ptl` to test `_load_for_mobile()`, `bc.find_method()` and `bc.forward()` functions.
### Size:
**With the change:**
Android:
x86: `pytorch_android-release.aar` (**13.8 MB**)
IOS:
`pytorch/build_ios/install/lib` (lib: **66 MB**):
```
(base) chenlai@chenlai-mp lib % ls -lh
total 135016
-rw-r--r-- 1 chenlai staff 3.3M Feb 15 20:45 libXNNPACK.a
-rw-r--r-- 1 chenlai staff 965K Feb 15 20:45 libc10.a
-rw-r--r-- 1 chenlai staff 4.6K Feb 15 20:45 libclog.a
-rw-r--r-- 1 chenlai staff 42K Feb 15 20:45 libcpuinfo.a
-rw-r--r-- 1 chenlai staff 39K Feb 15 20:45 libcpuinfo_internals.a
-rw-r--r-- 1 chenlai staff 1.5M Feb 15 20:45 libeigen_blas.a
-rw-r--r-- 1 chenlai staff 148K Feb 15 20:45 libfmt.a
-rw-r--r-- 1 chenlai staff 44K Feb 15 20:45 libpthreadpool.a
-rw-r--r-- 1 chenlai staff 166K Feb 15 20:45 libpytorch_qnnpack.a
-rw-r--r-- 1 chenlai staff 384B Feb 15 21:19 libtorch.a
-rw-r--r-- 1 chenlai staff **60M** Feb 15 20:47 libtorch_cpu.a
```
`pytorch/build_ios/install`:
```
(base) chenlai@chenlai-mp install % du -sh *
14M include
66M lib
2.8M share
```
**Master (baseline):**
Android:
x86: `pytorch_android-release.aar` (**16.2 MB**)
IOS:
`pytorch/build_ios/install/lib` (lib: **84 MB**):
```
(base) chenlai@chenlai-mp lib % ls -lh
total 172032
-rw-r--r-- 1 chenlai staff 3.3M Feb 17 22:18 libXNNPACK.a
-rw-r--r-- 1 chenlai staff 969K Feb 17 22:18 libc10.a
-rw-r--r-- 1 chenlai staff 4.6K Feb 17 22:18 libclog.a
-rw-r--r-- 1 chenlai staff 42K Feb 17 22:18 libcpuinfo.a
-rw-r--r-- 1 chenlai staff 1.5M Feb 17 22:18 libeigen_blas.a
-rw-r--r-- 1 chenlai staff 44K Feb 17 22:18 libpthreadpool.a
-rw-r--r-- 1 chenlai staff 166K Feb 17 22:18 libpytorch_qnnpack.a
-rw-r--r-- 1 chenlai staff 384B Feb 17 22:19 libtorch.a
-rw-r--r-- 1 chenlai staff 78M Feb 17 22:19 libtorch_cpu.a
```
`pytorch/build_ios/install`:
```
(base) chenlai@chenlai-mp install % du -sh *
14M include
84M lib
2.8M share
```
Test Plan: Imported from OSS
Reviewed By: iseeyuan
Differential Revision: D26518778
Pulled By: cccclai
fbshipit-source-id: 4503ffa1f150ecc309ed39fb0549e8bd046a3f9c
Summary: LLVM trunk at version 13 uses a different type for `CreateAlignedStore` and `CreateAlignedLoad`, so updating usage here to reflect this.
Test Plan:
buck build mode/opt-clang-thinlto sigrid/predictor/v2:sigrid_remote_predictor -c cxx.extra_cxxflags="-Wforce-no-error -fbracket-depth=300" -c cxx.profile="fbcode//fdo/autofdo-bolt-compatible/sigrid/predictor/v2/sigrid_remote_predictor:autofdo-bolt-compatible" -c cxx.modules=False
Previously:
caffe2/torch/csrc/jit/tensorexpr/llvm_codegen.cpp:1079:21: error: no matching member function for call to 'CreateAlignedLoad'
value_ = irb_.CreateAlignedLoad(vaddr, 4);
~~~~~^~~~~~~~~~~~~~~~~
third-party-buck/platform009/build/llvm-fb/include/llvm/IR/IRBuilder.h:1681:13: note: candidate function not viable: no known conversion from 'int' to 'llvm::MaybeAlign' for 2nd argument
LoadInst *CreateAlignedLoad(Value *Ptr, MaybeAlign Align,
Now:
Passes
Differential Revision: D26562330
fbshipit-source-id: dbf9ca5247ccd4351861995c2c5480a7cc55c202
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52498
If `max_shape[dim]` equals `real_shape[dim]`, we shouldn't need to adjust that dim in terms of output slicing. Consider the case where the output is compiled at [10, 4] and the real input is [5, 4]: we only need to adjust the outermost dim (10->5); for the second dim, we don't need to do anything. Thus this should fall to the fast path.
Test Plan:
```
buck test glow/fb/test:test_onnxifinnpi
```
Reviewed By: khabinov
Differential Revision: D26542773
fbshipit-source-id: 0475e0a1c35be6f28ccc63dc69cb0b5acf695141
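The per-dimension fast-path test reduces to comparing the compiled (max) and real shapes; a plain-Python sketch with hypothetical names:

```python
def dims_needing_adjustment(max_shape, real_shape):
    """Return indices of dims where output slicing must adjust.

    A dim whose compiled (max) extent equals the real extent needs no
    adjustment and can take the fast path. Illustrative sketch only, not
    the onnxifi implementation.
    """
    return [d for d, (m, r) in enumerate(zip(max_shape, real_shape))
            if m != r]

# Compiled at [10, 4], real input [5, 4]: only dim 0 needs adjusting.
print(dims_needing_adjustment([10, 4], [5, 4]))  # [0]
```

When the list is empty, no slicing adjustment is needed at all, which is the fast path the fix restores.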
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52447
Currently, `Dispatcher::singleton()` is always inlined. Additionally, `Dispatcher::singleton()` contains a static variable, which means that the generated code calls `__cxa_guard_acquire` and `__cxa_guard_release`, which help implement exactly-once semantics for the initialization of the `static Dispatcher& s` variable. For `C10_MOBILE`, we should not create the additional static ref within the inlined function, to save binary size, since it results in a lot of additional code being generated by the compiler. The `Dispatcher::singleton()` method is called from the generated method stubs for all aten operators that are code-generated, and potentially also from other operators that hand off execution to the kernel function for the right backend via the PyTorch Dispatcher.
This is a classic space/time (efficiency) tradeoff, so feedback would be welcome. kimishpatel, I'll need your expertise in figuring out how to perf-test this change, specifically for mobile.
Here's the godbolt link in case you wish to check out the generated code for a `static` variable within a function: https://godbolt.org/z/cdsG3v
{F375631117}
ghstack-source-id: 121989311
Test Plan:
Build + BSB
### lightspeed-messenger
*Divide the number below by 2*
```
D26507049-V1 (https://www.internalfb.com/intern/diff/D26507049/?dest_number=121944956)
messenger-experimental-optimized-device: Succeeded
Change in Download Size for arm64 + 3x assets variation: -21.7 KiB
Change in Uncompressed Size for arm64 + 3x assets variation: -65.4 KiB
Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:243392763936025@base/bsb:243392763936025@diff/
```
### igios
```
D26507049-V1 (https://www.internalfb.com/intern/diff/D26507049/?dest_number=121944956)
igios: Succeeded
Change in Download Size for arm64 + 3x assets variation: -15.6 KiB
Change in Uncompressed Size for arm64 + 3x assets variation: -34.3 KiB
Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:882756935844095@base/bsb:882756935844095@diff/
```
### fbios-pika
```
D26507049-V1 (https://www.internalfb.com/intern/diff/D26507049/?dest_number=121944956)
fbios-pika: Succeeded
Change in Download Size for arm64 + 3x assets variation: -8.6 KiB
Change in Uncompressed Size for arm64 + 3x assets variation: -29.1 KiB
Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:832297083999539@base/bsb:832297083999539@diff/
```
Reviewed By: swolchok
Differential Revision: D26507049
fbshipit-source-id: 0d2f55ea2d42a0782fb69aabfa517f2ec60c8036
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52290
_fake_quantize_learnable_per_channel_affine should allow taking non-integer zero_point as input, and perform rounding and clamp before doing forward/backward. In this diff, we make _fake_quantize_learnable_per_channel_affine to round and clamp zero_point beforehand as in _fake_quantize_learnable_per_tensor_affine.
ghstack-source-id: 122148099
Test Plan: `buck test mode/dev-nosan -c fbcode.platform=platform009 //caffe2/test:quantization -- test_learnable`
Reviewed By: raghuramank100
Differential Revision: D26446342
fbshipit-source-id: fc9b6832fa247cc9d41265eb4fd1575a2d2ed12c
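The pre-processing described here amounts to rounding each learnable (float) zero point and clamping it into the quantized range before the forward/backward pass. A plain-Python sketch (hypothetical helper, not the ATen kernel; `quant_min`/`quant_max` here assume uint8):

```python
def prepare_zero_points(zero_points, quant_min=0, quant_max=255):
    """Round each learnable (float) zero point, then clamp into range.

    Mirrors the round-then-clamp behavior described for the per-tensor
    variant; sketch only, not the fake-quantize kernel itself.
    """
    return [min(max(round(zp), quant_min), quant_max) for zp in zero_points]

# -3.2 rounds to -3 then clamps to 0; 12.7 rounds to 13; 300 clamps to 255.
print(prepare_zero_points([-3.2, 12.7, 300.0]))  # [0, 13, 255]
```

After this step, each per-channel zero point is a valid integer in the quantized range, matching what the per-tensor op already did.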
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52303
swolchok did some stellar work in D26372806 (22b12179db) (and friends) to simplify exception handling code-paths and outline uncommon code paths. In addition, non-inlined versions of exception handling functions were provided but only in case of specific cases where 1 (or 2?) arguments were passed in to the exception throwing macros.
This change hopes to take advantage of that infrastructure and only pass in a single `const char*` to `AT_ERROR` to leverage any current (or future) optimizations that may take place in this space.
Since this isn't yet in production, it won't have a size impact. However, my guess is that it will be a significant size win once we turn on tracing-based selective build, because the exception code path will be present in every kernel function multiple times over, since most dtypes will be unselected.
ghstack-source-id: 122149806
Test Plan: Build + auto-generated unit tests for tracing based selective build.
Reviewed By: swolchok
Differential Revision: D26463089
fbshipit-source-id: 349160a37d43d629249b92fa24f12b5bd128df1c
Summary:
`setDebugName` maintains an invariant that all debug names of values in same graph must be distinct. This is achieved by appending numeric suffixes to requested debug names. However, the implementation was slow (O(N^2)) when there are a lot of name conflicts. This PR fixes the problem by adding more book-keeping logic so that time complexity is brought down to O(1) on average.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52346
Reviewed By: SplitInfinity
Differential Revision: D26564462
Pulled By: gmagogsfm
fbshipit-source-id: 3260fc3b436f1b0bcb45fdd2d1ec759b5828263f
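The book-keeping the PR describes can be sketched with a set of used names plus a per-base-name counter, making each rename amortized O(1) instead of rescanning all suffixes (a plain-Python sketch, not the actual JIT implementation):

```python
class NameTable:
    """Assign distinct debug names by appending numeric suffixes.

    Tracking the next suffix to try per base name avoids the O(N^2)
    behavior of probing from 1 on every conflicting request.
    """

    def __init__(self):
        self.used = set()
        self.next_suffix = {}  # base name -> next suffix to try

    def set_debug_name(self, requested):
        if requested not in self.used:
            self.used.add(requested)
            return requested
        # Resume probing where we left off for this base name.
        n = self.next_suffix.get(requested, 1)
        while f"{requested}.{n}" in self.used:
            n += 1
        self.next_suffix[requested] = n + 1
        name = f"{requested}.{n}"
        self.used.add(name)
        return name

t = NameTable()
print([t.set_debug_name("x") for _ in range(3)])  # ['x', 'x.1', 'x.2']
```

Each request advances the saved counter, so a burst of conflicting requests for the same base name costs O(1) amortized per request.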
Summary:
Patch needed in order to build on ppc64le with compiler g++V7. (w/o fix, only works on minimum compiler V8).
Fixes https://github.com/pytorch/pytorch/issues/51592
To be clear, credit where due:
I tested this patch on a ppc64 RHEL container using gcc/g++ 7.4 compiler to ensure a complete pytorch build was successful -- and it was. However, I do not take credit for this patch. I found and reported the issue, but the full brainpower to identify the cause of the error and the appropriate solution and thus the credit for this fix truly belongs to quickwritereader (and I am just helping with the legwork to integrate it after having tested it).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52091
Reviewed By: ejguan
Differential Revision: D26494943
Pulled By: glaringlee
fbshipit-source-id: 0babdb460db5047c54144f724466b77dd2d8a364
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52391
There are 2 ways DDP can throw the exception refactored here -
1) Unused params in the forward pass. We provide `find_unused_parameters=True` for this.
2) All params used in fwd pass, but not all outputs used in loss computation. There are a few workarounds for this but we do not provide native support.
Previously, these 2 issues were combined into 1 error message but that has historically resulted in confusion, with users reporting getting this error even when they enable `find_unused_parameters=True` (which they expect to fix this error). As a result there is additional churn to debug these issues because the true cause (1) vs (2) is not known.
This commit helps to fix the issue by separating out the 2 error messages depending on if we ran with unused parameter detection or not. Hopefully this should make the error message much more clear and actionable.
error msg with `find_unused_params=True`:
```
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. Since `find_unused_parameters=True` is enabled, this likely means that not all `forward` outputs participate in computing loss. You can fix this by making sure all `forward` function outputs participate in calculating loss.
If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
```
error msg without `find_unused_params` specified:
```
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`, and by
making sure all `forward` function outputs participate in calculating loss.
If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).
```
ghstack-source-id: 122097900
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D26496688
fbshipit-source-id: 4a9eeeda10293da13d94a692d10cb954e4506d7c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52414
When the input is not quantized, we'll still quantize cat as requested by the qconfig, even though
it might be slower
Test Plan: Imported from OSS
Reviewed By: supriyar
Differential Revision: D26503554
fbshipit-source-id: 29d7c136711a12c124791c10ae436b61c1407668
Summary:
This PR adds functionality to skip a test based on CUDA version.
This way, we can be more specific when skipping a test, such as when the test only fails for a particular CUDA version.
This allows us to add back the skipped tests for CUDA 11.2 for other CUDA versions, such as 10.1 and 11.1.
I tested this locally (by using 11.0 instead of 11.2), but will run all the CI to make sure it works.
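A version-gated skip can be sketched in plain Python; `CUDA_VERSION` and `skipCUDAVersionIn` below are illustrative stand-ins, not the actual helpers added by this PR:

```python
import unittest

# Hypothetical stand-in: in a real run this would come from torch.version.cuda.
CUDA_VERSION = "11.2"

def parse_version(v):
    """Turn '11.2' into (11, 2) so versions compare numerically."""
    return tuple(int(p) for p in v.split("."))

def skipCUDAVersionIn(versions):
    """Skip the test only when the active CUDA version is in `versions`."""
    def decorator(fn):
        if parse_version(CUDA_VERSION) in [parse_version(v) for v in versions]:
            return unittest.skip(f"skipped on CUDA {CUDA_VERSION}")(fn)
        return fn
    return decorator

class MyTest(unittest.TestCase):
    @skipCUDAVersionIn(["11.2"])
    def test_flaky_on_112(self):
        pass

    @skipCUDAVersionIn(["10.1", "11.1"])
    def test_fine_on_112(self):
        pass
```

With this shape, a test that only fails on CUDA 11.2 keeps running on 10.1 and 11.1 instead of being skipped everywhere.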
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52359
Reviewed By: walterddr
Differential Revision: D26487951
Pulled By: janeyx99
fbshipit-source-id: 45c71cc6105ffd9985054880009cf68ea5ef3f6a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52179
Renames debug to reference. We'll use this to produce a reference quantized model
that can be used as a common interface between the PyTorch quantized model and backends.
Test Plan:
python test/test_quantization.py TestQuantizeFx
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D26424656
fbshipit-source-id: a0299b023f6ba7d98f5750724c517b0ecb987b35
Summary:
- Allows the build process to build with MLC enabled if the subrepo folder `mlc` is in the path and we can link against ML Compute on macOS Big Sur
- To build with MLC enabled you will need to clone the mlc repo inside the pytorch repository.
- We need both this change and https://github.com/pytorch/pytorch/pull/50634 on pytorch/pytorch to enable the `mlc` device.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51326
Reviewed By: glaringlee
Differential Revision: D26533138
Pulled By: malfet
fbshipit-source-id: 0baa06b4eb2d62dbfc0f6fc922096cb0db1cc7d1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52413
TODO: We'll need to add this guard for other ops as well
(Note: this ignores all push blocking failures!)
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_mul_add_fp16_config
Imported from OSS
Reviewed By: supriyar
Differential Revision: D26503348
fbshipit-source-id: 5aaba518742a516cc3521fd5f23f1a264d2973e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52226
This gets TORCH_INTERNAL_ASSERT to parity with TORCH_CHECK in terms of optimization for 0 or 1 argument.
ghstack-source-id: 121877054
(Note: this ignores all push blocking failures!)
Test Plan:
Compare generated assembly for
```
#include <c10/util/Exception.h>
void f(bool b) {
TORCH_INTERNAL_ASSERT(b, "message");
}
void g(bool b) {
TORCH_INTERNAL_ASSERT(b);
}
void h(bool b) {
TORCH_INTERNAL_ASSERT(b, "message", random());
}
```
before/after this diff.
Before: P174916324
After: P174916411
Before, f and g called out to outlined lambdas to build
std::strings. After, they load string constants and call
torchInternalAssertFail. Similarly, h calls random() and c10::detail::_str_wrapper() inline and then calls out to torchInternalAssertFail. As with D26380783 (efbb854ed8), I hope to solve the problem of outlining the random & _str_wrapper calls separately.
Profile AdIndexer benchmark & verify that toTensor() is still inlined (it calls TORCH_INTERNAL_ASSERT with an integer argument, like `h` above).
Reviewed By: bhosmer
Differential Revision: D26410575
fbshipit-source-id: f82ffec8d302c9a51f7a82c65bc698fab01e1765
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52223
After the previous diffs, `c10::str()` will return a
`CompileTimeEmptyString` when passed 0 arguments and a `const char*` when
passed 1 `const char *` argument. We can take advantage of this to
outline further std::string creation from CAFFE_ENFORCE.
ghstack-source-id: 121877053
(Note: this ignores all push blocking failures!)
Test Plan:
Compare assembly for
```
#include <c10/util/Logging.h>
void f(bool b) {
CAFFE_ENFORCE(b);
}
void g(bool b) {
CAFFE_ENFORCE(b, "message");
}
void h(bool b) {
CAFFE_ENFORCE(b, "message", random());
}
```
before & after this diff.
before: P174902847
after: P174902912
f & g are clearly much improved, and h is about the same.
(I tried measuring caffe2 perf on the AdIndexer MergeNet benchmark, but didn't see a win, which makes sense because the change is small.)
Reviewed By: bhosmer
Differential Revision: D26405181
fbshipit-source-id: c51a9e459ae7d9876494a83ade6f6fe725619512
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52222
`c10::str()` is often used with variadic macros. It can be more efficient to get a C string out if you put a C string in, like if you are able to defer std::string creation to an outlined function or even never do it at all. Meanwhile, there is an implicit conversion from const char* to std::string, so users who expected a std::string will still make one.
ghstack-source-id: 121877052
(Note: this ignores all push blocking failures!)
Test Plan: CI
Reviewed By: bhosmer
Differential Revision: D26419663
fbshipit-source-id: 400bef71e6a0004b5914f5f511ea0e04e0d7599b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51957
This is a simplified version of #51554.
Compared to #51554, this version only supports statically dispatching to
a specific backend. The benefit is that it skipped the dispatch key
computation logic thus has less framework overhead. The downside is that
if input tensors do not match the specified backend it will throw error
instead of falling back to regular dispatch.
Sample code:
```
Tensor empty(IntArrayRef size, TensorOptions options, c10::optional<MemoryFormat> memory_format) {
return at::cpu::empty(size, options, memory_format);
}
// aten::conj(Tensor(a) self) -> Tensor(a)
Tensor conj(const Tensor & self) {
return at::math::conj(self);
}
// aten::conj.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
Tensor & conj_out(Tensor & out, const Tensor & self) {
return at::cpu::conj_out(out, self);
}
// aten::conj.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
Tensor & conj_outf(const Tensor & self, Tensor & out) {
return at::cpu::conj_out(out, self);
}
// aten::_conj(Tensor self) -> Tensor
Tensor _conj(const Tensor & self) {
return at::defaultbackend::_conj(self);
}
```
For ops without the specific backend dispatch, it will throw error:
```
// aten::_use_cudnn_ctc_loss(Tensor log_probs, Tensor targets, int[] input_lengths, int[] target_lengths, int blank) -> bool
bool _use_cudnn_ctc_loss(const Tensor & log_probs, const Tensor & targets, IntArrayRef input_lengths, IntArrayRef target_lengths, int64_t blank) {
TORCH_CHECK(false, "Static dispatch does not support _use_cudnn_ctc_loss for CPU.");
}
```
Differential Revision: D26337857
Test Plan: Imported from OSS
Reviewed By: bhosmer
Pulled By: ljk53
fbshipit-source-id: a8e95799115c349de3c09f04a26b01d21a679364
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52336
**Summary**
In Python, the boolean interpretation of the return value of `__exit__` of an
object used as a context manager in a `with` statement determines
whether or not exceptions thrown inside the body of the `with`
statement are propagated. This feature is not possible to add to TorchScript at
the moment, but the current requirement that `__exit__` not have any
return value can make it difficult to script a context manager whose
`__exit__` *does* have a return value.
Accordingly, this commit removes the requirement that `__exit__` must
not have any return value. TorchScript does not interpret this return
value in the same way Python does (or at all), but this should make it
easier to share context managers between eager mode and script code.
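For context, this is how eager Python uses the `__exit__` return value that TorchScript now accepts but does not interpret (a minimal sketch):

```python
class Suppressing:
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # In eager Python, a truthy return value suppresses the exception.
        # TorchScript now tolerates this return value but ignores it.
        return exc_type is ValueError

def run():
    with Suppressing():
        raise ValueError("suppressed in eager mode")
    return "survived"
```

In eager mode `run()` returns `"survived"` because `__exit__` returned `True` for the `ValueError`.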
**Test Plan**
This commit adds a return value to one of the context managers used in
`TestWith`.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D26504910
Pulled By: SplitInfinity
fbshipit-source-id: 2ab635a24d111ac25df4e361b716be8fada5128e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52335
**Summary**
`with` statements can only be used with objects that have `__enter__`
and `__exit__` defined. At present, any attempt to use an expression
that returns something that is not an instance of a class type results
in a cryptic internal assert failure instead of a useful error message.
This is because the code that generates IR for `with` statements uses
`Type::expect` as if it were `Type::cast`; that is, as if it returns
`nullptr` on failure.
This commit fixes this issue by checking the `kind()` of the type of the
expression used as the with item before calling `expect<ClassType>()` on
it.
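For comparison, eager Python already raises a catchable error when the `with` item is not a context manager; a small probe:

```python
def try_with(obj):
    """Return the name of the exception type raised when `obj` is used
    as a `with` item, or None if it works as a context manager."""
    try:
        with obj:
            pass
    except (TypeError, AttributeError) as e:
        # Older CPython raises AttributeError for the missing __enter__;
        # newer versions raise TypeError.
        return type(e).__name__
    return None
```

`try_with(42)` reports an error type rather than an internal assert, which is the behavior this commit brings TorchScript's error message closer to.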
**Test Plan**
This commit adds a unit test to `test_with_errors` to test this case.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D26504909
Pulled By: SplitInfinity
fbshipit-source-id: 92d108e0c010370fd45131a57120f50c0b85c401
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52412
When the input is not quantized, we'll still quantize add/mul
Test Plan: Imported from OSS
Reviewed By: supriyar
Differential Revision: D26503347
fbshipit-source-id: 457b3444c50e5b49b911b04c67684f5eead78ec9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52350
When ONNX export creates a 0-dim tensor of constant type, it overrides the type promotion logic described in #9515. To prevent this from happening, this PR adds the following functionality:
if the data type is a floating-point type, the constant is converted to a 0-dim double tensor; otherwise it is converted to a 0-dim tensor of its original type.
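The conversion rule can be sketched in plain Python (the dtype strings here are illustrative; the real logic operates on ONNX tensor types):

```python
def constant_export_dtype(dtype):
    """Sketch of the rule: floating-point constants become 0-dim double
    tensors so they don't pin the result of type promotion; all other
    constants keep their original dtype."""
    floating = {"float16", "float32", "float64", "bfloat16"}
    return "float64" if dtype in floating else dtype
```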
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D26490325
Pulled By: SplitInfinity
fbshipit-source-id: 4c47c69c9b6523d2e45b74c2541d6d8ca7e28fc9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52349
Adds a check for cases with autocasting enabled in which a cast node inserted before the NegativeLogLikelihoodLoss
node prevents these patterns from being recognized by the peephole pass function
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D26490326
Pulled By: SplitInfinity
fbshipit-source-id: 4a6d806acc51b4696fd3932734d55af075fba6b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52347
Fixes consecutive mutations in a tensor inside blocks.
Also, support append and pop in blocks.
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D26490328
Pulled By: SplitInfinity
fbshipit-source-id: f0cdc706d2793e1f4eb0503d3e0f63f4127ea47a
Summary:
Fixes the following error during static linking by enforcing that the cudart dependency is put after cublasLt:
```
/usr/bin/ld: /usr/local/cuda/lib64/libcublasLt_static.a(libcublasLt_static.a.o): undefined reference to symbol 'cudaStreamWaitEvent@libcudart.so.11.0'
/usr/local/cuda/lib64/libcudart.so: error adding symbols: DSO missing from command line
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52509
Reviewed By: janeyx99
Differential Revision: D26547622
Pulled By: malfet
fbshipit-source-id: 4e17f18cf0ab5479a549299faf2583a79fbda4b9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51975
See comments in code.
Test Plan: Imported from OSS
Reviewed By: zdevito
Differential Revision: D26340592
Pulled By: suo
fbshipit-source-id: 61b16bafad15e19060710ad2d8487c776d672847
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52320
as title
Test Plan: Imported from OSS
Reviewed By: zdevito
Differential Revision: D26468416
Pulled By: suo
fbshipit-source-id: 890eecea76426918daff900402fbcbc149e48535
Summary:
When compiling libtorch on macOS there is the option to use the `vecLib` BLAS library from Apple's [Accelerate](https://developer.apple.com/documentation/accelerate) framework. Recent versions of macOS have changed the location of `vecLib.h`; this change adds the new locations to `FindvecLib.cmake`.
To test run the following command:
```
BLAS=vecLib python setup.py install --cmake --cmake-only
```
The choice of BLAS library is confirmed in the output:
```
-- Trying to find preferred BLAS backend of choice: vecLib
-- Found vecLib: /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/System/Library/Frameworks/Accelerate.framework/Versions/Current/Frameworks/vecLib.framework/Versions/Current/Headers
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51288
Reviewed By: jbschlosser
Differential Revision: D26531136
Pulled By: malfet
fbshipit-source-id: ce86807ccbf66973f33b3acb99b7f40cfd182b9b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51698
Completely eliminates torch::utils::Future, as we are now fully relying on JitFuture.
ghstack-source-id: 122037612
Test Plan: CI
Reviewed By: kiukchung
Differential Revision: D26243735
fbshipit-source-id: 95010a730f9d35e618f74c5f9de482738cd57c15
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51697
Refactors the rest of rref_context, specifically pendingOwners map and `getOwnerRRef` to use JitFuture.
ghstack-source-id: 122037611
Test Plan: CI
Reviewed By: wanchaol
Differential Revision: D26243268
fbshipit-source-id: ab8874c8253274e8fe50dcd7291e0655a8f3f1df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51386
Adds stats such as rebuilt bucket stats, unused parameter stats, and performance stats to the DDP logging data.
1. GPU time stats are not collected for single-process multiple-devices in this diff, as that requires events to be created and recorded on multiple devices
2. use the at::cuda::event API for safer calls
3. events may not be created in the autograd hook if the hook is not triggered in the user's code, e.g., the user runs in non-sync mode in some iterations. So we check whether events were created before synchronizing, and also skip invalid results.
4. users may not set the device upfront, so we explicitly set the proper device before creating events in our prepare_forward() and prepare_backward() calls
ghstack-source-id: 121933566
Test Plan: unit tests
Reviewed By: SciPioneer
Differential Revision: D26158645
fbshipit-source-id: ce5f15187802eba76accb980449be68902c10178
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52385
This warning should specify that we did not find unused params in the
_forward_ pass, which is when we log this warning. This is to avoid confusion
when we get an error because not all outputs were used to compute loss, which
also raises an error about unused parameters (to be fixed in the next diff)
ghstack-source-id: 122001929
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D26494136
fbshipit-source-id: d9b41732ea7e5e31b899d590d311080e3dc56682
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52384
Adds a simple UT with unittest that we can modify when we enable DDP backward without needing all parameters to get gradient.
ghstack-source-id: 122001930
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D26482479
fbshipit-source-id: c80bdeea7cf9db35390e385084ef28d64ed239eb
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50790.
Added `min()` & `max()` support for `Float16` & `BFloat16`.
CUDA already supported these ops on `Float16`, so the other three combinations had to be enabled.
`OpInfo`s for `min` & `max` were also added, and their sample inputs were removed from `method_tests()`.
### MORE INFO
The (slightly) long-term goal is to add dispatch for `min()` & `max()` related operations on CPU & CUDA for `Float16` & `BFloat16`,
wherever they aren't present already:
1. `amin()`
2. `argmax()`
3. `amax()`
4. `argmin()`
5. `torch._aminmax()`
6. `torch.clamp()` on CPU. Was already supported on CUDA
7. `min()` (in this PR)
8. `max()` (in this PR)
9. `minimum()`
10. `maximum()`
I'll submit separate PRs for the other ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51244
Reviewed By: jbschlosser
Differential Revision: D26503455
Pulled By: anjali411
fbshipit-source-id: c32247f214e9272ca2e4322a23337874e737b140
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52388
Pull Request resolved: https://github.com/pytorch/glow/pull/5364
This allows us to change global variables through onnxifi calls, and adds Python bindings along with it. Note that we supply a dummy backend_id as it's not needed by Glow, since the setting is global.
#codemod
Test Plan:
```
buck test mode/dev //glow/fb/test:test_onnxifi_optionnnpi
```
Reviewed By: jfix71, khabinov
Differential Revision: D26481652
fbshipit-source-id: 19b8201c77f653cf7d93ad68760aa7fb5ec45ff4
Summary:
Re-enabling these test cases for ROCm because they are passing.
jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52403
Reviewed By: jbschlosser, SciPioneer
Differential Revision: D26516727
Pulled By: malfet
fbshipit-source-id: 6c70805eda39b0aadfbeb30a527af3906d2da867
Summary:
This is getting tested by https://github.com/pytorch/pytorch/issues/52441.
Adds new config for macos arm64 to our binary builds.
Now stores artifacts for mac builds.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52443
Reviewed By: walterddr
Differential Revision: D26517330
Pulled By: janeyx99
fbshipit-source-id: 02774937a827bdd4c08486dc9f8fe63446917f1e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52431
The previous implementation was missing the padding information and thus is not correct.
ghstack-source-id: 121950755
Test Plan:
- `buck test pp-macos`
- CircleCI
Reviewed By: SS-JIA
Differential Revision: D26508482
fbshipit-source-id: b28b99c399c4f1390a5cc4f023e470eed0f8c073
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52264
When CPU fusion is enabled without LLVM support in PyTorch, it causes huge slowdown (> 50x). This PR makes the LLVM backend the default backend for TE. Now, an error will be reported if CPU fusion is enabled without LLVM support, to avoid this performance regression.
This PR also updates the tests to not use LLVM, so that the old flow is continued. This is necessary because tests run in CI do not have LLVM.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52314
Reviewed By: ejguan
Differential Revision: D26491294
Pulled By: navahgar
fbshipit-source-id: 74561db1207da805d6d28039450db046ba2988fb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51881
`MaxTypeIndex` controls the size of the array
```
detail::TypeMetaData* TypeMeta::typeMetaDatas() {
static detail::TypeMetaData instances[MaxTypeIndex + 1]
```
in `typeid.cpp`.
In practice, I have seen that this array doesn't hold more than 18 elements once the PyTorch library has been initialized (in mobile unit tests). I couldn't find situations where elements may be added to this array post library initialization.
There is a runtime check to prevent array overflow, so reducing the size of the storage shouldn't come at any additional risk from the perspective of loss in visibility of errors.
The fact that this array is statically allocated ends up using a bunch of space in the binary (potentially to initialize the trailing elements?). I'm somewhat surprised by this. However, this change registered a 15KiB size win on both fbios as well as igios.
Found this when I was looking at a bloaty run that I shared with smessmer on friday: https://www.internalfb.com/intern/everpaste/?handle=GLXImQisHOfT74EBAKw47V3ktuAzbsIXAAAB
I initially thought that the methods being passed in to the constructor of `detail::TypeMetaData` were causing the size increase, but only later realized the issue after reading the following helpful comment:
```
// The remainder of the array is padded with TypeMetaData blanks.
// The first of these is the entry for ScalarType::Undefined.
// The rest are consumed by CAFFE_KNOWN_TYPE entries.
```
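The capacity/overflow trade-off can be sketched in plain Python (`MAX_TYPE_INDEX` and the registry below are hypothetical stand-ins for `MaxTypeIndex` and the `instances` array):

```python
MAX_TYPE_INDEX = 32  # hypothetical stand-in for the C++ MaxTypeIndex constant

class TypeMetaRegistry:
    """Fixed-capacity registry: shrinking the capacity saves statically
    allocated storage, while a runtime check turns overflow into a loud
    error instead of silent memory corruption."""

    def __init__(self, capacity=MAX_TYPE_INDEX + 1):
        self._slots = [None] * capacity
        self._next = 0

    def register(self, name):
        if self._next >= len(self._slots):
            raise RuntimeError("MaxTypeIndex exceeded; bump the constant")
        index = self._next
        self._slots[index] = name
        self._next += 1
        return index
```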
ghstack-source-id: 121875657
Test Plan:
Sandcastle runs + the following BSB runs.
### igios
```
D26299594-V1 (https://www.internalfb.com/intern/diff/D26299594/?dest_number=121221891)
igios: Succeeded
Change in Download Size for arm64 + 3x assets variation: +596 B
Change in Uncompressed Size for arm64 + 3x assets variation: -15.8 KiB
Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:443632243487886@base/bsb:443632243487886@diff/
```
### fbios
```
D26299594-V1 (https://www.internalfb.com/intern/diff/D26299594/?dest_number=121221891)
fbios: Succeeded
Change in Download Size for arm64 + 3x assets variation: +104 B
Change in Uncompressed Size for arm64 + 3x assets variation: -15.7 KiB
Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:169233698063125@base/bsb:169233698063125@diff/
```
Reviewed By: raziel, iseeyuan
Differential Revision: D26299594
fbshipit-source-id: 9a78c03da621fbc25a1d8087376628bccc8dbfda
Summary:
Fixes https://github.com/pytorch/pytorch/issues/51719, https://github.com/pytorch/pytorch/issues/28142
**Change**
- Update `torch.Tensor.unflatten` to support passing `-1` as the inferred size for both tensors and named tensors.
- Examples of using `-1` in the `unflatten` function are added to the docs.
- Fix the rendering issue in the original `unflatten` docs by removing a blank line in its example section.
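The `-1` inference follows the usual size-inference rule; a plain-Python sketch of the arithmetic (not the actual implementation):

```python
from math import prod

def infer_unflatten_sizes(dim_size, sizes):
    """Replace a single -1 in `sizes` so the product equals `dim_size`."""
    if sizes.count(-1) > 1:
        raise ValueError("only one dimension can be inferred")
    if -1 in sizes:
        known = prod(s for s in sizes if s != -1)
        if known == 0 or dim_size % known != 0:
            raise ValueError(f"cannot infer size for {dim_size} from {sizes}")
        sizes = [dim_size // known if s == -1 else s for s in sizes]
    if prod(sizes) != dim_size:
        raise ValueError(f"sizes {sizes} do not multiply to {dim_size}")
    return sizes
```

For example, unflattening a dimension of size 12 with `(3, -1)` yields sizes `(3, 4)`.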
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51955
Reviewed By: agolynski
Differential Revision: D26467198
Pulled By: zou3519
fbshipit-source-id: 6a3ede25561223187273796427ad0cb63f125364
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52357
Refactor the NS for FX compare unshadowed activations API to be able
to work on N models and do arbitrary matching strategies.
We factor out a util which takes a model and a list of
nodes to extract weights for, with names to give the extracted
weights. The user can then call this util with a set
of nodes and names created in any way they want.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D26487270
fbshipit-source-id: 1372ef07b5f3ddc7cebdfb2dee0221a2facd0527
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52356
Refactor the NS for FX compare weights API to be able to
work on N models and do arbitrary matching strategies.
We factor out a util which takes a model and a list of
nodes to extract weights for, with names to give the extracted
weights. The user can then call this util with a set
of nodes and names created in any way they want.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D26487271
fbshipit-source-id: 0c2172a1b33d47565004a307aff14d205671add7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52130
We have patterns like (F.linear, F.relu) which need to match
to (toq.linear_relu). So, we need to match subgraphs.
This PR does the following:
* defines a "subgraph" as (start_node, end_node). The current assumption
is that subgraphs are simple, there is always a path from start_node to
end_node, and we can ignore any non-input args/kwargs of these nodes
for the purposes of matching and copying things. An example one node
subgraph is (F.linear, F.linear). An example two node subgraph
is (F.linear, F.relu).
* changes the matching logic to iterate over subgraphs instead of nodes
* changes the NS core APIs to use subgraph pairs instead of node pairs:
1. for weights, we match on the start node
2. for unshadowed activations, we observe the end nodes
3. for shadowed activations, we copy the subgraph of a to graph c
TODO(before review) write up better, not ready for review yet
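The subgraph idea can be sketched over a flat op sequence; real NS matching walks an FX graph, so the op names and greedy strategy here are purely illustrative:

```python
# Patterns ordered longest-first so fused pairs win over single nodes.
PATTERNS = [("linear", "relu"), ("linear",), ("conv",)]

def match_subgraphs(ops):
    """Greedily split `ops` into (start_node, end_node) subgraph pairs."""
    matches, i = [], 0
    while i < len(ops):
        for pat in PATTERNS:
            if tuple(ops[i:i + len(pat)]) == pat:
                # Record the start node and end node of the matched subgraph;
                # a one-node subgraph has start == end.
                matches.append((ops[i], ops[i + len(pat) - 1]))
                i += len(pat)
                break
        else:
            i += 1  # unmatched op, skip it
    return matches
```

With this shape, a `(F.linear, F.relu)` pair in model A can be lined up against a single fused `toq.linear_relu` node in model B by comparing subgraphs rather than individual nodes.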
Test Plan:
TODO before land: better test plan
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D26403092
fbshipit-source-id: e49aaad4b02b8d60589435848bee422b8f41937a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52092
Adds a very simple toy sparsenn model, and enables
its inspection with the new NS APIs.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_sparsenn_compare_activations
python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_sparsenn_shadow
```
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D26403095
fbshipit-source-id: 3c3650aca47186deb32f2b3f1d87a0716d1ad9d1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52221
The previous code forced a `std::string` to be created even when the default message or a user-provided string literal message was used. Now it's not forced and we don't need an outlined lambda in those cases either.
ghstack-source-id: 121877056
Test Plan:
Compare assembly for
```
#include <c10/util/Exception.h>
void f(bool b) {
TORCH_CHECK(b, "message");
}
void g(bool b) {
TORCH_CHECK(b);
}
void h(bool b) {
TORCH_CHECK(b, "message", random());
}
```
before/after in fbcode optimized build.
Before: P174696735
After: P174696840
For `f()` and `g()`, we go from a call to an outlined lambda that did a bunch of `std::string` creation to a load of a string constant before calling `torchCheckFail`. This is a clear improvement.
For `h()`, results are mixed: we save a bunch of *extra* string goop in the outlined lambda and instead call `c10::detail::_str_wrapper` directly. This is good for overall size. However, we no longer outline the call to `random()`, which is less than ideal. I hope to recover the ability to fully outline the `random()` call in future diffs; this is just thorny enough that I don't want to cram even more into one diff.
Added automated test to make sure `TORCH_CHECK` and `TORCH_INTERNAL_ASSERT` only evaluate their arguments once.
Profiled AdIndexer mergenet benchmark in perf to check that `IValue::toTensor` is still getting inlined.
Reviewed By: bhosmer
Differential Revision: D26380783
fbshipit-source-id: 288860772423994ac739a8f33e2c09f718e8dd38
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52220
D21268320 (d068a456d3) made this thread_local, but I don't think it was necessary to do so.
ghstack-source-id: 121877050
Test Plan: CI
Reviewed By: dzhulgakov
Differential Revision: D26378724
fbshipit-source-id: 7f17b5cff42983ea8f5be1bd254de01bf8db9a0e
Summary:
Some minor improvement for lazy modules introduced in https://github.com/pytorch/pytorch/issues/44538, https://github.com/pytorch/pytorch/issues/47350 and https://github.com/pytorch/pytorch/issues/51548.
This PR mainly turns the bias into an `UninitializedParameter`, and instead of creating empty tensors like
```python
self.bias = Parameter(torch.Tensor(0))
self.bias = UninitializedParameter()
```
I think it would be better to
```python
self.register_parameter('bias', None)
self.bias = UninitializedParameter()
```
In addition, I change the constructor of the `LazyBatchNorm` from
```python
self.running_mean = UninitializedBuffer()
```
to
```python
self.register_buffer('running_mean', UninitializedBuffer())
```
as the original one would not change the underlying `self._buffers`.
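A stripped-down sketch of why plain attribute assignment does not populate `self._buffers` (real `nn.Module` bookkeeping is considerably more involved):

```python
class MiniModule:
    """Minimal sketch of nn.Module's buffer bookkeeping."""

    def __init__(self):
        # Bypass our own __setattr__ while creating the dict itself.
        object.__setattr__(self, "_buffers", {})

    def register_buffer(self, name, value):
        self._buffers[name] = value

    def __setattr__(self, name, value):
        # Plain assignment only updates an *existing* buffer entry; it never
        # creates a new one, which is why the running stats must go through
        # register_buffer rather than `self.running_mean = ...`.
        if name in self._buffers:
            self._buffers[name] = value
        else:
            object.__setattr__(self, name, value)

    def __getattr__(self, name):
        buffers = object.__getattribute__(self, "_buffers")
        if name in buffers:
            return buffers[name]
        raise AttributeError(name)
```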
Thank you for your time on reviewing this PR :).
Gently ping albanD, mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52212
Reviewed By: jbschlosser
Differential Revision: D26504508
Pulled By: albanD
fbshipit-source-id: 7094d0bb4fa9e2a40a07b79d350ea12a6ebfd080
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51754
This API allows you to manage multiple python interpreters in a single
process to deploy PyTorch models packaged with torch.package.
torch/csrc/deploy/deploy.h contains the API definition
torch/csrc/deploy/test_deploy.cpp has some examples.
Notes:
* mutex is added to PyTorchStreamReader to make it safe to use from multiple threads at once.
* USE_DEPLOY is only true for the special libtorch_deployinterpreter.so library; when enabled
we use a hash table to maintain the PyObject <> at::Tensor mapping rather than the internal pointer
in Tensor, since >1 interpreter may have a reference to the tensor.
* serialization.py has some additional functions for creating pickle objects
but keeping storages in memory for use in transferring tensors between interpreters
Test Plan: Imported from OSS
Reviewed By: wconstab
Differential Revision: D26329468
Pulled By: zdevito
fbshipit-source-id: d75f4ebb9a27f1d911179d9996041bcb3ca04a07
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52424
NNC has a fast sigmoid on par with aten. Using it for static runtime
lets us skip dispatch overhead.
ghstack-source-id: 121953787
Test Plan:
```
caffe2=0 batch=1 run.sh
```
Reviewed By: bwasti
Differential Revision: D26291425
fbshipit-source-id: a2ad79765dacee352625f0e5322e871556e0ca9e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52423
NNC has a new logarithm implementation that closely matches the
performance of VML (see D26246400 (2e35fe9535)). Using this in the NNC generated kernel for
logit increases the win slightly.
ghstack-source-id: 121953008
Test Plan:
```
caffe2=0 bs=20 scripts/bwasti/static_runtime/run.sh
```
Reviewed By: bwasti
Differential Revision: D26291426
fbshipit-source-id: c5c3933732c6ade5235f23d6dc71410170a6c749
Summary:
Updates the freezing API for 1.8 and adds a corresponding C++ API. The `optimize` flag hasn't been publicly released yet, so we are able to change it without breaking BC. I will submit a PR to the release branch as well; there are a few more days to do that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52337
Reviewed By: ejguan
Differential Revision: D26491833
Pulled By: eellison
fbshipit-source-id: 6dcd74eb8f76db64ac53183d03dabdd0f101f4b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52196
A reduction does not need to know the buffer into which its
result will be written. This change gets us closer to being able to
create reductions inside Compute, where we have access to the tensor
axes.
ghstack-source-id: 121813071
Test Plan: test_tensorexpr
Reviewed By: ZolotukhinM
Differential Revision: D26420107
Pulled By: bertmaher
fbshipit-source-id: c8d8a99649adfd6de56fe53a728f5aa034a84f13
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52187
ReduceOp doesn't need to track the indices that its result will be written into.
ghstack-source-id: 121813075
Test Plan:
test_tensorexpr, tensorexpr_bench
Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D26420575
fbshipit-source-id: 7afcfa611515334e36de8039722011687f3b61e4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52177
I'm trying to get rid of `output_args` for reductions, because they
shouldn't be necessary; it's reducing over its reduction axis, why
does it need to know where its output is going?
Rfactor is probably the trickiest place where we use output_args, but
it looks like it's mostly just carrying around the location of the
store, so use that instead.
ghstack-source-id: 121813072
Test Plan:
build/bin/test_tensorexpr && build/bin/tensorexpr_bench
Imported from OSS
Reviewed By: navahgar
Differential Revision: D26420548
fbshipit-source-id: aeab564c6113fa02eabb14c9b70c7edfd05b264d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52379
There's no reason to create and concatenate multiple string literals here when we could just combine them.
ghstack-source-id: 121890478
Test Plan: builds
Reviewed By: ilia-cher
Differential Revision: D26492399
fbshipit-source-id: a9e611a5b7ce5c1255135f3a0db12cc765b29a87
Summary:
Now that `masked_fill` CUDA is migrated, skips on masked_scatter can be removed.
Reference: https://github.com/pytorch/pytorch/issues/33152
**Note**:
Have decreased the shape of Tensor for `masked_scatter` from (M, M) -> (S, S) and so on.
With shapes of M : **96.53s**
```
test/test_ops.py ........................................ssssssssssss........................ssssssssssss........................ [100%]
=============================================================== 88 passed, 24 skipped, 7981 deselected in 96.53s (0:01:36) ================================================================
```
With shapes of S : **46.53s**
```
test/test_ops.py ........................................ssssssssssss........................ssssssssssss........................ [100%]
==================================================================== 88 passed, 24 skipped, 7981 deselected in 46.53s =====================================================================
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52035
Reviewed By: VitalyFedyunin
Differential Revision: D26369476
Pulled By: anjali411
fbshipit-source-id: 7a79d5a609b0019f8fe9ce6452924dd33390dce1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52417
When we have multiple outputs, previously we would set `infered_data_type` to the first output's dtype and stick to it. This is not correct if later outputs have different dtypes. The fix here reverts `infered_data_type` back to its previous value (`UNDEFINED`) so that we can still enter the conditional check and set the right dtype for the second and subsequent outputs.
Test Plan:
```
buck test caffe2/caffe2/fb/operators:infer_bound_shape_op_test
```
Reviewed By: khabinov
Differential Revision: D26502161
fbshipit-source-id: 647f0106a5785dc156fddfc196ac67001602fda8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52123
Compiler currently complains:
```
caffe2/c10/util/MatchConstants.h(18): warning: calling a constexpr __host__ function("from_bits") from a __host__ __device__ function("pi") is not allowed.
```
This diff extirpates the warning
Test Plan: Sandcastle tests
Reviewed By: xush6528
Differential Revision: D26379485
fbshipit-source-id: ab4821119cba8c43fd1d5788c4632d0613529ec8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51768
This updates python/core.py to explicitly define all of the `DataType`
values rather than dynamically defining them at runtime from the
`caffe2_pb2` values.
This allows type checkers like Pyre and Mypy to see the members of the
`DataType` class. Otherwise the type checkers report errors such as
`"core.DataType" has no attribute "INT64"`.
This code does keep a run-time check that all of the data types defined
by `caffe2_pb2.proto` are defined correctly in this file. This way if
someone does add a new type to `caffe2_pb2.proto` it should be very
quickly apparent that this file needs to be updated and kept in sync.
ghstack-source-id: 121936201
Test Plan:
Confirmed that various caffe2/python tests still pass.
Verified that this allows many `pyre-fixme` comments to be removed in
downstream projects, and that Pyre is still clean for these projects.
Reviewed By: jeffdunn
Differential Revision: D26271725
Pulled By: simpkins
fbshipit-source-id: f9e95795de60aba67d7d3872d0c141ed82ba8e39
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52411
The `TensorDeserializer` code previously did not correctly handle unknown
`data_type` values. It attempted to deserialize the data as floats, rather
than recognizing that it did not understand the data type and erroring out.
Google protobuf will never return unknown values for enum fields. If an
unknown value is found in serialized data, the protobuf code discards it.
As a result `has_data_type()` will return false, but `get_data_type()` will
simply return the default value, which happens to be set to `FLOAT`. As a
result if we ever encounter a serialized blob with an unknown data type the
previous code would incorrectly think the data type was `FLOAT`.
This fixes the code to check if the `data_type` value is present before
reading it.
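The presence-vs-default distinction described above can be mimicked in a small sketch (hypothetical names, not the actual caffe2 deserializer): protobuf discards unknown enum values, so reading the field directly would silently return the default (`FLOAT`), while checking presence first turns that into a hard error.

```python
FLOAT = 1

class BlobProto:
    """Stand-in for a protobuf message with field-presence semantics."""
    def __init__(self, data_type=None):
        self._data_type = data_type  # None models "field not present"

    def has_data_type(self):
        return self._data_type is not None

    def get_data_type(self):
        # Like protobuf, a missing field reads as the default value.
        return self._data_type if self._data_type is not None else FLOAT

def deserialize_data_type(proto):
    if not proto.has_data_type():
        raise ValueError("unknown or missing data_type; refusing to assume FLOAT")
    return proto.get_data_type()
```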
ghstack-source-id: 121915981
Test Plan:
Included a unit test that verifies this behavior. Confirmed that without this
fix the code proceeded with the float deserialization code path. When
deserializing int32_t data it fortunately did fail later due to an unexpected
field length check, but this isn't guaranteed to be the case. In some cases
it potentially could incorrectly succeed and return wrong data.
Reviewed By: mraway
Differential Revision: D26375502
fbshipit-source-id: 4f84dd82902e18df5e693f4b28d1096c96de7916
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52377
Add QNNPACK specific packed params for sparse linear.
Add sparse linear dynamic op with appropriate registration.
Add python side LinearDynamic module for sparsity.
Add tests to validate sparse linear qnnpack kernels.
Note that since these tests are mostly run on x86 platforms, and
given that the 1x4 sparse kernels are implemented in both SSE and ARM,
LinearDynamic defaults to the 1x4 pattern at the moment.
The plan is to add another diff that will allow a global override for the 8x1 pattern
so that the prepare/convert flow can work for exporting models for mobile.
Test Plan: buck run caffe2/torch/fb/model_optimization:sparsity_test
Reviewed By: z-a-f
Differential Revision: D26491944
fbshipit-source-id: b98839b4c62664e1fabbb0cbeb2e5c1bd5903b4d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52376
Using the default CPU allocator for ops executed on the QNNPACK backend will result in
ASAN failures with heap overflow, since QNNPACK (and XNNPACK) can access the input
beyond its end and/or beginning.
Here we are enabling this feature specifically to enable the dynamic sparse linear op test
using the QNNPACK engine. In the dynamic linear op, the fp32 bias is not packed and
hence can result in out-of-bounds access.
Test Plan: CI
Reviewed By: z-a-f
Differential Revision: D26491943
fbshipit-source-id: bcc2485e957c7abdef0853c36f6e0f876c20cee3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52258
Removes deprecated preprocess method from the backend interface.
Preprocessing logic should be now registered along with the backend interface (i.e. PyTorchBackendInterface) via the BackendPreprocessFunction.
Also refactored internal dependencies.
ghstack-source-id: 121704837
Test Plan:
Validates all related tests pass:
buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - BackendTest.ToBackend'
python test/test_jit.py TestBackends
===== Glow
buck test mode/dev //glow/fb/torch_glow/tests:TorchGlowBackendTests
buck test mode/dev //glow/fb/torch_glow/tests:torch_glow_backend_tests
Reviewed By: jackm321
Differential Revision: D26443479
fbshipit-source-id: afdc51ae619ced293d10c7a6a12f3530e4c4e53c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51767
The `_import_c_extension.py` finds the right C extension library to use,
and then simply re-exports all of the symbols that it defines.
This adds a `_import_c_extension.pyi` file with type hints to let type
checkers like Pyre and Mypy know the names of the symbols that will be
re-exported from the C extension.
This does not define all of the symbols provided by the C extension,
but does define all of the symbols necessary to make type checkers happy
about other code in the `caffe2/python` directory.
ghstack-source-id: 121916324
Test Plan:
Was able to have Pyre successfully type check the `caffe2/python`
directory with this stub file plus a few other changes.
Confirmed that all of the dependent projects affected by this report no new
pyre issues in sandcastle.
Ran `python test/test_type_hints.py` in the PyTorch github repository and
confirmed it also passes.
Differential Revision: D26271726
Pulled By: simpkins
fbshipit-source-id: 6dbadcf02e0b2cc44a9e3cdabe9291c1250959b4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52344
This line is a bug-prone use of std::move combined with a reference to the moved-from parameter in the same series of function call arguments. This is normally a problem because the order of evaluation is undefined -- if the move happens before the call to `storage.device()`, we may have problems. It is not a problem here because we are merely forwarding from one `Storage&&` parameter to another.
ghstack-source-id: 121837267
Test Plan: See no clang-tidy/HowToEven warning on the diff, I hope
Reviewed By: bhosmer
Differential Revision: D26436550
fbshipit-source-id: da85d79be854ff42c5a0cab9649ba82295816eca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52151
CUDA 11.2 might not be as performant as we thought so let's downgrade to
something we think is more performant.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D26408314
Pulled By: seemethere
fbshipit-source-id: e2446aa0115e2c2a79718b1fdfd9fccf2072822d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52325
List's move constructor is comparatively expensive (copies the type) and so is its destructor (has to destroy the type, which isn't null). So, it's best not to create intermediate `List` objects in function parameters. Copy elision won't save us here; it's not allowed to! (see https://en.cppreference.com/w/cpp/language/copy_elision)
ghstack-source-id: 121807291
Test Plan:
Profile AdIndexer benchmark. Time spent in push_outputs is
down from 0.2% to 0.01%.
Inspecting assembly for
`c10::impl::push_outputs<c10::List<at::Tensor>,false>::call`
shows that we have gone from 2 List move ctor calls and 3
~intrusive_ptr dtor calls to 0 calls and 1 call, respectively.
Reviewed By: bhosmer
Differential Revision: D26471092
fbshipit-source-id: 412a85fcc36d141fb91710c7855df24c137813a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52324
`c10::ivalue::from` took its parameter by value. `List` has
an expensive move ctor (it has to copy the type shared_ptr) and dtor
(it has to decref the type, which isn't null), so it's better to avoid
intermediate List objects in function parameters.
ghstack-source-id: 121807292
Test Plan:
Profiled AdIndexer benchmark; time spent in push_outputs is
down from 0.5% to 0.23%.
Comparing assembly for
`c10::impl::push_outputs<c10::List<at::Tensor>, false>::call`, we went
from 4 List move ctor calls and 5 ~intrusive_ptr calls to 2 move ctor
calls and 3 dtor calls, respectively.
Reviewed By: bhosmer
Differential Revision: D26471093
fbshipit-source-id: 7b2c5e8d391a428f2b4d895717a43123c8d7a054
Summary:
Temporarily disabling OneDNN conv for group size = 24, as the OneDNN update came too late to be fully tested https://github.com/pytorch/pytorch/issues/50042
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52327
Reviewed By: agolynski
Differential Revision: D26474186
Pulled By: VitalyFedyunin
fbshipit-source-id: 8d6964d33c8dcab70e207088c3940810eabbd068
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52370
After adding .pyi stubs for torch / caffe2 protos, there were some mypy false positives (https://github.com/pytorch/pytorch/pull/52341). We tell mypy to ignore the offending file here.
Test Plan: Let CI run.
Reviewed By: malfet, dzhulgakov
Differential Revision: D26490302
fbshipit-source-id: 87cdfd7419efdc7abece9ca975a464201732b7a0
Summary:
This PR enables some previously failing fft unit tests in PyTorch on ROCm.
These tests were failing because hipFFT clobbers its inputs, causing a mismatch in tests that check that applying an fft and then its inverse gets you back to the input.
We solve this by cloning the input, using an existing flag, on the ROCm platform.
This PR does not enable all fft tests; there are other issues that need to be resolved before that can happen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51581
Reviewed By: ejguan
Differential Revision: D26489344
Pulled By: seemethere
fbshipit-source-id: 472fce8e514adcf91e7f46a686cbbe41e91235a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50459
Some of the custom modules cannot have the observers be inserted automatically. This PR factors out that list into a separate function.
Test is not required, as it is covered by the unittests for those modules.
(Note: this ignores all push blocking failures!)
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D26092531
fbshipit-source-id: 1f89daf3a13ef31bc4e9058c3443559c65a05812
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49866
- Adds the `torch.nn.quantizable.MultiheadAttention`
The quantizable version can serve as a fully equivalent to `torch.nn.MultiheadAttention` module.
The main difference is that it allows for linear units observation after the `prepare` step in the quantization flow.
Note: The `from_observed` method (called during the `convert`) removes the `bias_k` and `bias_v` parameters, and resets them as attributes.
This is done to avoid an error of assigning a quantized tensor to the `torch.nn.Parameter`.
(Note: this ignores all push blocking failures!)
Test Plan:
```
python test/test_quantization.py TestQuantizedOps.test_custom_module_multi_head_attention
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D25706179
fbshipit-source-id: e27ab641d8d1eccc64cf9e44343459331f89eea4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52031
Closes https://github.com/pytorch/pytorch/issues/52020
Ensures that we can profile collectives in DDP by propagating the profiler threadLocalState appropriately. As described in the above issue, before this wouldn't work as the profiler would only be enabled on the main thread.
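The general pattern here — capture thread-local state on the launching thread and re-install it inside the worker — can be sketched in plain Python (illustrative only, not the actual PyTorch profiler implementation):

```python
import threading

_state = threading.local()

def enable_profiler():
    _state.enabled = True

def profiler_enabled():
    return getattr(_state, "enabled", False)

def run_in_thread_with_state(fn, results):
    captured = profiler_enabled()  # capture on the launching thread

    def wrapper():
        if captured:
            enable_profiler()  # re-install inside the worker thread
        results.append((profiler_enabled(), fn()))

    t = threading.Thread(target=wrapper)
    t.start()
    t.join()
```

Without the capture-and-reinstall step, a worker thread would see the default (disabled) state even when the main thread enabled profiling — which is exactly the failure mode described above.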
ghstack-source-id: 121818080
Test Plan: CI
Reviewed By: zhaojuanmao
Differential Revision: D26356192
fbshipit-source-id: 0158b5833a3f857a0b4b2943ae3037e9d998dfd1
Summary:
Use `std::acos` even when avx2 is available
Add slow but accurate implementation of complex arc cosine based on
W. Kahan "Branch Cuts for Complex Elementary Functions" paper, where
cacos(z).re = 2*atan2(sqrt(1-z).re(), sqrt(1+z).re())
cacos(z).im = asinh((sqrt(conj(1+z))*sqrt(1-z)).im())
Fixes https://github.com/pytorch/pytorch/issues/42952
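A direct transcription of the two formulas above, for illustration (the PR itself implements this in vectorized C++ inside ATen):

```python
import cmath
import math

def kahan_acos(z):
    """Complex arc cosine via the Kahan formulas quoted above."""
    z = complex(z)
    re = 2.0 * math.atan2(cmath.sqrt(1 - z).real, cmath.sqrt(1 + z).real)
    im = math.asinh((cmath.sqrt((1 + z).conjugate()) * cmath.sqrt(1 - z)).imag)
    return complex(re, im)
```

Away from the branch cuts on the real axis (|x| > 1), this agrees with the principal value computed by `cmath.acos` to within a few ulp.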
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52287
Reviewed By: walterddr
Differential Revision: D26455027
Pulled By: malfet
fbshipit-source-id: a81ce1ba4953eff4d3c2a265ef9199896a67b240
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51253
**Summary**
This commit adds support to Torchbind for specifying default values for
arguments of custom class methods.
**Test Plan**
This commit adds a unit test to `test_torchbind.py` that exercises this
feature.
Test Plan: Imported from OSS
Reviewed By: gmagogsfm
Differential Revision: D26131529
Pulled By: SplitInfinity
fbshipit-source-id: 68bc86b045dd2f03ba41e1a116081a6eae6ba9ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51463
We can make the return type of the `to()` template match the return type of toFoo() by using the same technique we use for `list_element_to_const_ref`. Also simplifies `list_element_to_const_ref`.
ghstack-source-id: 121363468
Test Plan:
CI
built and ran AdIndexer benchmark w/ batch size 1 under perf stat
--repeat 5 to make sure it didn't regress
Reviewed By: bhosmer
Differential Revision: D26163848
fbshipit-source-id: b8563263b9f9fa5311c7d7cedc89e28bc5badda0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51319
We were going out of our way to accommodate `IValue::to<Tensor>` returning a copy of the inner Tensor. `IValue::toTensor` is capable of returning a reference without copying, so if we use it directly, we can allow kernels that want to take `Tensor &` to do so!
As a bonus, we get reduced build times.
ghstack-source-id: 121378961
Test Plan:
Rely on CI for correctness.
Profiled build time with -ftime-trace for RegisterCPU.cpp using an extracted build invocation.
Before: P168244900
After: P168245014
Note reduced time spent compiling make_boxed_from_unboxed_functor.
I also ran the AdIndexer benchmark (https://fb.quip.com/ztERAYjuzdlr) with static runtime disabled and batch size 1 to see how big the effect on boxed call performance was (any kernels that take `Tensor&` or `const Tensor&` should now actually save a refcount bump). Looks like it was roughly 1% better:
Before: 124-125 usec/iter
After: 122-123 usec/iter
Reviewed By: bhosmer
Differential Revision: D26138549
fbshipit-source-id: b0f830527da360c542c815bef2f7e1692615b32a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51367
Templight said that this assertion was taking about 5% of build time for RegisterCPU.cpp (a hopefully-representative example I picked to shorten my iteration cycle).
I've debug-gated it on the grounds that 1) we at least try to build
everything in debug mode and 2) optimized builds presumably take
longer in general, so we can more afford to pay the build time cost in
debug builds.
The win is not entirely clear; please see the test plan for details.
ghstack-source-id: 121378960
Test Plan:
1) Built RegisterCPU.cpp with -ftime-trace before and after. It doesn't seem to call out any difference in the details, but the overall time is stably down more like 10% (55s before and 49s after).
2) Did a full rebuild of aten-cpu with -ftime-trace before and
after. No significant difference in build times shown (it says *after*
is a regression, but it's using wall-time data and the machine is
loaded during builds so there's some noise).
3) Re-profiled with Templight.
Before:
{F366557311}
After:
{F366557501}
Not sure what to conclude overall. A known problem with templight is that template instantiations form more of a dependency graph than a tree because they're cached internally, so eliminating the first caller of a template may just move the time to another caller. However, it looks like we have actually reduced is_functor traffic.
UPDATE: I don't think that the -ftime-trace measurement was reliable; it seems to skew running times. I built this diff vs its base 5 times and measured the CPU ("user") time each time. Results (in seconds):
previous diff: [51.97, 50.54, 50.49, 52.89, 51.61]
mean: 51.5 std: 0.906
this diff: [50.53, 50.41, 50.57, 50.67, 50.94]
mean: 50.6 std: 0.179
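The summary statistics quoted above can be reproduced with the population standard deviation:

```python
import statistics

previous = [51.97, 50.54, 50.49, 52.89, 51.61]
current = [50.53, 50.41, 50.57, 50.67, 50.94]

# The quoted "std" values match the population standard deviation (pstdev).
print(round(statistics.mean(previous), 1), round(statistics.pstdev(previous), 3))
print(round(statistics.mean(current), 1), round(statistics.pstdev(current), 3))
```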
Reviewed By: ezyang
Differential Revision: D26153793
fbshipit-source-id: 9a66912c1b2b068f453e78be57454e4e62b7107b
Summary:
This PR fixes mypy issues on the current pytorch main branch. In particular, it replaces occurrences of `np.bool`/`np.float` with `np.bool_`/`np.float64`, respectively:
```
test/test_numpy_interop.py:145: error: Module has no attribute "bool"; maybe "bool_" or "bool8"? [attr-defined]
test/test_numpy_interop.py:159: error: Module has no attribute "float"; maybe "float_", "cfloat", or "float64"? [attr-defined]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52090
Reviewed By: walterddr
Differential Revision: D26469596
Pulled By: malfet
fbshipit-source-id: e55a5c6da7b252469e05942e0d2588e7f92b88bf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52341
Add type stubs for caffe2 protos and scripts for generating them.
It's worth calling special attention to the following. In order to make `DeviceType`s like `CPU`, `CUDA`, etc. directly accessible from the `caffe2_pb2` module, they are currently freedom-patched into it in `caffe2/python/__init__.py`. This is not ideal: it would be better if these were autogenerated when the protobuf definitions were created by using `allow_alias = true` in the `DeviceTypeProto` definition in `caffe2.proto`.
However, it is impossible to do this currently without significant effort. The issue is that the generated proto constants would conflict with various constants defined in the C++ caffe2 codebase in `caffe2_pb.h`. We cannot simply remove these constants and replace them with the caffe2 DeviceTypeProto constants, because a huge portion of code expects `at::DeviceType` constants defined in `core/DeviceType.h` (apparently duplicated to avoid having to figure out how to autogenerate the protobuf definitions using cmake for ATen).
Instead, we make a best-effort to add additional definitions in `caffe2_pb2.py` by looking for any freedom-patched constants in `caffe2/python/__init__.py` and making sure they have corresponding stubs in the pyi (see `gen_proto_typestubs_helper.py`).
Test Plan: Make sure CI is green; we're just adding stubs.
Reviewed By: d4l3k
Differential Revision: D26331875
fbshipit-source-id: 2eea147e5bf393542f558ff8cf6385c47624b770
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).
New submodule commit: c520088927
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52354
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: ejguan
Differential Revision: D26484989
fbshipit-source-id: c9ccce0141be49c57b80e14992f842364bb18a00
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52297
Before, an `nn.Module` with submodules would fail AST rewriting with `TypeError: 'RewrittenModule' object does not support item assignment`. (Try the `test_ast_rewriter_reassigns_submodules` test case on `master`.) This PR fixes the issue and adds additional test cases.
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D26483820
Pulled By: ansley
fbshipit-source-id: 757e898dc2b0a67daf2bd039d555b85f4e443322
Summary:
Add QNNPACK specific packed params for sparse linear.
Add sparse linear dynamic op with appropriate registration.
Add python side LinearDynamic module for sparsity.
Add tests to validate sparse linear qnnpack kernels.
Note that since these tests are mostly run on x86 platforms, and
given that the 1x4 sparse kernels are implemented in both SSE and ARM,
LinearDynamic defaults to the 1x4 pattern at the moment.
The plan is to add another diff that will allow a global override for the 8x1 pattern
so that the prepare/convert flow can work for exporting models for mobile.
Test Plan: buck run caffe2/torch/fb/model_optimization:sparsity_test
Reviewed By: z-a-f
Differential Revision: D26263480
fbshipit-source-id: 04ab60aec624d1ecce8cfb38b79c7e94f501cdf6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52323
Using the default CPU allocator for ops executed on the QNNPACK backend will result in
ASAN failures with heap overflow, since QNNPACK (and XNNPACK) can access the input
beyond its end and/or beginning.
Here we are enabling this feature specifically to enable the dynamic sparse linear op test
using the QNNPACK engine. In the dynamic linear op, the fp32 bias is not packed and
hence can result in out-of-bounds access.
Test Plan: test_set_default_mobile_cpu_allocator.py
Reviewed By: z-a-f
Differential Revision: D26263481
fbshipit-source-id: a49227cac7e6781b0db4a156ca734d7671972d9f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51368
This seems to noticably reduce build times, at least for
RegisterCPU.cpp. It makes sense that a compiler builtin would be
faster than simulating the same builtin with templates.
Identified with templight.
ghstack-source-id: 121378959
Test Plan:
Confirmed this speeds up RegisterCPU.cpp optimized build by
simply running builds under `time(1)`:
previous diff: [50.53, 50.41, 50.57, 50.67, 50.94]
mean: 50.6 std: 0.179
this diff: [45.71, 45.89, 46.21, 48.51, 45.84]
mean: 46.4 std: 1.05
Reviewed By: bhosmer
Differential Revision: D26154964
fbshipit-source-id: 62ee2f5a872007db032dfebf7ad4d1b6e1ce63d1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51902
These seem like straightforward improvements. (I don't have measurements; feel free to reject if you're skeptical)
ghstack-source-id: 121278775
Test Plan: CI
Reviewed By: qizzzh
Differential Revision: D26322438
fbshipit-source-id: d393a32cc34bb68bc4f804f4b1cc5a8af27763c9
Summary:
This is a re-land of https://github.com/pytorch/pytorch/pull/51797 with a fix for a spurious libcuda dependency
The fix limits the scope of the `no-as-needed` linker flag to just `jitbackend_test`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52340
Reviewed By: agolynski, iseeyuan
Differential Revision: D26476168
Pulled By: malfet
fbshipit-source-id: f909428af82182b3bffd020ca18cca7a9b5846b6
Summary:
This PR fixes the torchvision build error from the hipify revamp, "KeyError: '/usr/include/libpng16/png.h'".
Description:
Traceback (most recent call last):
File "setup.py", line 471, in <module>
ext_modules=get_extensions(),
File "setup.py", line 329, in get_extensions
extra_compile_args=extra_compile_args
File "/opt/conda/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 892, in CUDAExtension
is_pytorch_extension=True,
File "/opt/conda/lib/python3.6/site-packages/torch/utils/hipify/hipify_python.py", line 978, in hipify
clean_ctx=clean_ctx)
File "/opt/conda/lib/python3.6/site-packages/torch/utils/hipify/hipify_python.py", line 212, in preprocess
hip_clang_launch, is_pytorch_extension, clean_ctx, show_progress)
File "/opt/conda/lib/python3.6/site-packages/torch/utils/hipify/hipify_python.py", line 175, in preprocess_file_and_save_result
hip_clang_launch, is_pytorch_extension, clean_ctx, show_progress)
File "/opt/conda/lib/python3.6/site-packages/torch/utils/hipify/hipify_python.py", line 792, in preprocessor
output_source = RE_ANGLE_HEADER.sub(mk_repl('#include <{0}>', False), output_source)
File "/opt/conda/lib/python3.6/site-packages/torch/utils/hipify/hipify_python.py", line 785, in repl
value = HIPIFY_FINAL_RESULT[header_filepath]["hipified_path"]
KeyError: '/usr/include/libpng16/png.h'
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51453
Reviewed By: agolynski
Differential Revision: D26459979
Pulled By: fmassa
fbshipit-source-id: f653f55fd34c71314e6c6682217f84b2d1e49335
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).
New submodule commit: 7f3baec496
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52255
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: jspark1105
Differential Revision: D26443031
fbshipit-source-id: 9e2758c73a15e7d2b5aefa5bc38270404cb5862a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52321
We're seeing undefined references to this function in coverage builds.
I don't even know why the toolchain is trying to look for it, because it's not
actually used in our code anywhere.
Obviously dropping in a dummy reference is a workaround more than a real
solution, but I'd like to get the coverage build back online.
ghstack-source-id: 121818432
Test Plan: `buck build mode/dbgo-cov //caffe2/test/...`
Reviewed By: asuhan
Differential Revision: D26467484
fbshipit-source-id: 4de8d950b03d0818ffc317fc1bed9be8cf470352
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52302
Adds the basic functionality for the three Numeric Suite core APIs to work on FX models:
1. comparing weights
2. comparing activations, with same input fed to both models
3. comparing activations, with nodes of A shadowing nodes of B
Note: there are a lot of TODOs in the code, and some/most of the APIs and implementation details may change as we iterate. This is just the first PR.
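API (1) above — comparing weights — can be illustrated generically with plain dicts (this is a sketch only, not the actual FX Numeric Suite API; real models would use tensors and an SQNR-style metric):

```python
import math

def compare_weights(weights_a, weights_b):
    """Return a per-layer L2 error between two {name: [floats]} weight dicts."""
    results = {}
    for name in weights_a.keys() & weights_b.keys():
        a, b = weights_a[name], weights_b[name]
        results[name] = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return results
```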
Test Plan:
We have unit test coverage for all of the APIs, for now this is with toy models:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```
Reviewed By: raghuramank100
Differential Revision: D26463013
Pulled By: vkuzo
fbshipit-source-id: e454115099ad18e4037d3c54986951cdffcab367
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52322
diff BS=1
```
C2 run finished. Milliseconds per iter: 0.0564008. Iters per second: 17730.3
PyTorch run finished. Milliseconds per iter: 0.0677778. Iters per second: 14754.1
```
diff BS=20
```
C2 run finished. Milliseconds per iter: 0.51086. Iters per second: 1957.48
PyTorch run finished. Milliseconds per iter: 0.510077. Iters per second: 1960.49
```
master BS=1
```
C2 run finished. Milliseconds per iter: 0.0567362. Iters per second: 17625.4
PyTorch run finished. Milliseconds per iter: 0.0706478. Iters per second: 14154.7
```
master BS=20
```
C2 run finished. Milliseconds per iter: 0.510943. Iters per second: 1957.17
PyTorch run finished. Milliseconds per iter: 0.516338. Iters per second: 1936.72
```
Reviewed By: bertmaher
Differential Revision: D25407106
fbshipit-source-id: 08595ba5e4be59e2ef95fb9b24da7e7671692395
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52165
Apparently bitwise identicality is too high a bar (I'm seeing
differences at this level depending on the HW platform, e.g.,
Broadwell is bitwise accurate but Skylake is 1ulp off). But anyways
VML is accurate to 1 ulp, so let's allow that.
ghstack-source-id: 121815001
Test Plan: test_approx
Reviewed By: asuhan
Differential Revision: D26408079
fbshipit-source-id: 46cbd1487c72ae7bc40567f2f72ed2b919707d0d
Summary: The `cat` op tests pass on device and on local macOS, but fail during Sandcastle runs. Disabling them for now while we investigate why they fail in Sandcastle.
Test Plan: `buck test //fbobjc/Apps/Internal/PyTorchPlaygroundMac:PyTorchPlaygroundMacTests`
Reviewed By: xta0
Differential Revision: D26468606
fbshipit-source-id: 440369bb68641060fa98dbf37fb8825ee56083e0
Summary:
- Does not disable current CUDA 11.2 CI jobs
- Does not reenable tests disabled for CUDA 11.2
- Removes some unused docker images
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52171
Reviewed By: malfet
Differential Revision: D26461533
Pulled By: janeyx99
fbshipit-source-id: e0e23117498320e72f2cbca547981c5894b48b68
Summary:
Makes a dummy torch_cuda target to maintain better backwards compatibility.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52305
Test Plan:
Run `export BUILD_SPLIT_CUDA=ON && python setup.py develop`.
When it's done building, run `ls -lah` within `build/lib` to check that `libtorch_cuda.so` exists and is the same size as `libtorch.so`.
Reviewed By: walterddr
Differential Revision: D26463915
Pulled By: janeyx99
fbshipit-source-id: 2b4cb8ee49bd75e11dc89d94b5956917b1800df1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52225
Support the out version of `sum` for Static Runtime (SR)
Test Plan:
buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest
sum node runtime before out version (1000 runs): 3558 us
sum node runtime after out version (1000 runs): 2173 us
Reviewed By: ajyu
Differential Revision: D26259744
fbshipit-source-id: bc6a1231353d79a96d45f1cdc676e78a92469d85
Summary:
Better debugging: allows you to download the final package for binary windows builds
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52239
Reviewed By: agolynski
Differential Revision: D26463613
Pulled By: janeyx99
fbshipit-source-id: ffb0ec044be23286b8975b9a6d2f90d05c2aff9c
Summary:
nvcc's `--fmad=false` is not valid for the HIP compiler. Upcoming ROCm releases will start treating unrecognized compiler flags as an error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50508
Reviewed By: albanD
Differential Revision: D25920291
Pulled By: mrshenli
fbshipit-source-id: c0ff3b74dd07f3d0661ba29efafaab291ef3621c
Summary:
The current Caffe2 operators WeightedSum and ScatterWeightedSum enforce that the first input is not empty; otherwise they raise an error. However, in some cases we have a batch size of 0 in training and eval. For example, when training and evaluating current AF and AI OC models, we filter out the search ads in the data pipeline, which can produce a batch size of 0 in some iterations. As a result, models using Dper3 modules that contain WeightedSum or ScatterWeightedSum (e.g., the HistogramBinningCalibration module) occasionally fail in training or eval.
To address this issue, we revise the implementation of WeightedSum and ScatterWeightedSum so that they return directly when their first inputs are empty, instead of failing the operators.
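A hypothetical Python sketch of the revised behavior (the real operators are C++; the function and argument names here are illustrative only):

```python
def weighted_sum(inputs, weights):
    """Element-wise weighted sum of several equal-length inputs.

    Mirrors the revised operator behavior: an empty first input no
    longer raises, it simply produces an empty result (the
    0-batch-size case described above).
    """
    if len(inputs[0]) == 0:
        return []  # early return instead of failing the operator
    assert len(inputs) == len(weights)
    n = len(inputs[0])
    return [sum(w * x[i] for x, w in zip(inputs, weights)) for i in range(n)]
```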
Test Plan:
We tested the code change by building a Dper3 backend canary package. All the jobs for AF and AI OC succeeded with the modified Caffe2 operators:
f251058001
f251058142
f251058332
To compare, all the jobs with identical model configs but with the canary package built from master failed:
f250993908
f250994106
f250994174
Reviewed By: chenshouyuan, huayuli00
Differential Revision: D26444645
fbshipit-source-id: 1c2f81a078810e3ef3c17c133a715090dee2c0ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52237
Redo D26331506 (4c58be4573). Get rid of `nodiscard` which broke OSS CI.
- Clean up references of outputs, including Tuples/Lists, by using move semantics
- Clean up references of elements in output Tuples/Lists by adding them to `unmanaged_values_` in MemoryPlanner. Check for corner case of Tuple/List element being inputs.
- Modify unit tests to check for use_counts of outputs
- Clean up dead code. A bit overlap with D25592967, but shouldn't be a problem.
This diff does not try to fix the alias problem with the MemoryPlanner.
Reviewed By: swolchok
Differential Revision: D26432539
fbshipit-source-id: e08990e4066c1ce69ad5274860851d012b7be411
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51177
**Summary**
This commit adds support for static methods to TorchBind. Just like
pybind, the API for declaring a static method is `def_static(...)`. A
static method must be called on the class directly, and can be called
both in Python as well as TorchScript.
Support for static methods is implemented in a manner similar to that of
instance methods. Registered static functions are wrapped in a layer of
unboxing logic, their schemas are inferred using templates and
metaprogramming, and they are added to the `ClassType` object
corresponding to the TorchBind class on which they are registered.
ScriptClass has been extended to support a `__getattr__` function so
that static methods of TorchBind classes can be invoked in Python. The
implementation of `__getattr__` returns `ScriptClassFunctionPtr`, a
version of `StrongFunctionPtr` without a compilation unit (since the
functions of a TorchBind class live inside the TorchBind registry).
Within TorchScript, TorchBind static functions are desugared in
`PythonClassValue::attr` by looking them up on the class type of the
`PythonClassValue` instance.
**Test Plan**
This commit adds a unit test that tests a simple static method on a
TorchBind class.
Test Plan: Imported from OSS
Reviewed By: pbelevich
Differential Revision: D26356942
Pulled By: SplitInfinity
fbshipit-source-id: 1b6a9bc2e5f3e22071ad78e331a0201fbbf7ab30
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52184
`auditwheel` inserts the first 8 characters of the sha256 checksum of a library into its name before relocating it into the wheel package. This change adds logic for computing the same short sha and embedding it into LazyNVRTC as an alternative name for libnvrtc.so.
Fixes https://github.com/pytorch/pytorch/issues/52075
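A Python sketch of the naming scheme (the helper name and exact suffix handling are assumptions, not auditwheel's actual code):

```python
import hashlib


def auditwheel_style_name(lib_name: str, lib_bytes: bytes) -> str:
    """Build the alternative soname in the style auditwheel produces:
    the first 8 hex characters of the library's sha256 are inserted
    before the '.so' suffix, e.g.
    'libnvrtc.so.11.0' -> 'libnvrtc-<8hexchars>.so.11.0'."""
    short_sha = hashlib.sha256(lib_bytes).hexdigest()[:8]
    stem, _, version_suffix = lib_name.partition(".so")
    return f"{stem}-{short_sha}.so{version_suffix}"
```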
Test Plan: Imported from OSS
Reviewed By: seemethere
Differential Revision: D26417403
Pulled By: malfet
fbshipit-source-id: e366dd22e95e219979f6c2fa39acb11585b34c72
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52183
This allows one to load a library that can exist on the system under different names.
Currently, this functionality is Linux-only, as on Windows shared libraries are not renamed by `auditwheel`.
Test Plan: Imported from OSS
Reviewed By: walterddr
Differential Revision: D26417405
Pulled By: malfet
fbshipit-source-id: d327e2565b26cf5b7214e7978862f56e02cad7c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52182
DynamicLibrary provides a very specific functionality, so there is no need to expose it to every project depending on `ATen.h`
Test Plan: Imported from OSS
Reviewed By: walterddr
Differential Revision: D26417404
Pulled By: malfet
fbshipit-source-id: f8318cacb07dcc8b2f95984f88ea1df4e5369b8b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52162
This test demonstrates how external calls can interoperate with other
tensor computations and between themselves.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D26410813
Pulled By: ZolotukhinM
fbshipit-source-id: 8180164013b43f613d53620d1b249e0af769ae8e
Summary:
Necessary to ensure correct link order, especially if libraries are
linked statically. Otherwise, one might run into:
```
/usr/bin/ld: /usr/local/cuda/lib64/libcublasLt_static.a(libcublasLt_static.a.o): undefined reference to symbol 'cudaStreamWaitEvent@libcudart.so.11.0'
/usr/local/cuda/lib64/libcudart.so: error adding symbols: DSO missing from command line
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52243
Reviewed By: seemethere, ngimel
Differential Revision: D26437159
Pulled By: malfet
fbshipit-source-id: 33b8bb5040bda10537833f3ad737f535488452ea
Summary:
This pull request (https://github.com/pytorch/pytorch/issues/40801) has become an important part of recent 3D models, brings a significant speed improvement, and has been open for a while, so I decided to resolve the previous review comments and modify it a bit so that it can be merged into the latest version of PyTorch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51027
Reviewed By: albanD
Differential Revision: D26414116
Pulled By: ngimel
fbshipit-source-id: 562c099f4d7f6d603a9c2f2e2a518bc577b0d8ee
Summary:
In the past, this file included `thrust/complex.h` because the `thrust::complex` --> `c10::complex` migration was not done. That migration has been complete for a while, but it seems this include was never removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51824
Reviewed By: albanD
Differential Revision: D26417144
Pulled By: ngimel
fbshipit-source-id: 1fff5b8d50f0b34c963a7893cbb0599895823105
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51696
Modify this API to use JitFuture.
ghstack-source-id: 121695707
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D26239132
fbshipit-source-id: 15c0c349a79e660fe4862e1d99176989f8159bf4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51695
As part of the plan to completely eliminate torch/csrc/utils/future.h,
we are converting this to JitFuture (c10::ivalue::Future).
ghstack-source-id: 121695708
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D26238873
fbshipit-source-id: 92bad1a349964ce8a9a80e2d1cf68f293cbe411c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51797
The C++ API ```codegen_backend_module``` is added to ```to_<backend>```. Python-related code is decoupled in this function, so it can be used from both C++ and Python.
* Tests
Python: The existing ```test_backends.py```, which calls the C++ API under the hood.
C++: The end-to-end test ```jit.BackendTest.ToBackend``` is added in ```test_backend.cpp```. The original class definitions in this file are moved to ```test_backend_lib.cpp```
ghstack-source-id: 121687464
(Note: this ignores all push blocking failures!)
Test Plan: CI
Reviewed By: raziel
Differential Revision: D26280518
fbshipit-source-id: fd466e4b448847ce64010a3297fff0b5760c5280
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52193
In this step, we replace the temp name and use the old interface name with the new behavior
Test Plan: CI
Reviewed By: dskhudia
Differential Revision: D26232170
fbshipit-source-id: 60233f98fe91a15c3c834bf6fde1b185269dd2b6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50180
Resolves the regression in
https://github.com/pytorch/pytorch/issues/49819 by adding copy over background
stream similar to scatter. For internal use cases, this is gated with an env var that maintains the previous behavior when it is off.
Test Plan: CI
Reviewed By: mrshenli, ngimel
Differential Revision: D25818170
fbshipit-source-id: e50c76c035504b2a44e2be084701cee45c90df75
Summary: Add support for SqueezeNet in the PyTorch Playground test app
Test Plan:
```
arc focus2 pp-ios
```
Reviewed By: xta0
Differential Revision: D26083960
fbshipit-source-id: a0d753eefa431f2f9e377f082c564370d6774c0b
Summary: Add concat op to enable models such as SqueezeNet.
Test Plan:
Test on device:
```
arc focus2 pp-ios
```
Test on mac
```
buck test pp-macos
```
Reviewed By: xta0
Differential Revision: D26029029
fbshipit-source-id: b0d621f2069a722f0770218c435b22feac4fb873
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51648
The following code will throw during the call to `traced(5)`:
```python
class M(nn.Module):
def __init__(self):
super(M, self).__init__()
self.W = torch.nn.Parameter(torch.randn(5))
def forward(self, x):
return torch.dot(self.W, x)
traced = fx.symbolic_trace(M())
traced(5)
```
Traceback before:
```
Traceback (most recent call last):
File "test/tinytest.py", line 26, in <module>
traced(5)
File "/home/ansley/local/pytorch/torch/fx/graph_module.py", line 338, in wrapped_call
return self._cls_call(self, *args, **kwargs)
File "/home/ansley/local/pytorch/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "<eval_with_key_0>", line 4, in forward
TypeError: dot(): argument 'tensor' (position 2) must be Tensor, not int
```
Traceback after:
```
Traceback (most recent call last):
File "/home/ansley/local/pytorch/torch/fx/graph_module.py", line 338, in wrapped_call
return torch.nn.Module.__call__(self, *args, **kwargs)
File "/home/ansley/local/pytorch/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "<eval_with_key_1>", line 4, in forward
dot_1 = torch.dot(w, x); w = x = None
TypeError: dot(): argument 'tensor' (position 2) must be Tensor, not int
Call using an FX-traced Module, line 4 of the traced Module’s generated forward function:
w = self.W
dot_1 = torch.dot(w, x); w = x = None
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
relu_1 = dot_1.relu(); dot_1 = None
return relu_1
```
(Note that the same `TypeError` is thrown despite modifying the traceback.)
Test Plan: Imported from OSS
Reviewed By: jamesr66a
Differential Revision: D26424005
Pulled By: ansley
fbshipit-source-id: 368f46ba81fb3111bd09654825bb2ac5595207d1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51669
Adds the basic functionality for the three Numeric Suite core APIs to work on FX models:
1. comparing weights
2. comparing activations, with same input fed to both models
3. comparing activations, with nodes of A shadowing nodes of B
Note: there are a lot of TODOs in the code, and some/most of the APIs and implementation details may change as we iterate. This is just the first PR.
Test Plan:
We have unit test coverage for all of the APIs, for now this is with toy models:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D26403094
fbshipit-source-id: 9752331d4ae0105346d3da309b13c895b593b450
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51588
Early version of utility to match nodes between graph A and graph B, for Numerical Suite for FX graph mode quantization.
The main goal of this utility is to reliably match the nodes of graph A to the nodes of graph B, and throw an easy to read error message. This will be used in future PRs to create the APIs for matching activations. It also could potentially be used to match weights.
Test Plan:
For now, we have bare bones test coverage on some toy models, and a single torchvision model.
```
python test/test_quantization.py TestFXGraphMatcher
```
Future PRs will add more testing.
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D26403093
fbshipit-source-id: 60e318d51e6fefe65265488c4967629d946048ef
Summary:
Fixes https://github.com/pytorch/pytorch/issues/34067 by using https://github.com/pytorch/pytorch/issues/34426 by hczhu
In addition to removing the unnecessary any() we do also:
- Get rid of the outer loop since graph_root also needs to be checked
- Update the pseudocode description so it matches what the code does
- Add some comments explaining the difference between assigning `info.needed_` and `info.captures_` in terms of how that affects discovery
- [edit: another benefit is that exec_info entries are no longer created for all reachable nodes]
This PR is on top of https://github.com/pytorch/pytorch/issues/51940, so once that lands rebasing on top of master should get rid of the extra commits and changes
I'm not sure if this change will bring a lot of performance gains, but the main benefit is that the code is easier to read.
Trivial graph:
```
torch.autograd.grad(a*b, [a, b], gradient)
setup:
a = torch.rand((2, 2), requires_grad=True)
b = torch.rand((2, 2), requires_grad=True)
gradient = torch.ones(2, 2)
Time before:
15.45 us
Time after:
14.33 us
1 measurement, 10000 runs , 1 thread
Instructions after:
All Noisy symbols removed
Instructions: 8271213 8193169
Baseline: 4244 3838
Instructions before:
All Noisy symbols removed
Instructions: 8142843 8054463
Baseline: 4280 3838
100 runs per measurement, 1 thread
```
Small graph:
```
torch.autograd.grad((b*a.exp()+a*b.exp()).sum(), (a, b))
setup:
a = torch.rand((2, 2), requires_grad=True)
b = torch.rand((2, 2), requires_grad=True)
Time before:
52.25 us
Time after:
50.80 us
1 measurement, 10000 runs , 1 thread
Instruction count before:
All Noisy symbols removed
Instructions: 25601257 25518229
Baseline: 4228 3838
Instruction count after:
All Noisy symbols removed
Instructions: 25606533 25522797
Baseline: 4228
100 runs per measurement, 1 thread
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52057
Reviewed By: ngimel
Differential Revision: D26432207
Pulled By: soulitzer
fbshipit-source-id: beef68344d66e9e286378e31e3311ba43c25c749
Summary: Previously there was no regularizer implemented for fp16 sparse features. Add regularizer support here using the Float16SparseNormalize operator implemented in this stack.
Test Plan:
buck test //caffe2/caffe2/python:regularizer_test
In f248648705, we can see there is the operator `Float16SparseNormalize`.
{F356635445}
Reviewed By: bigrabithong
Differential Revision: D24042567
fbshipit-source-id: 5e0065f8c10b8748daffa8a54a6bf8f461460b18
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51762
Update test_util.py to add a `make_tempdir()` function to the `TestCase`
class. The main advantage of this function is that the temporary
directory will be automatically cleaned up when the test case finishes,
so that test case does not need to worry about manually cleaning up this
directory.
This also prefixes the directory name with `caffe2_test.` so that it is
more obvious where the temporary directories came from if they are ever
left behind after a crashed or killed test process.
This updates the tests in `operator_test/load_save_test.py` to use this
new function, so they no longer have to perform their own manual cleanup
in each test.
Test Plan: python caffe2/python/operator_test/load_save_test.py
Reviewed By: mraway
Differential Revision: D26271178
Pulled By: simpkins
fbshipit-source-id: 51175eefed39d65c03484482e84923e5f39a4768
Summary:
In design review the use of the word "true" for a "rounding mode" which actually performed no rounding was, understandably, considered confusing. This PR updates the documentation to remove references to "true." The signatures for torch.div and torch.divide are updated to reflect the future behavior where rounding_mode=None will be the default.
This is slightly inaccurate: today, when a rounding mode is not specified it is effectively None, but users cannot yet explicitly pass rounding_mode=None. That change was considered too disruptive to the 1.8 branch cut process.
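A torch-free Python sketch of the three rounding_mode behaviors (the helper below is illustrative, not torch's implementation):

```python
import math


def div(a: float, b: float, rounding_mode=None) -> float:
    """Illustrates torch.div's rounding_mode semantics:
    None    -> true division (no rounding, hence the confusing 'true'),
    'trunc' -> round the quotient toward zero,
    'floor' -> round the quotient toward negative infinity."""
    q = a / b
    if rounding_mode is None:
        return q
    if rounding_mode == "trunc":
        return float(math.trunc(q))
    if rounding_mode == "floor":
        return float(math.floor(q))
    raise ValueError(f"unknown rounding_mode: {rounding_mode!r}")
```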
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52202
Reviewed By: gchanan
Differential Revision: D26424979
Pulled By: mruberry
fbshipit-source-id: db3cc769c0d9c6d7e42bfad294073c99fa9168d9
Summary:
Take 2 of https://github.com/pytorch/pytorch/issues/50914
This change moves the early termination logic into common_utils.TestCase class.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52126
Test Plan: CI with ci-all tag
Reviewed By: malfet
Differential Revision: D26391762
Pulled By: walterddr
fbshipit-source-id: a149ecc47ccda7f2795e107fb95915506ae060b4
Summary:
Some distributions of MKL such as the one in the Conda default channel have an implicit dependency to TBB even though they do not list it explicitly in their ELF dynamic section (DT_NEEDED). Pre-loading torch_global_deps into a process that uses such an MKL distribution fails with an unresolved symbol error due to missing libtbb.so. This code change forces torch_global_deps to load libtbb.so into the process to avoid such issues.
Moreover, although we distribute our own TBB build, TBB is a widely used third-party library, and the same global-namespace treatment rules should apply to it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51741
Reviewed By: malfet
Differential Revision: D26261214
Pulled By: cbalioglu
fbshipit-source-id: 94491275f8ec82d5917695e57dd766a10da92726
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51991
- Clean up references of outputs, including Tuples/Lists, by using move semantics
- Clean up references of elements in output Tuples/Lists by adding them to `unmanaged_values_` in MemoryPlanner. Check for corner case of Tuple/List element being inputs.
- Modify unit tests to check for use_counts of outputs
- Clean up dead code. A bit overlap with D25592967, but shouldn't be a problem.
This diff does not try to fix the alias problem with the MemoryPlanner.
(Note: this ignores all push blocking failures!)
Test Plan:
```
buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest
buck test mode/opt-clang caffe2/caffe2/fb/predictor:ptvsc2_predictor_bench_test
```
Reviewed By: bwasti
Differential Revision: D26333953
fbshipit-source-id: cadc0595ad6ab754c4f1f7a5a3733b2c16b3102f
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).
New submodule commit: 4d203256ba
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52129
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: jianyuh
Differential Revision: D26393870
Pulled By: jspark1105
fbshipit-source-id: 6cf01c45c8768f453c9fac5f8af6813db0549083
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51766
Check if we are on Windows using `sys.platform` rather than
`platform.system()`. Even though `platform.system()` is more modern, it
has a few downsides: it performs a runtime check of the platform type,
which has non-zero overhead, and on Linux it actually executes the
separate `/bin/uname` process. `sys.platform`, on the other hand, is
determined when the Python interpreter is compiled, so it is a simple
hard-coded string.
Because it is a runtime check, `platform.system()` checks also cannot be
analyzed by static type checkers like Pyre and Mypy. These type
checkers do understand `sys.platform` checks, and can correctly avoid
complaining about code paths that use platform-specific modules and
functions. e.g., they can avoid complaining about `ctypes.WinDLL` not
existing on Linux if its use is guarded by a `sys.platform` check.
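A minimal sketch of the preferred check:

```python
import sys


def is_windows() -> bool:
    # sys.platform is baked in when the interpreter is built, so this is
    # a constant string comparison that static checkers like Mypy/Pyre
    # can narrow on, unlike platform.system(), which probes at runtime.
    return sys.platform == "win32"
```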
ghstack-source-id: 121107705
Test Plan: Ran tests on Linux, and will check CI test results.
Reviewed By: mraway
Differential Revision: D26271724
Pulled By: simpkins
fbshipit-source-id: b86e427e4ceec0324464ba4bc88b95d5813172d0
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50510
Allows ```torch.nn.parallel.scatter_gather.gather``` to accept a list of NamedTuples as input and returns a NamedTuple whose elements are tensors. I added the author's fix using the ```is_namedtuple``` function.
While testing this fix, I encountered a deprecation warning instructing me to use ```'cpu'``` instead of ```-1``` to move the outputs to the CPU. However, doing this causes an assertion error in the ```_get_device_index``` function. I solved this by handling the CPU case in the affected ```forward``` function.
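The standard duck-typing check for namedtuples looks like this (a sketch of the `is_namedtuple` idea, not the exact PyTorch helper):

```python
def is_namedtuple(obj) -> bool:
    """NamedTuple instances are plain tuples whose class also carries
    _fields (a tuple of field-name strings) and a _make classmethod;
    checking for those attributes distinguishes them from ordinary
    tuples without importing the user's type."""
    return (
        isinstance(obj, tuple)
        and hasattr(obj, "_fields")
        and hasattr(obj, "_make")
    )
```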
rohan-varma
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51104
Reviewed By: albanD
Differential Revision: D26395578
Pulled By: rohan-varma
fbshipit-source-id: 6e98c9ce1d9f1725973c18d24a6554c1bceae465
Summary:
Currently, adding a cross compile build is failing on CI due to a cmake builtin compiler check that does not pass due to cross compiling the host protoc library.
Setting the CMAKE_TRY_COMPILE_TARGET_TYPE flag should fix it. (Based on this [Stack Overflow answer](https://stackoverflow.com/questions/53633705/cmake-the-c-compiler-is-not-able-to-compile-a-simple-test-program).)
To test that this works, please run: `CMAKE_OSX_ARCHITECTURES=arm64 USE_MKLDNN=OFF USE_NNPACK=OFF USE_QNNPACK=OFF USE_PYTORCH_QNNPACK=OFF BUILD_TEST=OFF python setup.py install` from a Mac x86_64 machine with Xcode12.3 (anything with MacOS 11 SDK).
Then, you can check that things were compiled for arm by running `lipo -info <file>` for any file in the `build/lib` directory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50922
Reviewed By: malfet
Differential Revision: D26355054
Pulled By: janeyx99
fbshipit-source-id: 919f3f9bd95d7c7bba6ab3a95428d3ca309f8ead
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51952
StaticRuntime should not hold owning refs of inputs after inference is finished. This diff adds a pass to clean them up and unit tests to enforce the check.
Will clean up output tensors in separate diffs.
Test Plan:
```
buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest
buck test mode/opt-clang caffe2/caffe2/fb/predictor:ptvsc2_predictor_bench_test
```
Reviewed By: bwasti
Differential Revision: D26331506
fbshipit-source-id: d395a295ada9de3033d0ea05d1dbab62d879a03b
Summary:
`torch.__config__._cxx_flags` gets called on import, but this means that Timer can't be used if it fails. (Even just the wall time parts.) This is needlessly restrictive.
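One way to sketch the fix is a guarded lazy call, so a failing flags lookup degrades to `None` instead of breaking import (the helper name is illustrative, not the actual Timer code):

```python
def cxx_flags_or_none(get_flags):
    """Call the flag getter lazily and tolerate failure, so callers
    that only need wall-time measurements still work when the build
    configuration is unavailable."""
    try:
        return get_flags()
    except Exception:
        return None
```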
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52124
Reviewed By: albanD
Differential Revision: D26395917
Pulled By: robieta
fbshipit-source-id: 4336a77dba131f80d386368ef715eed63c1cbcb4
Summary:
The GitHub-hosted runner has maximum 14 GB disk space, which is not enough to host the nightly Docker build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52148
Test Plan: CI workflow
Reviewed By: samestep
Differential Revision: D26406295
Pulled By: xuzhao9
fbshipit-source-id: 18a0dff45613649d6c15b8e1e9ca85042f648afd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51822
Adds support for shape recording for profiling distributed collectives, for nccl/gloo backends. Added
both cpp and python tests to ensure that shapes are recorded properly. Note that we don't add `ProcessGroupNCCLTest`s since they need to be modified to support single process per device and > 1 world size.
ghstack-source-id: 121507509
Test Plan: CI
Reviewed By: mrzzd
Differential Revision: D26291739
fbshipit-source-id: 5f7bd54d8c36d17a4a29e172b25266ca3dbd8fbd
Summary:
Increase the deadline to avoid flakiness of the test on ROCm.
Signed-off-by: Roy, Arindam <rarindam@gmail.com>
Fixes #{issue number}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52013
Reviewed By: albanD
Differential Revision: D26360209
Pulled By: mrshenli
fbshipit-source-id: 1ddc7062c5ff7c980233d22844073de9fb7dcbb3
Summary:
libc++ implements csqrt using the polar form of the number, which results in higher numerical error when `arg` is close to 0, pi/2, pi, or 3pi/4
Fixes https://github.com/pytorch/pytorch/issues/47500
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52018
Reviewed By: walterddr
Differential Revision: D26359947
Pulled By: malfet
fbshipit-source-id: 8c9f4dc45948cb29c43230dcee9b030c2642d981
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52083
This makes minor fixes in `caffe2/python` to address all errors currently
reported by Pyre.
I update the code to fix errors when doing so looked simple and safe,
and added `pyre-fixme` comments in other places.
ghstack-source-id: 121109695
Test Plan: Confirmed that Pyre no longer reports errors under `caffe2/python`
Differential Revision: D26272279
fbshipit-source-id: b1eb19d323b613f23280ce9c71e800e874ca1162
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51769
Remove some Python 2 compatibility code that otherwise causes errors to
be reported from static type checkers.
Static type checkers complain that the old Python 2 modules and
functions referenced by this code do not exist. Given that Python 2
support is entirely deprecated now we can simply remove the
compatibility code.
ghstack-source-id: 121313191
Test Plan:
Was able to get Pyre to successfully type check the `caffe2/python`
directory with this and some other changes.
Reviewed By: Tianshu-Bao
Differential Revision: D26271723
Pulled By: simpkins
fbshipit-source-id: fec8a09466be6867388832380480aafd36616aa1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51315
The TODOs said to remove this wrapper, and it seems that it can be removed easily.
ghstack-source-id: 121363465
Test Plan: CI
Reviewed By: ezyang
Differential Revision: D26137147
fbshipit-source-id: f1e5971dca071f37400d77cc823214527e4231bc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51313
The problem here is similar to the one described in
https://devblogs.microsoft.com/cppblog/build-throughput-series-more-efficient-template-metaprogramming/
in that we are iterating over an integer sequence of length N, where N
is the number of argument types to our function, and specializing
`TypeListAt` (which we call `element_t`) for each Ith element of the
typelist, which instantiates O(I) template specializations, for a
total of O(N^2).
The solution is also similar: we iterate over the typelist
directly. Unlike in the blog post, we do also need the index in the
sequence, so we retain the index_sequence.
ghstack-source-id: 121363464
Test Plan:
Inspect -ftime-trace output for RegisterCPU.cpp.
Before: P168220187
After: P168220294
we can see that we spend less time instantiating
call_functor_with_args_from_stack and spend a similar amount of time
compiling it. The win is modest, but it's a win and I've already
written it so I'm sending it out. (I was hoping it would reduce
compilation time for make_boxed_from_unboxed_functor.)
Reviewed By: bhosmer
Differential Revision: D26136784
fbshipit-source-id: c91a523486e3019bd21dcd03e51a58aa25aa0981
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52104
Make the API of `SamplerIterDataPipe` more reasonable with `sampler_args` and `sampler_kwargs`.
Test Plan: Imported from OSS
Reviewed By: glaringlee
Differential Revision: D26401494
Pulled By: ejguan
fbshipit-source-id: ee5b5c414782d0880b12968bc9c8aa470b753f6a
Summary:
Fixes https://github.com/pytorch/pytorch/issues/39784
At the time the issue was filed, there was only issue (1) below.
There are actually now two issues here:
1. We always set all inputs passed in through `inputs` arg as `needed = True` in exec_info. So if we pass in an input that has a grad_fn that is not materialized, we create an entry of exec_info with nullptr as key with `needed = True`. Coincidentally, when we perform simple arithmetic operations, such as "2 * x", one of the next edges of mul is an invalid edge, meaning that its grad_fn is also nullptr. This causes the discovery algorithm to set all grad_fns that have a path to this invalid_edge as `needed = True`.
2. Before the commit that made the engine skip the dummy node, we knew that the root node is always needed, i.e., we hardcoded `exec_info[&graph_root]=true`. The issue was that this logic wasn't updated when the code was changed to skip the graph root.
To address (1), instead of passing in an invalid edge if an input in `inputs` has no grad_fn, we create a dummy grad_fn. This is done in both python and cpp entry points. The alternative is to add logic for both backward() and grad() cases to check whether the grad_fn is nullptr and set needed=false in that case (the .grad() case would be slightly more complicated than the .backward() case here).
For (2), we perform one final iteration of the discovery algorithm so that we really know whether we need to execute the graph root.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51940
Reviewed By: VitalyFedyunin
Differential Revision: D26369529
Pulled By: soulitzer
fbshipit-source-id: 14a01ae7988a8de621b967a31564ce1d7a00084e
Summary:
This is causing type hint test errors on the latest numpy:
```
torch/testing/_internal/common_quantized.py:38: error: Module has no attribute "float"; maybe "float_", "cfloat", or "float64"? [attr-defined]
torch/testing/_internal/common_methods_invocations.py:758: error: Module has no attribute "bool"; maybe "bool_" or "bool8"? [attr-defined]
```
Runtime-wise, there's also a deprecation warning:
```
__main__:1: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
```
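A minimal migration sketch, assuming NumPy is installed (deprecated alias → replacement):

```python
import numpy as np

# Drop-in replacements for the deprecated aliases: using the builtin
# (float, bool) keeps behavior identical, while np.float64 / np.bool_
# pin the NumPy scalar type explicitly.
x = np.zeros(3, dtype=float)                    # instead of dtype=np.float
mask = np.array([True, False], dtype=np.bool_)  # instead of dtype=np.bool
```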
Fixes #{issue number}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52103
Reviewed By: suo
Differential Revision: D26401210
Pulled By: albanD
fbshipit-source-id: a7cc12ca402c6645473c98cfc82caccf161160c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52094
Pull Request resolved: https://github.com/pytorch/glow/pull/5329
Nested constants are created as placeholders by the graph_splitter used in the partitioner. So we change them back to get_attr nodes before serializing the graph.
Reviewed By: jfix71
Differential Revision: D26375577
fbshipit-source-id: 66631aadd6f5b8826ffa0a1e70176fbcaa7431d5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51939
TestTrainingLoop - TestE2ETensorPipe was flaky since there would still
be inflight background RPCs running as we performed the assertions. This
resulted in these assertions failing since we didn't wait for all RPCs on the
agent to finish.
To resolve this issue, in this PR we join() and shutdown() the RPC agent to
ensure no further RPCs are done. Then we assert the map sizes to ensure no
leaks occurred.
In addition to this, added messageIdToTimeout map to lookup the appropriate
timeout for a messageId. This ensures we remove the appropriate entry from the
map. The previous solution was passing the expirationTime through the lambda,
but it is not guaranteed the lambda would read the response of the request we
just sent out.
ghstack-source-id: 121412604
Test Plan:
1) unit tests
2) waitforbuildbot
Reviewed By: rohan-varma
Differential Revision: D26331585
fbshipit-source-id: a41e0534d7d4dfd240446e661e5541311931c7d7
Summary: To accommodate the small delay between the fbgemm and caffe2/pytorch repos, we are taking multiple steps. In this diff, we use the new interface with a temp name.
Test Plan: CI
Reviewed By: dskhudia
Differential Revision: D26231909
fbshipit-source-id: 83ceb3e12026d459532ef54459ac125b5625d644
Summary:
Reference: https://github.com/pytorch/pytorch/issues/50006
We should probably add aliases for these operators to be consistent with NumPy names i.e. `np.degrees` and `np.radians`.
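A quick illustration of the conversions such aliases would mirror, using the stdlib `math` equivalents of `np.degrees`/`np.radians` (the torch functions apply the same conversion elementwise):

```python
import math

# np.degrees: radians -> degrees; np.radians: degrees -> radians.
half_turn_deg = math.degrees(math.pi)   # ~180.0
half_turn_rad = math.radians(180.0)     # ~pi
```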
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51283
Reviewed By: ngimel
Differential Revision: D26171163
Pulled By: mruberry
fbshipit-source-id: 1869604ed400820d95f6ff50a0e3cba1de1ffa84
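For reference, the NumPy names map onto PyTorch's existing conversion ops as sketched below (the exact alias names introduced by the PR are not shown here; `rad2deg`/`deg2rad` are the pre-existing operators):

```python
import math
import torch

# np.degrees corresponds to torch.rad2deg, np.radians to torch.deg2rad.
angles_rad = torch.tensor([0.0, math.pi / 2, math.pi])
angles_deg = torch.rad2deg(angles_rad)  # tensor([  0.,  90., 180.])
back = torch.deg2rad(angles_deg)        # round-trips back to radians
```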
Summary:
Previously `torch.jit.trace` relied on AutoGrad hooks to infer the names of tensors in the computation, including those of function/method arguments. This often doesn't work out because:
- These names often do not exist
- The tracer uses the argument name of the first tensor operation on each tensor as the inferred argument name, and these names are programmatically generated, like `argument_1`
This PR extracts argument names directly from Python functions and pass them down to tracer, which then assigns them to correct graph inputs. This way, we always have the correct argument names captured in IR.
This is useful for both debugging and supporting using `InterfaceType` to represent traced modules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51775
Reviewed By: izdeby
Differential Revision: D26273105
Pulled By: gmagogsfm
fbshipit-source-id: 934a385041137dc3731bb6fa8657b11532fed9e5
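A minimal sketch of the behavior described above (function and argument names here are illustrative):

```python
import torch

def scale_and_shift(inputs, offset):
    return inputs * 2 + offset

# Trace with example inputs; after this change the traced graph's inputs
# should carry the Python argument names ("inputs", "offset") rather than
# inferred placeholders like "argument_1".
traced = torch.jit.trace(scale_and_shift, (torch.ones(3), torch.zeros(3)))
out = traced(torch.ones(3), torch.ones(3))
```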
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50848
I noticed that the call overhead from `Tensor::device()` accounts for ~1-2% of instruction counts, depending on the microbenchmark.
Some nice looking instruction count wins https://www.internalfb.com/intern/paste/P164529004/
Test Plan: Imported from OSS
Reviewed By: ezyang
Differential Revision: D25984136
Pulled By: bdhirsh
fbshipit-source-id: 0e54f2afe78caeb5a03abbb15e9197556acfeca1
Summary:
Adding CUDA 11.2 to Windows CI.
Disabled tests:
The following ran into `CUDA error: misaligned address` for CUDA 11.2: (issue linked below)
`test_where_scalar_valid_combination_cuda_complex128` in test_torch.py
`test_sgn_complex_cuda` in test_autograd.py
The following ran into `CUDA error: too many resources requested for launch` for CUDA 11.2: (https://github.com/pytorch/pytorch/issues/52002)
test_EmbeddingBag_per_sample_weights_and_new_offsets_cuda_int64_float64
test_EmbeddingBag_per_sample_weights_and_offsets_cuda_int64_float64
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51598
Reviewed By: mrshenli
Differential Revision: D26344965
Pulled By: janeyx99
fbshipit-source-id: 3c9a4ed16d748969e96593220ec0a9f33e1ffcef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49588
**Summary**
`BufferPolicy::valid` uses `!type->is_parameter(i)` to check whether an
attribute is a buffer; it should use `type->is_buffer(i)` instead.
It also removes a forward compatibility gate in `python_print.cpp` that
has prevented the preservation of buffer metadata during serialization
in fbcode. Without this, the first change (to `BufferPolicy`) does not
work correctly in fbcode.
**Test Plan**
It is difficult to write an additional test that would have failed before this
commit because the two booleans `is_parameter` and `is_buffer` are never set
to `true` at the same time.
**Fixes**
This commit fixes #48746.
Test Plan: Imported from OSS
Reviewed By: xw285cornell
Differential Revision: D25633250
Pulled By: SplitInfinity
fbshipit-source-id: e727f8506f16d2e2b28f3d76a655f6528e7ac6cb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49594
**Summary**
This commit adds a unit test to `test_save_load.py` that checks that
saving and loading a module preserves metadata about which module
attributes are parameters and buffers. The hooks that are currently used
to automatically check serialization of every function and module in the
unit tests check that the archive produced by saving and loading and
saving again are the same and that the type tags for the actual IValues
representing the module match before saving and after loading. However,
these tests do not check that buffer and parameter metadata was not
lost or destroyed during serialization.
**Test Plan**
Ran the new unit test.
Test Plan: Imported from OSS
Reviewed By: xw285cornell
Differential Revision: D25730603
Pulled By: SplitInfinity
fbshipit-source-id: 06a202935d9e0654cb1966c34f54707f0a28a331
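A condensed version of the kind of check the new unit test performs (module and attribute names here are illustrative, not the test's actual ones):

```python
import io
import torch

class Stats(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.scale = torch.nn.Parameter(torch.ones(2))       # parameter
        self.register_buffer("running_mean", torch.zeros(2))  # buffer

    def forward(self, x):
        return (x - self.running_mean) * self.scale

scripted = torch.jit.script(Stats())
buffer = io.BytesIO()
torch.jit.save(scripted, buffer)
buffer.seek(0)
loaded = torch.jit.load(buffer)

# Parameter/buffer metadata must survive the save/load round trip.
param_names = {name for name, _ in loaded.named_parameters()}
buffer_names = {name for name, _ in loaded.named_buffers()}
```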
Summary:
Adding 11.2 to CI with BUILD_SPLIT_CUDA enabled.
Disabled the following tests as they were failing in test_optim.py:
test_adadelta
test_adam
test_adamw
test_multi_tensor_optimizers
test_rmsprop
(Issue tracking that is here: https://github.com/pytorch/pytorch/issues/51992)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51905
Reviewed By: VitalyFedyunin
Differential Revision: D26368575
Pulled By: janeyx99
fbshipit-source-id: 31612c7d04d51afb3f18956e43dc7f7db8a91749
Summary:
Previously, the graph might have been deleted while Python still has iterators, leading to segfaults.
This does not fully work for iterators from Nodes and Blocks as they may be invalidated when the owning graph goes out of scope. I will look into these separately.
Fixes https://github.com/pytorch/pytorch/issues/50454
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51951
Reviewed By: mrshenli
Differential Revision: D26352629
Pulled By: SplitInfinity
fbshipit-source-id: 67299b6cbf1ac7ab77f8703a0ca8f1162e03fcd4
Summary:
This fixes an issue (currently blocking https://github.com/pytorch/pytorch/issues/51905) where the test time regression reporting step will fail if none of the most recent `master` ancestors have any reports in S3 (e.g. if a new job is added).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52054
Test Plan:
```
python test/test_testing.py
```
Reviewed By: walterddr
Differential Revision: D26369507
Pulled By: samestep
fbshipit-source-id: 4c4e1e290cb943ce8fcdadacbf51d66b31c3262a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51785
The TensorPipe pipes do not really support a "graceful" shutdown: if one side is expecting data (i.e., it has scheduled a readDescriptor call) and the other side closes, the former will receive an error. Such an error will not even be predictable, as it depends on the backend: some may detect this and report it "well" (through an EOFError), others may not be able to tell this apart from a failure and report it as such.
This meant that during shutdown some of these errors would fire and thus the agent would log them as warning. We did add a note that these were expected under some conditions, so that users wouldn't be alarmed, but it was still a far-from-ideal experience.
In principle we could build a "protocol" on top of these pipes to "agree" on a graceful shutdown, and this was the plan to solve this. However, it was rather complicated to implement.
Here I am proposing a quicker, but perhaps hackier, solution, which re-uses the already existing graceful shutdown "protocol" of the agent (i.e., the `join` method) to put the agent in a special state in which it will silence all errors due to a remote shutting down.
Such a check cannot happen in the `shutdown` method, because that's also used in case of ungraceful shutdown (in which case I believe we'd still want to display errors). Since it needs to make sure that all participants have transitioned to this new state before any of them can continue (as otherwise one of them may close its pipes before another one has realized that this is now expected), we need to perform a barrier. Hence the ideal place for it is the `join` method, where we're already doing a lot of gang-wide synchronization. Since the `join` method isn't only called during shutdown, we need to make sure we only switch the agent to this state when it's the last call to join, and we do so by adding a new optional argument to it (which will be ignored by all agents except the TensorPipe one).
I realize this isn't the prettiest solution, and since it changes the agent's API it's worth discussing it carefully. Let me know what you think!
ghstack-source-id: 121131940
Test Plan: Run on CircleCI, where this occurred quite a bit, and check the logs.
Reviewed By: mrshenli
Differential Revision: D26276137
fbshipit-source-id: 69ef14fe10908e80e627d9b4505352e482089cc8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51784
The TensorPipe agent mimics Gloo when trying to guess the most reasonable IP address to bind to. When that fails, it prints a warning to inform the user. It turns out, we were attempting to guess the address a lot of times (I counted at least 18: 1 for the UV transport, 1 for the IBV transport, 16 for the multiplexed UV channel) and thus they might all print that same identical warning message. That's noisy. Since the outcome of all these guesses will be the same (unless the system config changes underneath, which is unlikely) we can just do it once, print the warning (at most) once, cache the result and reuse it over and over.
Also, we used to have two identical but distinct ways of doing this, one provided by the UV transport and one by the IBV one. TensorPipe offers both methods because backends are modular and independent. However PyTorch always requires the UV one to be present, hence we can always rely on the UV helpers, and avoid using the IBV ones.
ghstack-source-id: 121121275
Test Plan: Look at the CircleCI logs, I think I saw this situation happening there.
Reviewed By: mrshenli
Differential Revision: D26275838
fbshipit-source-id: 8a2ffc40d80388bdca32dbcfed16f28a0a6177a3
Summary:
Set CUDA_VERSION to 11.2.0, since Nvidia names its Docker image on Ubuntu 18.04 nvidia/cuda:11.2.0-cudnn8-devel-ubuntu18.04.
Note that cudatoolkit 11.2.0 is not yet on [conda](https://repo.anaconda.com/pkgs/main/linux-64/), and we need to wait for that before merging this PR.
- https://hub.docker.com/r/nvidia/cuda/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51990
Reviewed By: samestep
Differential Revision: D26371193
Pulled By: xuzhao9
fbshipit-source-id: 76915490dc30ddb03ceeeadb3c45a6c02b60401e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51999
as in title
Test Plan: waiting on CI for now
Reviewed By: eellison
Differential Revision: D26349297
fbshipit-source-id: bd5574ed1f8448ba18a6fda4bdc45f45d8b158e9
Summary:
This is a follow up on https://github.com/pytorch/pytorch/issues/49869.
Previously, CUDA early termination only happened for generic test classes that extend `DeviceTypeTestBase`. However, JIT test cases that extend common_utils.TestCase could not benefit from the early termination.
This change moves the early termination logic into the common_utils.TestCase class.
- All tests extending common_utils.TestCase now early terminate if a CUDA assert occurs.
- For TestCases that extend common_device_type.DeviceTypeTestBase, still only do torch.cuda.synchronize() when an RTE is thrown.
- For TestCases that extend common_utils.TestCase, regardless of whether a test case uses the GPU or not, always synchronize CUDA as long as `torch.cuda.is_initialized()` returns true.
- Disabling this on common_distributed.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50914
Reviewed By: malfet
Differential Revision: D26019289
Pulled By: walterddr
fbshipit-source-id: ddc7c1c0d00db4d073a6c8bc5b7733637a7e77d1
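The moved logic boils down to a conditional synchronize, roughly as sketched here (a simplification, not the exact implementation):

```python
import torch

def maybe_synchronize_cuda() -> None:
    # Only synchronize when some earlier test actually initialized CUDA;
    # this surfaces pending device-side asserts early without forcing
    # CUDA initialization on CPU-only runs.
    if torch.cuda.is_initialized():
        torch.cuda.synchronize()

maybe_synchronize_cuda()
```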
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51751
Similar in spirit to the `__builtin_expect` C intrinsic, it's useful
to be able to hint the expected branch direction in a tensor expression. Using
this flag has a few effects on codegen:
- The CompareSelect is generated using conditional branches, rather than selects
- The conditional branches are strongly hinted (like, 100000:1) in the indicated direction
- A vectorized hinted CompareSelect computes its condition in parallel with a
mask "reduction" (e.g. a bitcast from `<i1 x 8>` to `<i*>`). In AVX terms
this sequence might look like:
```
vpcmpgtd %ymm0, %ymm1, %ymm2
vmovmskps %ymm2, %eax
```
The motivating case for this addition is an attempt I'm making to replicate
fast transcendentals using tensor expressions. Floating-point numbers have
lots of special cases (denormals, inf, nan) that need special handling, and
it's convenient to be able to punt that handling off to a slow path while
keeping the fast path nice and tight.
ghstack-source-id: 121366315
Test Plan:
I'm not sure how to test this (except I can tell you it works for
the `log` implementation I'm working on right now). It would be nice to plumb
the LLIR/ASM output through programmatically so it can be used in FileCheck.
Maybe I'll do that in another diff?
Reviewed By: asuhan
Differential Revision: D26246401
fbshipit-source-id: 900f7fa0520010fb9931d6e3efc8680a51f8d844
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51749
Following in the mode of C++, we probably want to distinguish when
it's appropriate to do arithmetic vs. logical right shift.
> For negative a, the value of a >> b is implementation-defined (in most
> implementations, this performs arithmetic right shift, so that the result
> remains negative).
If you look at what clang does, if `a` is unsigned, a logical shift is
generated; if signed, an arithmetic shift. Let's do the same here. This turns
out to be useful for, e.g., implementing transcendental function
approximations.
ghstack-source-id: 121366317
Test Plan:
Added Byte (unsigned) and Char (signed) right-shift tests to
test_llvm.
Reviewed By: asuhan
Differential Revision: D26245856
fbshipit-source-id: 260ee9bf4b032b9ce216f89acbc273cde0ed688c
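The distinction can be illustrated in plain Python, where `>>` on a negative int is arithmetic; a logical shift must mask to a fixed width first (the 32-bit width below is an assumption for illustration):

```python
def logical_rshift(value: int, shift: int, bits: int = 32) -> int:
    """Shift in zero bits from the left, as for an unsigned type."""
    return (value & ((1 << bits) - 1)) >> shift

arithmetic = -8 >> 1             # sign bit replicated: -4
logical = logical_rshift(-8, 1)  # zero-filled: 0x7FFFFFFC
```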
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51819
Original commit changeset: 3e945b438fb8
One does not simply change the patterns of aten op calls
ghstack-source-id: 121379333
Test Plan: CI
Reviewed By: nikithamalgifb
Differential Revision: D26291736
fbshipit-source-id: b819ac013c0438cc2f70daed7d7f2ef8fdc12ab7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51878
`fake_quantize_per_tensor_affine_cachemask` and
`fake_quantize_per_channel_affine_cachemask` are implementation
details of `fake_quantize_per_tensor_affine` and
`fake_quantize_per_channel_affine`, removing the
Python bindings for them since there is no need to
expose them.
Test Plan:
```
python test/test_quantization.py TestFakeQuantize
```
Imported from OSS
Reviewed By: albanD, bugra
Differential Revision: D26314173
fbshipit-source-id: 733c93a3951453e739b6ed46b72fbad2244f6e97
Summary: Currently batch_size is determined on the modeling side. Add a flag caffe2_predictor_disagg_acc_max_batch_size_override to explore different batch sizes during inference.
Test Plan:
replayer test
set caffe2_predictor_disagg_acc_max_batch_size_override=32 on both server and client side.
Reviewed By: khabinov
Differential Revision: D26318568
fbshipit-source-id: 4fa79e2087a5f7f7670988aec7e5b41e63f9980b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51359
`Logger` is the name of the base Logger class. It's confusing that
it is also used as a variable name, which can represent this class
or its subclasses. Renaming to `logger_cls` to make it clearer.
Test Plan:
```
python test/test_quantization.py TestEagerModeNumericSuite
```
Imported from OSS
Reviewed By: supriyar
Differential Revision: D26149577
fbshipit-source-id: a9c12f9446f66e5c683ab054b2a94aeb0cf9cc8a
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).
New submodule commit: 884fb257ab
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52014
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: mrshenli
Differential Revision: D26357567
fbshipit-source-id: a9f239c9d3273d04ee15fb052b2bf4f25477814b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50920
There was a hole left after previous changes.
ghstack-source-id: 120714378
Test Plan: static_assert still passes.
Reviewed By: ezyang
Differential Revision: D26008763
fbshipit-source-id: c3830328835e28a0d06c833172ac60457049824b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51820
If the child cannot extract tensors from returned IValue, the
current child CUDAFuture won't wait for anything. In this case,
if the `wait()` wasn't called on the parent Future, streams are
not synchronized, and it is possible that parent Future's CUDA
ops have not been added to streams yet.
This commit adds a `markCompletedWithDataPtrs()` to `ivalue::Future`,
and RPC uses this API to pass Message tensor dataPtrs to the
`PyObject` Future when marking it as completed.
Test Plan: Imported from OSS
Reviewed By: pritamdamania87
Differential Revision: D26324068
Pulled By: mrshenli
fbshipit-source-id: 3d838754f6daabad5cd9fb8953e4360196d110bb
Summary:
Add a ROCm 4.0.1 docker image for CI. Keep the 3.10 image.
Keep the 3.9 image until it is no longer needed.
The plan is to keep two ROCm versions at a time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51507
Reviewed By: seemethere
Differential Revision: D26350348
Pulled By: malfet
fbshipit-source-id: 6230278343ee48f19e96067180590beab96b17cc
Summary:
When type inference fails while JITing a TorchScript module, the error message gives no indication of where the failure occurred. For example: "Cannot create dict for key type 'int?', only int, float, complex, Tensor and string keys are supported".
This adds the variable name and item to the error message.
Reviewed By: ajaech
Differential Revision: D26327483
fbshipit-source-id: d8c85e7550258d7c56530f5826ff9683ca8b2b94
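For context, the kind of failure whose message gains a variable name can be reproduced roughly as follows (a sketch; the exact message text may differ by version):

```python
import torch
from typing import Dict, List, Optional

def build(keys: List[Optional[int]]):
    # TorchScript only supports int, float, complex, str and Tensor dict
    # keys, so the Optional[int] ('int?') key type below is rejected at
    # script time.
    d: Dict[Optional[int], int] = {}
    for k in keys:
        d[k] = 0
    return d

try:
    torch.jit.script(build)
    message = ""
except Exception as e:  # the compiler raises with the key-type complaint
    message = str(e)
```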
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51884
It is now possible to bundle inputs for functions other than forward without bundling them for forward. This is OK, so we need to account for it.
ghstack-source-id: 121266667
Test Plan: Manually bundle inputs for a function not named forward. Call optimize_for_mobile and make sure the functions are still there. {P173289878}
Reviewed By: iseeyuan
Differential Revision: D26304558
fbshipit-source-id: 79f82d9de59c70b76f34e01f3d691107bf40e7bc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51642
Compiling currently gives:
```
Jan 13 16:46:39 In file included from ../aten/src/ATen/native/TensorShape.cpp:12:
Jan 13 16:46:39 ../aten/src/ATen/native/Resize.h:37:24: warning: comparison of integers of different signs: 'int64_t' (aka 'long long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:39 if (new_size_bytes > self->storage().nbytes()) {
Jan 13 16:46:39 ~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:39 ../aten/src/ATen/native/TensorShape.cpp:32:24: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'int64_t' (aka 'long long') [-Wsign-compare]
Jan 13 16:46:39 for (size_t i = 0; i < shape_tensor.numel(); ++i) {
Jan 13 16:46:39 ~ ^ ~~~~~~~~~~~~~~~~~~~~
Jan 13 16:46:39 ../aten/src/ATen/native/TensorShape.cpp:122:25: warning: comparison of integers of different signs: 'int64_t' (aka 'long long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:39 for (int64_t i = 0; i < tensors.size(); i++) {
Jan 13 16:46:39 ~ ^ ~~~~~~~~~~~~~~
Jan 13 16:46:39 ../aten/src/ATen/native/TensorShape.cpp:162:21: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:39 for (int i = 0; i < tensors.size(); i++) {
Jan 13 16:46:39 ~ ^ ~~~~~~~~~~~~~~
Jan 13 16:46:39 ../aten/src/ATen/native/TensorShape.cpp:300:25: warning: comparison of integers of different signs: 'int64_t' (aka 'long long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:39 for (int64_t i = 0; i < s1.size(); ++i) {
Jan 13 16:46:39 ~ ^ ~~~~~~~~~
Jan 13 16:46:39 ../aten/src/ATen/native/TensorShape.cpp:807:21: warning: comparison of integers of different signs: 'int64_t' (aka 'long long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:39 TORCH_CHECK(dim < self_sizes.size());
Jan 13 16:46:39 ~~~ ^ ~~~~~~~~~~~~~~~~~
Jan 13 16:46:39 ../c10/util/Exception.h:361:31: note: expanded from macro 'TORCH_CHECK'
Jan 13 16:46:39 if (C10_UNLIKELY_OR_CONST(!(cond))) { \
Jan 13 16:46:39 ^~~~
Jan 13 16:46:39 ../c10/util/Exception.h:244:47: note: expanded from macro 'C10_UNLIKELY_OR_CONST'
Jan 13 16:46:39 #define C10_UNLIKELY_OR_CONST(e) C10_UNLIKELY(e)
Jan 13 16:46:39 ^
Jan 13 16:46:39 ../c10/macros/Macros.h:173:65: note: expanded from macro 'C10_UNLIKELY'
Jan 13 16:46:39 #define C10_UNLIKELY(expr) (__builtin_expect(static_cast<bool>(expr), 0))
Jan 13 16:46:39 ^~~~
Jan 13 16:46:39 ../aten/src/ATen/native/TensorShape.cpp:855:24: warning: comparison of integers of different signs: 'size_t' (aka 'unsigned long') and 'const int64_t' (aka 'const long long') [-Wsign-compare]
Jan 13 16:46:39 for (size_t i = 0; i < num_blocks; ++i) {
Jan 13 16:46:39 ~ ^ ~~~~~~~~~~
Jan 13 16:46:39 ../aten/src/ATen/native/TensorShape.cpp:2055:23: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:39 for (int i = 0; i < vec.size(); i++) {
Jan 13 16:46:39 ~ ^ ~~~~~~~~~~
Jan 13 16:46:39 ../aten/src/ATen/native/TensorShape.cpp:2100:25: warning: comparison of integers of different signs: 'int64_t' (aka 'long long') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
Jan 13 16:46:39 for (int64_t i = 0; i < src.size(); ++i) {
```
This fixes issues with loop iteration variable types
Test Plan: Sandcastle tests
Reviewed By: ngimel
Differential Revision: D25935136
fbshipit-source-id: a5da4af16bb8045cc16ab1c78b8e0f2bb3ae64bd
Summary:
Additional magma tests have been identified as failing after integrating hipMAGMA into the ROCm builds. Skipping is necessary until they can be fixed properly. This is blocking migration of ROCm CI to 4.0.1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51915
Reviewed By: izdeby
Differential Revision: D26326404
Pulled By: malfet
fbshipit-source-id: 558cce66f216f404c0316ab036e2e5637fc99798
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51826
Looks like this:
```
resnet.pt
├── .data # Data folder, named so it can't clash with torch.package code modules.
│ │ # Names/extensions automatically added to avoid naming conflicts.
│ ├── 94286146172688.storage # tensor data
│ ├── 94286146172784.storage
│ ├── extern_modules # torch.package metadata
│ ├── version # version metadata
│ └── ...
├── model # package pickled model created w/
│ │ # exporter.save_pickle('model', 'model.pkl', resnet_model)
│ └── model.pkl
└── torchvision # all code dependencies for packaged pickled
└── models # models are captured as source files
├── resnet.py
└── utils.py
```
Since `version` is hardcoded in our zip reader/writer implementation,
add it as an option that defaults to "version" but accepts other
locations for putting the version metadata.
Test Plan: Imported from OSS
Reviewed By: zdevito
Differential Revision: D26295649
Pulled By: suo
fbshipit-source-id: 2d75feeb7de0f78196b4d0b6e2b814a7d58bd1dd
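The layout above can be produced with `torch.package` roughly as follows (a sketch: `save_pickle` here packages a trivial object rather than a real model, and no `extern`/`intern` rules are shown):

```python
import os
import tempfile
from torch.package import PackageExporter, PackageImporter

path = os.path.join(tempfile.mkdtemp(), "demo.pt")

# Write a package containing one pickled object under the "model" folder.
with PackageExporter(path) as exporter:
    exporter.save_pickle("model", "model.pkl", {"weights": [1, 2, 3]})

# Read it back from the archive.
importer = PackageImporter(path)
restored = importer.load_pickle("model", "model.pkl")
```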
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51595
Right now `PackageExporter` defines its own `persistent_id` but
`PackageImporter` uses the one defined in `torch.serialization`. I have
some downstream plans to customize this so this PR just splits it out.
Not to fear! I know this introduces some duplication and potential for
different behavior between `torch.save` and `torch.package`, but I have
plans to re-unify them soon.
Test Plan: Imported from OSS
Reviewed By: zdevito
Differential Revision: D26211578
Pulled By: suo
fbshipit-source-id: 48a2ccaefb2525e1498ad68b75c46d9de3d479b7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51694
We implicitly extern standard library modules. Our method of determining
whether a module is in the standard library is a little unreliable. In
particular, I'm seeing lots of flaky errors on windows/mac CI when I
start doing more complicated packaging tests.
I looked into the best ways to do this, turns out there's no reliable
way, so tools that need to do this generally just parse the Python docs
for a listing and save it. I took `isort`'s lists and called it a day.
Test Plan: Imported from OSS
Reviewed By: zdevito
Differential Revision: D26243751
Pulled By: suo
fbshipit-source-id: 48c685cd45ae847fe986bcb9f39106e0c3361cdc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51909
Several scenarios don't work when trying to script `F.normalize`, notably when you symbolically trace through it using the default arguments:
```
import torch.nn.functional as F
import torch
from torch.fx import symbolic_trace
def f(x):
    return F.normalize(x)
gm = symbolic_trace(f)
torch.jit.script(gm)
```
which leads to the error
```
RuntimeError:
normalize(Tensor input, float p=2., int dim=1, float eps=9.9999999999999998e-13, Tensor? out=None) -> (Tensor):
Expected a value of type 'float' for argument 'p' but instead found type 'int'.
:
def forward(self, x):
    normalize_1 = torch.nn.functional.normalize(x, p = 2, dim = 1, eps = 1e-12, out = None); x = None
                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    return normalize_1
```
Reviewed By: jamesr66a
Differential Revision: D26324308
fbshipit-source-id: 30dd944a6011795d17164f2c746068daac570cea
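Passing the default explicitly as a float sidesteps the schema mismatch (a sketch of the workaround; the PR itself fixes the underlying issue):

```python
import torch
import torch.nn.functional as F

def f(x):
    # p must be a float (2.0, not 2) to match the TorchScript schema.
    return F.normalize(x, p=2.0)

x = torch.randn(4, 8)
row_norms = f(x).norm(p=2, dim=1)  # each row normalized to unit length
```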
Summary:
The name of "val" is inconsistent with the rest of the API and also
inconsistent with the underlying C++ implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51763
Test Plan:
Used the following command to demonstrate incorrect docs before and
correct docs after:
python -c 'import torch; print(torch.Tensor.index_fill_.__doc__)'
Fixes https://github.com/pytorch/pytorch/issues/51250
Reviewed By: zhangguanheng66
Differential Revision: D26271273
Pulled By: dagitses
fbshipit-source-id: 4897da80b639c54ca652d2111e13f26efe2646a0
Summary:
Fixes flake8 failures in test_autograd.py by using `gradcheck` from `torch.testing._internal.common_utils` rather than directly from `torch.autograd.gradcheck`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51963
Reviewed By: albanD
Differential Revision: D26339107
Pulled By: malfet
fbshipit-source-id: 63e0f12df16b70e394097ad88852984c1848a9e6
Summary:
It frequently happens when PyTorch compiled with CUDA support is installed on a machine that does not have NVIDIA GPUs.
Fixes https://github.com/pytorch/pytorch/issues/47038
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51806
Reviewed By: ezyang
Differential Revision: D26285827
Pulled By: malfet
fbshipit-source-id: 9fd5e690d0135a2b219c1afa803fb69de9729f5e
Summary:
Move the definition of the copysign template and its specialization for
bfloat16/half types before the first use of copysign in that file.
Add a comment explaining why this is necessary.
Fixes https://github.com/pytorch/pytorch/issues/51889
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51900
Reviewed By: walterddr
Differential Revision: D26321741
Pulled By: malfet
fbshipit-source-id: 888858b11d9708fa140fe9c0570cc5a24599205b
Summary:
[Here](https://docs.gradle.org/current/userguide/gradle_wrapper.html), there is the following description.
`The recommended way to execute any Gradle build is with the help of the Gradle Wrapper`
I took a little time to prepare Gradle for the `pytorch_android` build (version etc.).
I think using the Gradle wrapper will make the `pytorch_android` build more seamless.
Gradle wrapper version: 4.10.3
250c71121b/.circleci/scripts/build_android_gradle.sh (L13)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51067
Reviewed By: izdeby
Differential Revision: D26315718
Pulled By: IvanKobzarev
fbshipit-source-id: f8077d7b28dc0b03ee48bcdac2f5e47d9c1f04d9
Summary:
This PR adds a local [`mypy` plugin](https://mypy.readthedocs.io/en/stable/extending_mypy.html#extending-mypy-using-plugins) that warns if you accidentally run `mypy` using a version that doesn't match [the version we install for CI](6045663f39/.circleci/docker/common/install_conda.sh (L117)), since this trips people up sometimes when `mypy` gives errors in some versions (see https://github.com/pytorch/pytorch/issues/51513) but not others.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51799
Test Plan:
To check that this doesn't break our `mypy` test(s) when you have the correct version installed:
```
python test/test_type_hints.py
```
To check that this does indeed warn when you have an incorrect `mypy` version installed, switch to a different version (e.g. 0.782), and run the above command or either of these:
```
mypy
mypy --config-file=mypy-strict.ini
```
You should get the following message on stderr:
```
You are using mypy version 0.782, which is not supported
in the PyTorch repo. Please switch to mypy version 0.770.
For example, if you installed mypy via pip, run this:
pip install mypy==0.770
Or if you installed mypy via conda, run this:
conda install -c conda-forge mypy=0.770
```
Reviewed By: janeyx99
Differential Revision: D26282010
Pulled By: samestep
fbshipit-source-id: 7b423020d0529700dea8972b27afa2d7068e1b12
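The version gate itself reduces to a string comparison plus a formatted warning, sketched here in plain Python (function name hypothetical, not the plugin's actual API):

```python
from typing import Optional

def version_warning(actual: str, expected: str) -> Optional[str]:
    """Return a warning message if the installed version doesn't match CI."""
    if actual == expected:
        return None
    return (
        f"You are using mypy version {actual}, which is not supported\n"
        f"in the PyTorch repo. Please switch to mypy version {expected}."
    )

msg = version_warning("0.782", "0.770")
```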
Summary:
This is a followup to https://github.com/pytorch/pytorch/issues/49190. Vaguely speaking, the goals are to make it easy to identify test time regressions introduced by PRs. Eventually the hope is to use this information to edit Dr CI comments, but this particular PR just does the analysis and prints it to stdout, so a followup PR would be needed to edit the actual comments on GitHub.
**Important:** for uninteresting reasons, this PR moves the `print_test_stats.py` file.
- *Before:* `test/print_test_stats.py`
- *After:* `torch/testing/_internal/print_test_stats.py`
Notes on the approach:
- Just getting the mean and stdev for the total job time of the last _N_ commits isn't sufficient, because e.g. if `master` was broken 5 commits ago, then a lot of those job times will be much shorter, breaking the statistics.
- We use the commit history to make better estimates for the mean and stdev of individual test (and suite) times, but only when the test in that historical commit is present and its status matches that of the base commit.
- We list all the tests that were removed or added, or whose status changed (e.g. skipped to not skipped, or vice versa), along with time (estimate) info for that test case and its containing suite.
- We don't list tests whose time changed a lot if their status didn't change, because there's a lot of noise and it's unclear how to do that well without too many false positives.
- We show a human-readable commit graph that indicates exactly how many commits are in the pool of commits that could be causing regressions (e.g. if a PR has multiple commits in it, or if the base commit on `master` doesn't have a report in S3).
- We don't show an overall estimate of whether the PR increased or decreased the total test job time, because it's noisy and it's a bit tricky to aggregate stdevs up from individual tests to the whole job level. This might change in a followup PR.
- Instead, we simply show a summary at the bottom which says how many tests were removed/added/modified (where "modified" means that the status changed), and our best estimates of the mean times (and stdevs) of those changes.
- Importantly, the summary at the bottom is only for the test cases that were already shown in the more verbose diff report, and does not include any information about tests whose status didn't change but whose running time got much longer.
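The per-test estimates described above boil down to pooling times from matching historical runs (a simplified sketch with made-up numbers):

```python
import statistics

# Historical times for one test, keeping only commits where the test's
# status (e.g. skipped vs. run) matched the base commit's status.
matching_times = [1.21, 1.18, 1.25, 1.19, 1.22]

mean_t = statistics.mean(matching_times)    # estimated mean test time
stdev_t = statistics.stdev(matching_times)  # sample standard deviation
```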
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50171
Test Plan:
To run the unit tests:
```
$ python test/test_testing.py
$ python test/print_test_stats.py
```
To verify that this works, check the [CircleCI logs](https://app.circleci.com/pipelines/github/pytorch/pytorch/258628/workflows/9cfadc34-e042-485e-b3b3-dc251f160307) for a test job run on this PR; for example:
- pytorch_linux_bionic_py3_6_clang9_test
To test locally, use the following steps.
First run an arbitrary test suite (you need to have some XML reports so that `test/print_test_stats.py` runs, but we'll be ignoring them here via the `--use-json` CLI option):
```
$ DATA_DIR=/tmp
$ ARBITRARY_TEST=testing
$ python test/test_$ARBITRARY_TEST.py --save-xml=$DATA_DIR/test/test_$ARBITRARY_TEST
```
Now choose a commit and a test job (it has to be on `master` since we're going to grab the test time data from S3, and [we only upload test times to S3 on the `master`, `nightly`, and `release` branches](https://github.com/pytorch/pytorch/pull/49645)):
```
$ export CIRCLE_SHA1=c39fb9771d89632c5c3a163d3c00af3bef1bd489
$ export CIRCLE_JOB=pytorch_linux_bionic_py3_6_clang9_test
```
Download the `*.json.bz2` file(s) for that commit/job pair:
```
$ aws s3 cp s3://ossci-metrics/test_time/$CIRCLE_SHA1/$CIRCLE_JOB/ $DATA_DIR/ossci-metrics/test_time/$CIRCLE_SHA1/$CIRCLE_JOB --recursive
```
And feed everything into `torch/testing/_internal/print_test_stats.py`:
```
$ bzip2 -kdc $DATA_DIR/ossci-metrics/test_time/$CIRCLE_SHA1/$CIRCLE_JOB/*Z.json.bz2 | torch/testing/_internal/print_test_stats.py --compare-with-s3 --use-json=/dev/stdin $DATA_DIR/test/test_$ARBITRARY_TEST
```
The first part of the output should be the same as before this PR; here is the new part, at the end of the output:
- https://pastebin.com/Jj1svhAn
Reviewed By: malfet, izdeby
Differential Revision: D26317769
Pulled By: samestep
fbshipit-source-id: 1ba06cec0fafac77f9e7341d57079543052d73db
Summary:
Currently the PyTorch repository provides a Dockerfile to build Docker images with nightly builds, but it doesn't have CI to actually build those images.
This PR adds a GitHub action workflow to create PyTorch nightly build Docker and publish them to GitHub Container Registry.
Also, add "--always" option to the `git describe --tags` command that generates the Docker image tag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51755
Test Plan: Manually trigger the workflow build in the GitHub Actions web UI.
Reviewed By: seemethere
Differential Revision: D26320180
Pulled By: xuzhao9
fbshipit-source-id: e00b472df14f5913cab9b06a41e837014e87f1c7
Summary:
Fixes https://github.com/pytorch/pytorch/issues/39502
This PR adds support for exporting **fake_quantize_per_channel_affine** to a pair of QuantizeLinear and DequantizeLinear. Per tensor support was added by PR https://github.com/pytorch/pytorch/pull/39738.
The `axis` attribute of QuantizeLinear and DequantizeLinear, which is required for per-channel support, was added in opset 13 by https://github.com/onnx/onnx/pull/2772.
[update 1/20/2021]: opset 13 is now supported on master, so the added function is properly tested. The code has also been rebased onto the new master.
The function is also tested offline with the following code
```python
import torch
from torch import quantization
from torchvision import models

qat_resnet18 = models.resnet18(pretrained=True).eval().cuda()
qat_resnet18.qconfig = quantization.QConfig(
    activation=quantization.default_fake_quant,
    weight=quantization.default_per_channel_weight_fake_quant)
quantization.prepare_qat(qat_resnet18, inplace=True)
qat_resnet18.apply(quantization.enable_observer)
qat_resnet18.apply(quantization.enable_fake_quant)

dummy_input = torch.randn(16, 3, 224, 224).cuda()
_ = qat_resnet18(dummy_input)
for module in qat_resnet18.modules():
    if isinstance(module, quantization.FakeQuantize):
        module.calculate_qparams()
qat_resnet18.apply(quantization.disable_observer)

qat_resnet18.cuda()
input_names = ["actual_input_1"]
output_names = ["output1"]
torch.onnx.export(qat_resnet18, dummy_input, "quant_model.onnx", verbose=True, opset_version=13)
```
It can generate the desired graph.
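For intuition, the QuantizeLinear → DequantizeLinear pair with an `axis` attribute applies a separate scale and zero point to each channel slice. A minimal pure-Python sketch of that round trip (illustrative only; the function name and the int8 range here are assumptions, not the exporter's code):

```python
def fake_quant_per_channel(rows, scales, zero_points, qmin=-128, qmax=127):
    """Quantize then dequantize each row with its own scale/zero point,
    mimicking QuantizeLinear + DequantizeLinear with axis=0."""
    out = []
    for row, s, zp in zip(rows, scales, zero_points):
        dq_row = []
        for x in row:
            q = round(x / s) + zp          # quantize with this channel's params
            q = max(qmin, min(qmax, q))    # clamp to the int8 range
            dq_row.append((q - zp) * s)    # dequantize back to float
        out.append(dq_row)
    return out
```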
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42835
Reviewed By: houseroad
Differential Revision: D26293823
Pulled By: SplitInfinity
fbshipit-source-id: 300498a2e24b7731b12fa2fbdea4e73dde80e7ea
Summary:
For unsupported input dtypes, we should not perform the check inside a parallel region; this PR first does the dtype check and then runs the parallel for.
Fixes https://github.com/pytorch/pytorch/issues/51352.
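The shape of the fix, sketched in plain Python with a hypothetical clamp kernel and a thread pool standing in for the real parallel region: the dtype check runs up front on the calling thread, so unsupported inputs fail with a clean exception instead of erroring inside a worker.

```python
from concurrent.futures import ThreadPoolExecutor

SUPPORTED_DTYPES = {"float32", "float64"}  # hypothetical supported set

def clamp_parallel(chunks, dtype, lo, hi):
    # Check first, outside the parallel region, so unsupported inputs
    # fail fast instead of raising from inside a worker thread.
    if dtype not in SUPPORTED_DTYPES:
        raise TypeError(f"clamp not implemented for {dtype}")
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda c: [min(max(x, lo), hi) for x in c], chunks))
```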
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51443
Reviewed By: izdeby
Differential Revision: D26305584
Pulled By: ngimel
fbshipit-source-id: 6faa3148af5bdcd7246771c0ecb4db2b31ac82c6
Summary:
Previously TorchScript allowed an ignore-all type-check suppression rule that looks like
```
code code code # type: ignore
```
But a more common use case is
```
code code code # type: ignore[specific-rule]
```
This PR allows the more common use case as well.
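For reference, the two forms can be told apart with a regex like the one PyTorch's own lint rules use for unqualified suppressions (a sketch of the comment syntax only, not the TorchScript parser):

```python
import re

# Matches a bare `# type: ignore` that carries no [error-code] qualifier.
UNQUALIFIED = re.compile(r"# type:\s*ignore(?!\[)")

def is_unqualified_ignore(line: str) -> bool:
    return bool(UNQUALIFIED.search(line))
```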
Fixes https://github.com/pytorch/pytorch/issues/48643
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51675
Reviewed By: ansley
Differential Revision: D26304870
Pulled By: gmagogsfm
fbshipit-source-id: 0ac9ee34f0219c86e428318a69484d5aa3ec433f
Summary:
With zasdfgbnm's help, and using his small TensorIterator kernel repro (https://github.com/zasdfgbnm/tensoriterator), we found a workaround for what looks like a compiler bug in multi_output_kernel that manifests with CUDA 10.2 and CUDA 11 when there is a non-trivial OffsetCalculator.
Those nvcc versions appear unable to handle inheritance in device structs, so instead of inheriting `multi_outputs_unroll` from `unroll` we make it independent.
cc vkuzo, haichuan-fb I verified that reverting https://github.com/pytorch/pytorch/issues/49315 to bring back multi_output_kernel makes the `test_learnable_backward_per_channel_cuda` test pass, but I didn't do it in this PR - can you take that up as a follow-up?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51827
Reviewed By: izdeby
Differential Revision: D26305559
Pulled By: ngimel
fbshipit-source-id: 1168e7c894d237a954abfd1998eaad54f0ce40a7
Summary:
The overloads are a little tricky here. It's important that it be unambiguous what `torch.nonzero(x)` resolves to, so defaults are specified for only one of the overloads. Also, `out` is left out of the second overload because a non-None value for `out` is not valid in combination with `as_tuple=True`.
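That contract can be mimicked in plain Python (a sketch over a flat list, with hypothetical behavior standing in for the real tensor op): the defaults live on one signature, and `out` combined with `as_tuple=True` is rejected.

```python
def nonzero(values, *, as_tuple=False, out=None):
    """Sketch of torch.nonzero's keyword contract, applied to a flat list."""
    if as_tuple and out is not None:
        raise TypeError("nonzero: out= is not valid with as_tuple=True")
    idx = [i for i, v in enumerate(values) if v != 0]
    if as_tuple:
        return (idx,)          # one index sequence per dimension (1-D here)
    if out is not None:
        out[:] = idx           # write into the caller-provided buffer
        return out
    return idx
```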
Closes gh-51434
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51635
Reviewed By: zhangguanheng66
Differential Revision: D26279203
Pulled By: walterddr
fbshipit-source-id: 8459c04fc9fbf7fc5f31b3f631aaac2f98b17ea6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51589
Dropout operators are only needed in training. Remove them for frozen models.
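In spirit, the pass rewires each dropout node's consumers to the dropout's input and deletes the node. A toy sketch on a list-of-ops graph representation (the structure is hypothetical, not the actual JIT IR):

```python
def remove_dropout(ops):
    """Replace each dropout node with an identity edge: consumers of the
    dropout output are rewired to its input, then the node is dropped."""
    remap = {}   # dropout output name -> the name it forwards
    kept = []
    for op in ops:
        inputs = [remap.get(i, i) for i in op["inputs"]]
        if op["kind"] == "dropout":
            remap[op["output"]] = inputs[0]  # forward input through
            continue
        kept.append({**op, "inputs": inputs})
    return kept
```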
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D26214259
fbshipit-source-id: 3ab05869e1e1f6c57498ba62bf40944f7c2189aa
Summary:
Toward fixing https://github.com/pytorch/pytorch/issues/47624
~Step 1: add `TORCH_WARN_MAYBE` which can either warn once or every time in c++, and add a c++ function to toggle the value.
Step 2 will be to expose this to python for tests. Should I continue in this PR or should we take a different approach: add the python level exposure without changing any c++ code and then over a series of PRs change each call site to use the new macro and change the tests to make sure it is being checked?~
Step 1: add a python and c++ toggle to convert TORCH_WARN_ONCE into TORCH_WARN so the warnings can be caught in tests
Step 2: add a python-level decorator to use this toggle in tests
Step 3: (in future PRs): use the decorator to catch the warnings instead of `maybeWarnsRegex`
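The toggle idea in miniature, in plain Python (all names here are hypothetical; the real mechanism lives in c10 and is exposed to Python separately):

```python
import warnings

_seen = set()
_warn_always = False  # test-only toggle: True makes warn-once behave like warn

def set_warn_always(enabled: bool):
    global _warn_always
    _warn_always = enabled

def warn_once(key: str, message: str):
    # Normally each key warns only the first time it is seen; with the
    # toggle on, every call warns, so tests can catch the warning reliably.
    if _warn_always or key not in _seen:
        _seen.add(key)
        warnings.warn(message)
```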
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48560
Reviewed By: ngimel
Differential Revision: D26171175
Pulled By: mruberry
fbshipit-source-id: d83c18f131d282474a24c50f70a6eee82687158f
Summary:
This is a followup to https://github.com/pytorch/pytorch/issues/49190. Vaguely speaking, the goals are to make it easy to identify test time regressions introduced by PRs. Eventually the hope is to use this information to edit Dr CI comments, but this particular PR just does the analysis and prints it to stdout, so a followup PR would be needed to edit the actual comments on GitHub.
**Important:** for uninteresting reasons, this PR moves the `print_test_stats.py` file.
- *Before:* `test/print_test_stats.py`
- *After:* `torch/testing/_internal/print_test_stats.py`
Notes on the approach:
- Just getting the mean and stdev for the total job time of the last _N_ commits isn't sufficient, because e.g. if `master` was broken 5 commits ago, then a lot of those job times will be much shorter, breaking the statistics.
- We use the commit history to make better estimates for the mean and stdev of individual test (and suite) times, but only when the test in that historical commit is present and its status matches that of the base commit.
- We list all the tests that were removed or added, or whose status changed (e.g. skipped to not skipped, or vice versa), along with time (estimate) info for that test case and its containing suite.
- We don't list tests whose time changed a lot if their status didn't change, because there's a lot of noise and it's unclear how to do that well without too many false positives.
- We show a human-readable commit graph that indicates exactly how many commits are in the pool of commits that could be causing regressions (e.g. if a PR has multiple commits in it, or if the base commit on `master` doesn't have a report in S3).
- We don't show an overall estimate of whether the PR increased or decreased the total test job time, because it's noisy and it's a bit tricky to aggregate stdevs up from individual tests to the whole job level. This might change in a followup PR.
- Instead, we simply show a summary at the bottom which says how many tests were removed/added/modified (where "modified" means that the status changed), and our best estimates of the mean times (and stdevs) of those changes.
- Importantly, the summary at the bottom is only for the test cases that were already shown in the more verbose diff report, and does not include any information about tests whose status didn't change but whose running time got much longer.
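The per-test estimate described above can be sketched as follows (pure Python; the record fields are hypothetical): historical entries contribute only when the test existed in that commit and its status matches the base commit's.

```python
from statistics import mean, stdev

def estimate(history, base_status):
    """Estimate a test's mean/stdev time from historical commits, using only
    commits where the test existed and its status matched the base commit."""
    times = [h["time"] for h in history
             if h is not None and h["status"] == base_status]
    if len(times) < 2:
        return (times[0], 0.0) if times else (None, None)
    return mean(times), stdev(times)
```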
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50171
Test Plan:
To run the unit tests:
```
$ python test/test_testing.py
$ python test/print_test_stats.py
```
To verify that this works, check the [CircleCI logs](https://app.circleci.com/pipelines/github/pytorch/pytorch/258628/workflows/9cfadc34-e042-485e-b3b3-dc251f160307) for a test job run on this PR; for example:
- pytorch_linux_bionic_py3_6_clang9_test
To test locally, use the following steps.
First run an arbitrary test suite (you need to have some XML reports so that `test/print_test_stats.py` runs, but we'll be ignoring them here via the `--use-json` CLI option):
```
$ DATA_DIR=/tmp
$ ARBITRARY_TEST=testing
$ python test/test_$ARBITRARY_TEST.py --save-xml=$DATA_DIR/test/test_$ARBITRARY_TEST
```
Now choose a commit and a test job (it has to be on `master` since we're going to grab the test time data from S3, and [we only upload test times to S3 on the `master`, `nightly`, and `release` branches](https://github.com/pytorch/pytorch/pull/49645)):
```
$ export CIRCLE_SHA1=c39fb9771d89632c5c3a163d3c00af3bef1bd489
$ export CIRCLE_JOB=pytorch_linux_bionic_py3_6_clang9_test
```
Download the `*.json.bz2` file(s) for that commit/job pair:
```
$ aws s3 cp s3://ossci-metrics/test_time/$CIRCLE_SHA1/$CIRCLE_JOB/ $DATA_DIR/ossci-metrics/test_time/$CIRCLE_SHA1/$CIRCLE_JOB --recursive
```
And feed everything into `torch/testing/_internal/print_test_stats.py`:
```
$ bzip2 -kdc $DATA_DIR/ossci-metrics/test_time/$CIRCLE_SHA1/$CIRCLE_JOB/*Z.json.bz2 | torch/testing/_internal/print_test_stats.py --compare-with-s3 --use-json=/dev/stdin $DATA_DIR/test/test_$ARBITRARY_TEST
```
The first part of the output should be the same as before this PR; here is the new part, at the end of the output:
- https://pastebin.com/Jj1svhAn
Reviewed By: walterddr
Differential Revision: D26232345
Pulled By: samestep
fbshipit-source-id: b687b1737519d2eed68fbd591a667e4e029de509
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49683
This PR fixes the backward-through-sparse_coo_tensor bug by implementing a `sparse_mask_helper` function for n-dimensional sparse tensors on CPU and CUDA, which is used to reimplement the `sparse_constructor_values_backward` function.
A `sparse_mask` function was implemented before for backward of sparse-sparse matmul. However, the algorithm is a little different because here it must be applicable not only to matrices but to n-dimensional tensors. Thankfully it was not hard to extend, and now both share the same code base.
Note that no new tests are required because the backward for sparse-sparse matmul now uses the new `sparse_mask_helper`.
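At its core such a mask is an index intersection: keep the input's values at exactly the mask's indices and fill zeros elsewhere. A dict-based sketch over flattened indices (illustrative only; the real kernels operate on COO index tensors):

```python
def sparse_mask(values_by_index, mask_indices):
    """Return values aligned with mask_indices, filling 0.0 where the
    input has no entry at that (flattened) index."""
    return [values_by_index.get(i, 0.0) for i in mask_indices]
```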
ngimel, mruberry - kindly review this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50361
Reviewed By: zhangguanheng66
Differential Revision: D26270483
Pulled By: ngimel
fbshipit-source-id: ee4bda49ff86e769342674b64d3c4bc34eae38ef
Summary: As titled.
Test Plan: successful test flow with A* setup: f245569242
Reviewed By: anurag16
Differential Revision: D25966283
fbshipit-source-id: ef9945d5039933df44c2c3c26ca149f47538ff31
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51757
Enables backend preprocessing to take place outside of the backend interface.
What's new:
* A new definition for backend preprocessing (i.e. BackendPreprocessFunction).
* Registration of the backend's PyTorchBackendInterface interface implementation is augmented to take the BackendPreprocessFunction.
* A new registry is created to handle the BackendPreprocessFunction functions, using the backend's name as key.
* When a BackendPreprocessFunction is used, the PyTorchBackendInterface's "preprocess" method is not added to the LoweredModule. Instead, the BackendPreprocessFunction is called and its output used to set the LoweredModule's __processed_module.
Why?:
These changes are needed to avoid forcing backend preprocessing to be part of the LoweredModule, and in the future be able to eliminate "preprocess" from the PyTorchBackendInterface.
This is important for Mobile use cases where "preprocess" can take the bulk of the compilation process, and thus contain code dependencies that we do not want to bring (or cannot bring) to the Mobile binary.
What didn't change:
* Everything is backwards compatible:
** The existing "preprocess" method in PyTorchBackendInterface is still there.
** When backend registration is done without the BackendPreprocessFunction, as before, things work the same way: "preprocess" is added to LoweredModule, and invoked through the module's instance of the backend interface.
Longer term, the plan is to refactor existing users to move to the new backend registration.
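The registry shape described above, in a minimal Python sketch (function and key names are hypothetical):

```python
_preprocess_registry = {}  # backend name -> standalone preprocess function

def register_backend(name, preprocess=None):
    # Registration optionally carries a preprocess function keyed by
    # backend name, decoupled from the backend interface instance.
    if preprocess is not None:
        _preprocess_registry[name] = preprocess

def lower_module(name, module, method_compile_spec):
    # When a standalone preprocess function was registered, call it here
    # and use its output as the lowered module's processed form.
    fn = _preprocess_registry.get(name)
    if fn is None:
        raise KeyError(f"no preprocess registered for backend {name!r}")
    return {"__processed_module": fn(module, method_compile_spec)}
```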
ghstack-source-id: 121190883
Test Plan:
Updated existing tests (test_backend.py) to use the new registration mechanism.
Verified test ran and passed (in my OSS build).
Reviewed By: iseeyuan
Differential Revision: D26261042
fbshipit-source-id: 0dc378acd5f2ab60fcdc01f7373616d1db961e61
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51119
Adds an asm kernel for the 8x1 block sparse pattern. Since the ukernel still produces 4x8 blocks, similar to the 1x4 sparsity pattern, we can use the same prepacking kernel for the activation. It gets a tiny bit hacky but allows us to reuse the kernel.
Test Plan:
q8gemm-sparse-test
fully-connected-sparse-test
Imported from OSS
Reviewed By: AshkanAliabadi
Differential Revision: D26077765
fbshipit-source-id: cc087b0ff717a613906d442ea73680e785e0ecc2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51118
Modify BCSR to pack a generic block sparsity pattern, and modify the rest of the code to accommodate the change.
This is in preparation for supporting 8x1 sparsity.
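For intuition, BCSR packing parameterized by block size might look like this pure-Python sketch (illustrative only; the real packer emits flat index/value buffers, and this version assumes the dimensions divide evenly by the block size):

```python
def pack_bcsr(matrix, br, bc):
    """Pack a dense matrix into BCSR with br x bc blocks, keeping only
    blocks that contain at least one nonzero value."""
    rows, cols = len(matrix), len(matrix[0])
    row_ptr, col_idx, blocks = [0], [], []
    for r0 in range(0, rows, br):
        for c0 in range(0, cols, bc):
            # Flatten the br x bc block in row-major order.
            block = [matrix[r0 + i][c0 + j] for i in range(br) for j in range(bc)]
            if any(block):
                col_idx.append(c0 // bc)
                blocks.append(block)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, blocks
```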
Test Plan:
q8gemm-sparse-test
Imported from OSS
Reviewed By: AshkanAliabadi
Differential Revision: D26077767
fbshipit-source-id: 7179975b07a1cb76ef26896701d782fb04638743
```yaml
          (! git --no-pager grep -In $'\t' -- . ':(exclude)*.svg' ':(exclude)**Makefile' ':(exclude)**/contrib/**' ':(exclude)third_party' ':(exclude).gitattributes' ':(exclude).gitmodules' || (echo "The above lines have tabs; please convert them to spaces"; false))
      - name: Ensure no non-breaking spaces
        if: always()
        run: |
          # NB: We use 'printf' below rather than '\u000a' since bash pre-4.2
          # does not support the '\u000a' syntax (which is relevant for local linters)
          (! git --no-pager grep -In "$(printf '\xC2\xA0')" -- . || (echo "The above lines have non-breaking spaces (U+00A0); please convert them to spaces (U+0020)"; false))
      - name: Ensure canonical include
        if: always()
        run: |
          (! git --no-pager grep -In $'#include "' -- ./c10 ./aten ./torch/csrc ':(exclude)aten/src/ATen/native/quantized/cpu/qnnpack/**' || (echo "The above lines have include with quotes; please convert them to #include <xxxx>"; false))
      - name: Ensure no versionless Python shebangs
        if: always()
        run: |
          (! git --no-pager grep -In '#!.*python$' -- . || (echo "The above lines have versionless Python shebangs; please specify either python2 or python3"; false))
      - name: Ensure no unqualified noqa
        if: always()
        run: |
          # shellcheck disable=SC2016
          (! git --no-pager grep -InP '# noqa(?!: [A-Z]+\d{3})' -- '**.py' '**.pyi' ':(exclude)caffe2' || (echo 'The above lines have unqualified `noqa`; please convert them to `noqa: XXXX`'; false))
      - name: Ensure no unqualified type ignore
        if: always()
        run: |
          # shellcheck disable=SC2016
          (! git --no-pager grep -InP '# type:\s*ignore(?!\[)' -- '**.py' '**.pyi' ':(exclude)test/test_jit.py' || (echo 'The above lines have unqualified `type: ignore`; please convert them to `type: ignore[xxxx]`'; false))
      # note that this next step depends on a clean checkout;
      # if you run it locally then it will likely complain
      # about all the generated files in torch/test
      - name: Ensure C++ source files are not executable
        if: always()
        run: |
          # shellcheck disable=SC2016
          (! find . \( -path ./third_party -o -path ./.git -o -path ./torch/bin -o -path ./build \) -prune -o -type f -executable -regextype posix-egrep -not -regex '.+(\.(bash|sh|py|so)|git-pre-commit|git-clang-format|gradlew)$' -print | grep . || (echo 'The above files have executable permission; please remove their executable permission by using `chmod -x`'; false))
          python torch/testing/check_kernel_launches.py |& tee "${GITHUB_WORKSPACE}"/cuda_kernel_launch_checks.txt
      - name: Ensure no direct cub include
        if: always()
        run: |
          (! git --no-pager grep -I -no $'#include <cub/' -- ./aten ':(exclude)aten/src/ATen/cuda/cub.cuh' || (echo "The above files have direct cub include; please include ATen/cuda/cub.cuh instead and wrap your cub calls in at::native namespace if necessary"; false))
  py2-setup-validate-errormsg:
    runs-on: ubuntu-18.04
    steps:
      - name: Setup Python
        uses: actions/setup-python@v2
        with:
          python-version: 2.x
          architecture: x64
      - name: Checkout PyTorch
        uses: actions/checkout@v2
      - name: Attempt to run setup.py
        run: |
          python2 setup.py | grep "Python 2 has reached end-of-life and is no longer supported by PyTorch."
```